Automate the process of visualization.
Visualizer is a Python 3 package that simplifies visualization, either by automating it, which saves a huge amount of time,
or by letting you specify which relationship to visualize or which type of plot to show.
pip install -U visualizer
The Visualizer package supports two types of plotting:
NOTE:
Methods of the first type start with create_, and methods of the second type start with visualize_, as you will see next.
# Import the essential libraries.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', None)
sns.set_style('white')
# Reading the data set to visualize.
df = pd.read_csv("/media/mosaab/Volume/Personal/Development/Courses Docs/ML GrandMaster/ml_project/input/new_df.csv")
# print the shape of the data.
df.shape
df.head()
We can see that the dataframe has:
Let's import the Visualizer package:
# Import the visualizer.
from visualizer import Visualizer
# instantiate the class.
vis = Visualizer(df=df,
# num_cols=[],
# cat_cols=[],
target_col='SalePrice',
ignore_cols=['ord_5'],
problem_type='regression')
# Parameters:
## 1. df: (pandas.DataFrame) the dataframe to visualize. (required)
## 2. target_col: (str) name of the target column. (required)
## 3. problem_type: (str) type of problem, either "classification" or "regression". (required)
## 4. num_cols: (list) numerical columns. (optional)
## 5. cat_cols: (list) categorical columns. (optional)
## 6. ignore_cols: (list) columns to exclude from the visualizations. (optional)
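The source does not say how the package fills in num_cols and cat_cols when they are omitted. If you prefer to pass them explicitly, the sketch below shows one possible way to build both lists with pandas. The helper name split_columns and the toy dataframe are hypothetical and not part of the visualizer package.

```python
import pandas as pd


def split_columns(df, target_col, ignore_cols=()):
    """Hypothetical helper: split feature columns into numerical and categorical lists."""
    # Drop the target and any ignored columns before inspecting dtypes.
    features = df.drop(columns=[target_col, *ignore_cols])
    num_cols = features.select_dtypes(include="number").columns.tolist()
    cat_cols = features.select_dtypes(exclude="number").columns.tolist()
    return num_cols, cat_cols


# Toy data (made up for illustration) reusing column names from this README.
toy = pd.DataFrame({
    "GarageArea": [548, 460, 608],
    "nom_0": ["Red", "Blue", "Red"],
    "ord_5": ["aa", "bb", "cc"],
    "SalePrice": [208500, 181500, 223500],
})

num_cols, cat_cols = split_columns(toy, target_col="SalePrice", ignore_cols=["ord_5"])
print(num_cols, cat_cols)
```

The resulting lists could then be passed as num_cols and cat_cols when instantiating the class.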
# Visualize all the relationships of the features.
vis.visualize_all()
# To exclude a specific type of plot, pass the corresponding flag, for example:
# vis.visualize_all(use_kde_plot=False)
# This skips all KDE plots, and you can do the same
# for every other type of plot.
After the process is done, your current directory will look like this:
This type of plotting plots each column individually, whether it is categorical or numerical, using different types of plots.
NOTE: All the plots will be saved in the visualizer folder.
# Visualizing the target column helps to reveal whether the problem is imbalanced.
# The following parameters are True by default.
vis.visualize_target(use_count_plot=True,
use_pie_plot=True,
use_kde_plot=True,
use_hist_plot=True)
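As a numeric companion to the target plots, class imbalance can also be checked directly with pandas. This is a minimal sketch on made-up data, independent of the visualizer package, and the column name target is only an assumption for a classification dataset.

```python
import pandas as pd

# Toy classification target (made up): four samples of class 0, one of class 1.
toy = pd.DataFrame({"target": [0, 0, 0, 0, 1]})

# Number of rows per class and each class's share of the dataset.
counts = toy["target"].value_counts()
shares = toy["target"].value_counts(normalize=True)

# Ratio between the largest and the smallest class; 1.0 means perfectly balanced.
imbalance_ratio = counts.max() / counts.min()
print(shares)
print(imbalance_ratio)
```

A ratio far above 1 is a hint that the problem is imbalanced, which the count and pie plots should show as well.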
# Visualizing the categorical columns helps us decide how to encode them properly.
vis.visualize_cat()
# Visualizing the numerical columns helps us decide whether some columns need to be transformed.
vis.visualize_num()
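One common way to back up the visual impression of a numerical column is to compare its skewness before and after a candidate transform. The sketch below uses made-up values and a log transform purely as an illustration. It is not something the visualizer package is stated to do.

```python
import numpy as np
import pandas as pd

# Toy, strongly right-skewed column (made up for illustration).
toy = pd.DataFrame({"SalePrice": [1, 2, 4, 8, 16, 32, 64, 128, 256]})

# Skewness of the raw values and of the log-transformed values.
raw_skew = toy["SalePrice"].skew()
log_skew = np.log1p(toy["SalePrice"]).skew()

print(f"raw skew: {raw_skew:.2f}, log1p skew: {log_skew:.2f}")
```

If the skewness drops close to zero after the transform, the histogram and KDE plots of the transformed column should look much more symmetric.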
This type of plotting plots the relationship between 2 columns.
NOTE: All the plots will be saved in the visualizer folder.
# Visualize the pairwise relationships among the categorical and numerical columns.
vis.visualize_cat_with_cat()
vis.visualize_num_with_cat()
vis.visualize_num_with_num()
# Visualize the relationships between the categorical or numerical columns and the index.
# This is very helpful when you have time-series data.
vis.visualize_cat_with_idx()
vis.visualize_num_with_idx()
# Visualize the relationships between the target and the rest of the columns.
# This is very helpful to see the correlation between the target and the other columns,
# so you can build an intuition about which columns are worth keeping and which ones are redundant.
vis.visualize_cat_with_target()
vis.visualize_num_with_target()
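To put numbers next to the target plots, the correlation between each numerical column and the target can be ranked with pandas. This is a minimal sketch on made-up data. The column noise is hypothetical, and the code does not use the visualizer package.

```python
import pandas as pd

# Toy regression data (made up): one informative column and one noisy column.
toy = pd.DataFrame({
    "GarageArea": [200, 300, 400, 500, 600],
    "noise": [5, 1, 4, 2, 3],
    "SalePrice": [100, 150, 210, 240, 310],
})

# Pearson correlation of every feature with the target, strongest first.
corr_with_target = (
    toy.drop(columns="SalePrice")
    .corrwith(toy["SalePrice"])
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Pearson correlation only captures linear relationships, so a low value here does not by itself prove that a column is redundant.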
This type of plotting plots the relationships between more than 2 columns.
# This type of plotting is very helpful to find a pattern in your dataset.
vis.visualize_multi_variate()
This functionality lets you display any plot available in the visualizer package directly in the notebook.
All the methods for this functionality start with create_, and you can call them without instantiating the class!
Let's see:
# Uni-variate plotting for categorical columns.
# 1. Count plot
Visualizer.create_count_plot(df=df, # ---> dataframe (required)
cat_col='bin_3', # ---> name of categorical column in the mentioned dataframe. (required)
figsize=(6, 4), # ---> figure size (optional)
annot=True, # ---> show annotation of the percentage [True, False] (optional)
rotate=False) # ---> Rotate the x labels [True, False] (optional)
# 2. Pie plot.
Visualizer.create_pie_plot(df=df, # ---> dataframe (required)
cat_col='nom_1', # ---> name of categorical column in the mentioned dataframe. (required)
figsize=(6, 6)) # ---> figure size (optional)
# Uni-variate for numerical columns.
# 3. Histogram Plot.
Visualizer.create_hist_plot(df=df, # ---> dataframe (required)
num_col='SalePrice', # ---> name of numerical column in the mentioned dataframe. (required)
figsize=(8, 6)) # ---> figure size (optional)
# 4. KDE plot.
Visualizer.create_kde_plot(df=df,
num_col='GarageArea',
target_col=None,
figsize=(8, 6))
# 5. WordCloud for a categorical column with high-cardinality labels.
Visualizer.create_wordcloud(df=df,
cat_col='ord_5',
figsize=(10, 10))
# 6. Histogram for a categorical feature with high-cardinality labels.
Visualizer.create_hist_for_high_cardinality(df=df,
cat_col='ord_5',
annot=True)
# This plot is very helpful for answering the question:
# How many labels occurred n times?
# For example, by looking at the far right side, we can say
# that there are 2 labels that occurred 30 times in the whole dataset for that column.
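The question "how many labels occurred n times?" can be answered in pandas by counting the label counts themselves. The sketch below uses a made-up series and shows the idea only. It is not the package's internal implementation.

```python
import pandas as pd

# Toy high-cardinality column (made up): "aa" and "bb" occur 3 times, "cc" once.
toy = pd.Series(["aa", "aa", "aa", "bb", "bb", "bb", "cc"], name="ord_5")

# Step 1: how many times each label occurs.
label_counts = toy.value_counts()

# Step 2: how many labels share each occurrence count.
labels_per_count = label_counts.value_counts().sort_index()
print(labels_per_count)
```

Here the index of labels_per_count is the number of occurrences n, and the value is the number of labels that occurred n times.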
# 7. Line plot along the index.
# This plot is very helpful when we have a time-series dataset.
Visualizer.create_line_with_index(df=df,
num_col='SalePrice',
target_col=None,
figsize=(25, 6))
# 8. Point plot along the index.
Visualizer.create_point_with_index(df=df,
num_col='ord_5',
target_col='target',
figsize=(25, 6))
# 9. Clustered Bar Chart.
# This plot is very helpful to see how 2 categorical features relate to each other.
Visualizer.create_clustered_bar_plot(df=df,
cat_1='nom_0',
cat_2='bin_3',
figsize=(14, 8))
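The numbers behind a clustered bar chart of two categorical columns are a contingency table, which pandas can build with crosstab. This sketch uses made-up values for nom_0 and bin_3 and is independent of the visualizer package.

```python
import pandas as pd

# Toy categorical data (made up) reusing the column names from the example above.
toy = pd.DataFrame({
    "nom_0": ["Red", "Red", "Blue", "Blue", "Blue", "Green"],
    "bin_3": ["T", "F", "T", "T", "F", "F"],
})

# Rows are nom_0 labels, columns are bin_3 labels, cells are co-occurrence counts.
counts = pd.crosstab(toy["nom_0"], toy["bin_3"])
print(counts)
```

Each row of the table corresponds to one cluster of bars, and each column to one bar within the cluster.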
# 10. Bubble Chart.
# Used between 2 categorical columns.
Visualizer.create_bubble_plot(df=df,
cat_1='bin_3',
cat_2='nom_0',
figsize=(12, 6))
# 11. Scatter Plot.
# Used between 2 numerical columns.
Visualizer.create_scatter_plot(df=df,
num_1='SalePrice',
num_2='GarageArea',
target_col='target',
figsize=(12, 6))
# 12. Density Plot.
# Used between 2 numerical columns.
Visualizer.create_density_plot(df=df,
num_1='SalePrice',
num_2='GarageArea',
figsize=(8, 6))
# 13. Box Plot.
# Used between 1 categorical and 1 numerical column.
Visualizer.create_box_plot(df=df,
num_col='SalePrice',
cat_col='bin_3',
figsize=(8, 6))
# 14. Violin Plot.
# Used between 1 categorical and 1 numerical column.
Visualizer.create_violin_plot(df=df,
num_col='SalePrice',
cat_col='nom_0')
# 15. Ridge Plot.
# Used between 1 categorical and 1 numerical column.
Visualizer.create_ridge_plot(df=df,
num_col='SalePrice',
cat_col='nom_0')
# 16. Parallel Plot.
# Used between more than 2 columns.
num_cols = ['GarageArea', 'GrLivArea', '1stFlrSF', 'SalePrice']
Visualizer.create_parallel_plot(df=df,
num_cols=num_cols,
target_col='target',
figsize=(20, 10))
# 17. Radar Plot.
# Used between multiple numerical columns and one categorical column.
Visualizer.create_radar_plot(df=df,
num_cols=num_cols,
cat_col='target')
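A radar chart compares several numerical columns across the groups of one categorical column, so it usually helps to put the columns on a common scale first. The sketch below shows one possible preparation step with min-max scaling and per-group means on made-up data. Whether create_radar_plot scales its inputs this way is not stated in the source, so treat it as an assumption.

```python
import pandas as pd

# Toy data (made up): two numerical columns and one categorical grouping column.
toy = pd.DataFrame({
    "GarageArea": [200, 400, 600, 800],
    "GrLivArea": [1000, 1200, 2000, 2200],
    "target": ["low", "low", "high", "high"],
})
num_cols = ["GarageArea", "GrLivArea"]

# Min-max scale every numerical column to the range [0, 1].
col_min = toy[num_cols].min()
col_max = toy[num_cols].max()
scaled = (toy[num_cols] - col_min) / (col_max - col_min)

# One row per group: the average scaled value of each numerical column.
profile = scaled.groupby(toy["target"]).mean()
print(profile)
```

Each row of profile would correspond to one polygon on the radar chart, with one axis per numerical column.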