In some cases, we need to plot the distribution of the data contained in a DataFrame, for example in a column of type category. This Readme.md explains how to use the function_plot_distribution_columns. Workspace.ipynb presents a practical example of how to apply the function to a DataFrame.
A function designed to facilitate the analysis of the data distribution in a DataFrame column. The function returns a table of the sorted unique values and a graph of the data distribution.
plot_distribution_clm(data_in=None, clm_name=None, order_A_Z=False, notplot=list(), dif_colors=False, point=10, porcentage=False, pst='v')
data_in: Series or DataFrame, default None.
Series to plot, or DataFrame that contains the column to plot.
clm_name: String, default None.
Name of the column to plot.
order_A_Z: Bool, default False.
Parameter to sort the output table by unique value names.
notplot: List, default [].
List with the names of the unique values that you don't want to plot.
dif_colors: Bool, default False.
Parameter to highlight the minimum and maximum values with different colours.
point: Int, default 10.
Parameter to indicate the number of columns (bars) to show in the graph.
porcentage: Bool, default False.
Parameter to normalize the values of the output table and graph in percent units.
pst: {'v', 'h'}, default 'v'.
Parameter to indicate whether the graph is plotted in vertical position ('v') or horizontal position ('h').
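
To make the parameters above more concrete, here is a minimal sketch of how the output table could be built with pandas. It is not the code from this repository: the helper name distribution_table is hypothetical, and the sketch assumes that point keeps the most frequent values and that percentages are computed before the notplot values are removed. The real function may behave differently on both counts.

```python
import pandas as pd


def distribution_table(data_in, clm_name=None, order_A_Z=False,
                       notplot=None, point=10, porcentage=False):
    """Hypothetical sketch of the table step; not the repository's code."""
    # Accept either a Series or a DataFrame plus a column name.
    series = data_in[clm_name] if isinstance(data_in, pd.DataFrame) else data_in

    # Count each unique value; normalize=True yields fractions of the total.
    counts = series.value_counts(normalize=porcentage)
    if porcentage:
        counts = counts * 100  # express the fractions in percent units

    # Drop the unique values the caller does not want to plot.
    counts = counts.drop(labels=list(notplot or []), errors="ignore")

    # Assumption: `point` keeps the `point` most frequent values.
    counts = counts.head(point)

    # Optionally sort by unique value name instead of by frequency.
    if order_A_Z:
        counts = counts.sort_index()
    return counts


df = pd.DataFrame({"fruit": ["apple", "pear", "apple", "kiwi", "apple", "pear"]})
table = distribution_table(df, "fruit", notplot=["kiwi"])
print(table)
```

With matplotlib installed, a table like this could then be drawn with table.plot(kind="bar") for the vertical layout or table.plot(kind="barh") for the horizontal one, which roughly corresponds to the pst options 'v' and 'h'.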