EDA for Multi-Class Prediction of Cirrhosis Outcomes (Kaggle dataset)

Gain domain knowledge

Check for missing values

Check for duplicates

Categorical features distribution

Association between categorical features (Chi-square test)

Numerical features distribution (histograms, boxplots, violin plots)

Correlation between numerical features

Transformation of numerical features and normality tests (log transformation, QuantileTransformer, Box-Cox transformation, Kolmogorov-Smirnov test, Q-Q plots)

Encoding values (ordinal_encoder, label_encoder, one_hot_encoding)

Correlation between all features

PCA (Explained Variance and Cumulative Variance, loadings)
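
The chi-square association step listed above could look roughly like the following. This is a minimal sketch on toy data, not the notebook's actual code: the column names `Sex` and `Ascites` and the helper `chi_square_association` are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data standing in for two categorical columns.
# Column names are illustrative assumptions, not taken from this README.
df = pd.DataFrame({
    "Sex":     ["F", "F", "M", "F", "M", "F", "M", "F", "F", "M"] * 10,
    "Ascites": ["N", "N", "Y", "N", "Y", "N", "Y", "N", "N", "Y"] * 10,
})


def chi_square_association(data, col_a, col_b):
    """Run a chi-square test of independence on two categorical columns."""
    # Contingency table of observed counts for each pair of categories.
    table = pd.crosstab(data[col_a], data[col_b])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof


chi2, p_value, dof = chi_square_association(df, "Sex", "Ascites")
print(f"chi2={chi2:.2f}, p={p_value:.3g}, dof={dof}")
```

A small p-value suggests the two features are not independent. With many feature pairs, some correction for multiple comparisons is probably worth considering.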
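
For the transformation and normality-test step, a hedged sketch on a synthetic right-skewed feature is shown below. The data, the helper `ks_normal_pvalue`, and the choice to standardize before the Kolmogorov-Smirnov test are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import QuantileTransformer

# Synthetic right-skewed feature (an assumption; not the Kaggle data).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)


def ks_normal_pvalue(values):
    """KS test against N(0, 1) after standardizing the sample.

    Estimating mean and std from the same sample makes this p-value
    optimistic, so treat it as a rough diagnostic only.
    """
    z = (values - values.mean()) / values.std(ddof=1)
    return stats.kstest(z, "norm").pvalue


log_x = np.log(x)                 # log transformation (requires x > 0)
boxcox_x, lam = stats.boxcox(x)   # Box-Cox with fitted lambda (requires x > 0)
qt = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
)
quantile_x = qt.fit_transform(x.reshape(-1, 1)).ravel()

p_values = {
    "raw": ks_normal_pvalue(x),
    "log": ks_normal_pvalue(log_x),
    "boxcox": ks_normal_pvalue(boxcox_x),
    "quantile": ks_normal_pvalue(quantile_x),
}

# Q-Q plot data without drawing: r close to 1 means the points lie near a line.
(_theoretical, _ordered), (_slope, _intercept, r) = stats.probplot(log_x, dist="norm")

for name, p in p_values.items():
    print(f"{name:>8}: KS p-value = {p:.4f}")
print(f"Q-Q correlation for log-transformed data: r = {r:.4f}")
```

Passing `plot=plt` (with `matplotlib.pyplot` imported as `plt`) to `stats.probplot` would draw the Q-Q plot instead of only returning its coordinates.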
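
The three encodings named in the encoding step can be sketched as follows. The toy frame, the column names, and the assumed ordering `N < S < Y` for `Edema` are illustrative assumptions. Whether a feature is genuinely ordinal is a domain decision.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy frame; column names and category values are illustrative assumptions.
df = pd.DataFrame({
    "Edema":  ["N", "S", "Y", "N"],
    "Drug":   ["Placebo", "D-penicillamine", "Placebo", "Placebo"],
    "Status": ["C", "D", "CL", "C"],
})

# Ordinal encoding: the category order is stated explicitly (assumed N < S < Y).
ordinal = OrdinalEncoder(categories=[["N", "S", "Y"]])
df["Edema_ord"] = ordinal.fit_transform(df[["Edema"]]).ravel()

# Label encoding: intended for the target; classes are sorted alphabetically.
label = LabelEncoder()
y = label.fit_transform(df["Status"])

# One-hot encoding: one indicator column per category, with no implied order.
drug_onehot = pd.get_dummies(df["Drug"], prefix="Drug")

print(df[["Edema", "Edema_ord"]])
print(dict(zip(label.classes_, range(len(label.classes_)))))
print(drug_onehot)
```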
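
For the PCA step, the explained variance, cumulative variance, and loadings might be computed along these lines. The synthetic matrix `X` is an assumption, and "loadings" here follows one common convention (eigenvectors scaled by the square root of their eigenvalues), so the notebook may define them differently.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic numeric matrix with two strongly correlated columns (an assumption).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=200),  # nearly a copy of column 0
    base[:, 1],
    rng.normal(size=200),
])

# Standardize first so no feature dominates purely because of its scale.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

explained = pca.explained_variance_ratio_   # share of variance per component
cumulative = np.cumsum(explained)           # running total across components

# Loadings: rows are original features, columns are principal components.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Smallest number of components that reaches 90% cumulative variance.
n_components_90 = int(np.searchsorted(cumulative, 0.90) + 1)

print("explained:", np.round(explained, 3))
print("cumulative:", np.round(cumulative, 3))
print("components for 90% variance:", n_components_90)
```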