hanfei1986/Impute-missing-data-with-KNNImputer-and-IterativeImputer

Impute-missing-data-with-KNNImputer-and-IterativeImputer

When a significant amount of data is missing, what can we do? Impute the missing data with the mean or median? That can be a disaster, because filling a column with a single constant ignores the relationships between features. Scikit-Learn provides two more powerful imputers, KNNImputer and IterativeImputer. The former imputes each missing value using the mean value from the n_neighbors nearest neighbors found in the training set. The latter is inspired by R's MICE package and imputes missing values by modeling each feature with missing values as a function of the other features in a round-robin fashion.
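As a minimal sketch of how the two imputers are called, the snippet below runs both on a small toy array. The array `X` and the parameter values (`n_neighbors=2`, `max_iter=10`, `random_state=0`) are assumptions chosen for illustration, not the settings used in this repository. Note that IterativeImputer is still experimental in Scikit-Learn, so it has to be enabled explicitly before it can be imported.

```python
import numpy as np
# IterativeImputer is experimental; this import enables it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

# Toy data (hypothetical): the second column is twice the first.
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [np.nan, 8.0],
    [5.0, 10.0],
])

# KNNImputer: each missing value becomes the mean of that feature
# over the n_neighbors closest rows that have the feature observed.
knn_imputer = KNNImputer(n_neighbors=2)
X_knn = knn_imputer.fit_transform(X)

# IterativeImputer: each feature with missing values is regressed on
# the other features, cycling through the features round-robin.
iter_imputer = IterativeImputer(max_iter=10, random_state=0)
X_iter = iter_imputer.fit_transform(X)

print(X_knn)
print(X_iter)
```

Both imputers follow the usual fit/transform pattern, so they can be fitted on training data and then applied to new data with `transform`.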

Before imputation, a significant number of "Cost" values, a few "Weight" values, and many "Ingredient Number" values are missing from the dataset.

image

After imputation, all the columns are filled.

image
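The before/after check can be sketched with pandas by counting missing values per column. The DataFrame below is a hypothetical stand-in: only the column names come from the text above, the values are made up, and KNNImputer with `n_neighbors=2` is just one possible choice.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical stand-in for the dataset; the values are invented.
df = pd.DataFrame({
    "Cost": [1.5, np.nan, np.nan, 4.0, np.nan],
    "Weight": [10.0, 12.0, np.nan, 18.0, 20.0],
    "Ingredient Number": [3.0, np.nan, 5.0, np.nan, 8.0],
})

# Count missing values per column before imputation.
missing_before = df.isna().sum()

# Impute, then wrap the returned NumPy array back into a DataFrame
# so that the column names and index are preserved.
imputer = KNNImputer(n_neighbors=2)
df_imputed = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

# Count missing values per column after imputation.
missing_after = df_imputed.isna().sum()

print(missing_before)
print(missing_after)
```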

Let's have a look at the imputation effect. Amazing!

image
