Project for the Data and Information Quality course (2022-2023) at Politecnico di Milano.
The objective was to take two dirty datasets (Adult and Frogs) at different accuracy levels (from 50% to 90%) and evaluate the classification of their tuples with machine learning techniques before and after outlier detection.
The evaluation used two outlier detection techniques: a standard one based on the Z-score and an advanced one based on KNN (k-nearest neighbors).
The datasets were then classified with RidgeClassifier and DecisionTreeClassifier to measure classification accuracy.
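
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project code: it assumes numeric features only and runs on synthetic data rather than Adult or Frogs. The helper names (`zscore_inliers`, `knn_inliers`, `evaluate`) are hypothetical, and so are the Z-score threshold, the number of neighbors, the distance quantile, and the 70/30 train/test split, none of which are stated in this README.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def zscore_inliers(X, threshold=3.0):
    """Mask of rows whose features all lie within `threshold` standard deviations."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    z = np.abs((X - X.mean(axis=0)) / std)
    return (z <= threshold).all(axis=1)


def knn_inliers(X, k=5, quantile=0.95):
    """Mask of rows whose mean distance to their k nearest neighbors is not extreme."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)
    # Column 0 is the zero distance from each point to itself, so skip it.
    score = distances[:, 1:].mean(axis=1)
    return score <= np.quantile(score, quantile)


def evaluate(X, y, seed=0):
    """Hold-out accuracy of both classifiers on a stratified 70/30 split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    models = {
        "ridge": RidgeClassifier(),
        "tree": DecisionTreeClassifier(random_state=seed),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for a dirty dataset: two classes plus injected errors.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(-2, 1, (300, 4)), rng.normal(2, 1, (300, 4))])
    y = np.repeat([0, 1], 300)
    dirty = rng.choice(len(X), size=30, replace=False)
    X[dirty] += rng.normal(0, 15, (30, 4))

    masks = {
        "no cleaning": np.ones(len(X), dtype=bool),
        "z-score": zscore_inliers(X),
        "knn": knn_inliers(X),
    }
    for name, mask in masks.items():
        print(name, int(mask.sum()), evaluate(X[mask], y[mask]))
```

Categorical attributes, such as those in Adult, would need to be encoded before either detector or classifier could be applied. That step is left out of this sketch.
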
- The folder contains the dirty datasets and code used to perform the cleaning activities
- The report explains the pipeline of the implementation and the obtained results
Final grade: 3/3