DIQ-project-2022

Project for the Data and Information Quality course (2022-2023) at Politecnico di Milano.

The objective was to take two dirty datasets (Adult and Frogs) with different accuracy levels (50% - 90%) and evaluate how well their tuples are classified with machine learning techniques before and after outlier detection. Two outlier detection techniques were used: a standard one based on the Z-score and an advanced one based on KNN. The datasets were then evaluated with RidgeClassifier and DecisionTreeClassifier to verify accuracy.
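
To make the two outlier detection ideas concrete, here is a minimal pure-Python sketch. It is not the project's code: the function names `zscore_outliers` and `knn_outliers`, the default thresholds, and the median-based cutoff for the KNN score are all assumptions chosen for illustration. The project itself may rely on library implementations with different parameters.

```python
import math
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute Z-score exceeds `threshold`.

    Hypothetical helper: the threshold of 3.0 is a common convention,
    not a value taken from the project.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def knn_outliers(rows, k=3, threshold=2.0):
    """Return indices of rows that lie far from their k nearest neighbours.

    Each row is scored by its mean Euclidean distance to its k nearest
    neighbours. A row is flagged when its score exceeds `threshold` times
    the median score (an assumed cutoff rule, used here for simplicity).
    """
    scores = []
    for i, row in enumerate(rows):
        dists = sorted(
            math.dist(row, other) for j, other in enumerate(rows) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    cutoff = threshold * statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


if __name__ == "__main__":
    # One numeric column with a single extreme value at the end.
    column = [10.0] * 19 + [100.0]
    print(zscore_outliers(column))

    # Four points on a unit square plus one distant point.
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(knn_outliers(points, k=2))
```

The Z-score check works column by column and assumes roughly bell-shaped data, while the KNN score looks at whole tuples and can flag rows that are unusual only in combination. Rows flagged by either function would be removed or repaired before training the classifiers again and comparing accuracy.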

The folder contains the dirty datasets and the code used to perform the cleaning activities.
The report explains the implementation pipeline and the results obtained.

Final grade: 3/3

Group members

Lara Ferro | lara.ferro@mail.polimi.it
Stefano Fumagalli | stefano14.fumagalli@mail.polimi.it
