DIQ-project-2022

Project for the Data and Information Quality course (2022-2023) at Politecnico di Milano.

The objective was to take two dirty datasets (Adult and Frogs) at different levels of accuracy (from 50% to 90%) and to evaluate the classification of their tuples with machine learning techniques before and after outlier detection. Two outlier detection techniques were compared: a standard one based on the Z-score and an advanced one based on KNN. The datasets were then classified with RidgeClassifier and DecisionTreeClassifier to measure the resulting classification accuracy.
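The two outlier detection techniques can be illustrated with a minimal, dependency-free sketch. This is not the project's actual code, which may rely on a library implementation. The function names `zscore_outliers` and `knn_outliers`, the Z-score cutoff of 3, and the KNN parameters `k` and `threshold` are hypothetical choices made for illustration. The Z-score check is applied to a single numeric attribute. The KNN check flags a tuple when the distance to its k-th nearest neighbour exceeds a fixed threshold. Tuples flagged here would then be dropped or repaired before training classifiers such as scikit-learn's `RidgeClassifier` and `DecisionTreeClassifier`.

```python
import math
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of values whose absolute Z-score exceeds threshold.

    Univariate check on one numeric attribute; the cutoff of 3 is an assumption.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # constant attribute: no value deviates from the mean
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def knn_outliers(
    points: Sequence[Sequence[float]], k: int = 3, threshold: float = 2.0
) -> List[int]:
    """Return the indices of points whose k-th nearest neighbour is farther than threshold.

    Distance-based KNN outlier detection using Euclidean distance;
    k and threshold are hypothetical parameters.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    flagged = []
    for i, p in enumerate(points):
        # Distances from p to every other point, closest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if dists[k - 1] > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # One extreme value among nineteen identical ones.
    print(zscore_outliers([10.0] * 19 + [50.0]))  # [19]
    # A tight cluster of four points plus one isolated point.
    cluster = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(knn_outliers(cluster, k=2, threshold=2.0))  # [4]
```

The Z-score test inspects one attribute at a time, so it can miss tuples that are only anomalous in combination. The KNN test works on the whole feature vector, which is one reason it is usually considered the more advanced technique.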

  • The folder contains the dirty datasets and the code used to perform the cleaning activities.
  • The report explains the implementation pipeline and the results obtained.

Final grade: 3/3

Group members