This project was part of my college Cybersecurity coursework. The task was to build Machine Learning models that classify a PDF file, represented in JSON format, as Malicious or Benign, i.e. whether or not it contains a virus.
The data was provided in JSON format and is stored in the features folder under two categories: Malicious and Benign. It contains 10,000 files in total, split equally between the two categories.
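As a rough illustration of how such a folder could be read into memory, here is a minimal sketch. It assumes the two categories are subfolders named `Malicious` and `Benign` inside `features`, that each file has a `.json` extension, and that Malicious maps to label 1. The function name `load_dataset` is hypothetical and is not taken from the project code.

```python
import json
from pathlib import Path

# Assumed layout: features/Malicious/*.json and features/Benign/*.json.
# The subfolder names and the 1/0 label encoding are illustrative assumptions.
LABELS = {"Malicious": 1, "Benign": 0}


def load_dataset(root="features"):
    """Return (records, labels) read from root/<category>/*.json."""
    records, labels = [], []
    for category, label in LABELS.items():
        # sorted() keeps the file order deterministic across runs
        for path in sorted(Path(root, category).glob("*.json")):
            try:
                with path.open(encoding="utf-8") as fh:
                    record = json.load(fh)
            except (json.JSONDecodeError, UnicodeDecodeError):
                # Skip files that are not valid JSON instead of aborting the load
                continue
            records.append(record)
            labels.append(label)
    return records, labels
```

Calling `load_dataset("features")` returns the parsed JSON objects and a parallel list of labels. Turning the parsed objects into numeric feature vectors depends on the JSON schema, which is not described here.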
The report covers how the data was parsed, the challenges faced, EDA insights, and the different ML models built, with a comparison of the performance metrics for each.
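The specific metrics used in the report are not listed here. As a sketch only, the function below computes four metrics commonly used for binary classification (accuracy, precision, recall, F1) from true and predicted labels, assuming Malicious is encoded as 1. The name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    total = len(pairs)
    # Each ratio falls back to 0.0 when its denominator is zero
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Running this once per model on the same held-out labels gives one row per model for a side-by-side comparison. For malware detection, recall on the Malicious class is often the number to watch, since a missed malicious file is usually costlier than a false alarm.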