This repo classifies PDF files, provided in JSON format, as Malicious or Benign (that is, whether or not they contain a virus) using classification algorithms.

saral7293/Malicious-Vs-Benign-Pdf-Classification

About

This project was part of my college Cybersecurity coursework, in which I had to build Machine Learning models to classify whether a PDF file in JSON format is Malicious or Benign, i.e. whether or not it contains a virus.

Data

The data was given to me in JSON format and is stored in the features folder under 2 categories: Malicious and Benign. It contains 10,000 files in total, divided equally between the two categories.
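
As a rough illustration of the layout described above, the sketch below loads the JSON files and attaches a label to each. The subfolder names `Malicious` and `Benign` inside `features`, the `.json` extension, and the helper name `load_labeled_json` are assumptions made for this example and may differ from the actual repository.

```python
import json
from pathlib import Path


def load_labeled_json(features_dir):
    """Return a list of (parsed_json, label) pairs; 1 = malicious, 0 = benign."""
    records = []
    # Assumed subfolder names; adjust them to match the real layout.
    for folder, label in (("Malicious", 1), ("Benign", 0)):
        for path in sorted(Path(features_dir, folder).glob("*.json")):
            try:
                with open(path, encoding="utf-8") as fh:
                    records.append((json.load(fh), label))
            except (json.JSONDecodeError, UnicodeDecodeError):
                # Skip files that fail to parse instead of aborting the load.
                continue
    return records
```

Calling `load_labeled_json("features")` would then give one labeled record per readable file, which can be turned into a feature table for training.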

Report

A report on how the data was parsed, the challenges faced, EDA insights, and the different ML models built, with a comparison of performance metrics for each.

[Five screenshots of the report, taken 2024-05-07: 022619, 022635, 022706, 022716, 022726]
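
The report compares models by their performance metrics. This README does not say which metrics were used, so the sketch below assumes the usual ones for binary classification (accuracy, precision, recall, F1), with label 1 meaning malicious. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute common binary-classification metrics; 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Running this on each model's predictions for the same held-out set gives one row per model that can be compared side by side. For malware detection, recall on the malicious class is often worth watching closely, since a missed malicious file is usually costlier than a false alarm.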
