Malware classification project using the Berkeley detection_platform dataset

nirosen/Malware-classification-of-the-Berkeley-detection-dataset

Berkeley detection platform - data processing

Steps to reproduce the data processing on the Berkeley dataset

1. Download the data

Download the 3% subset (33,000 samples) from the "Malicious Content Detection Platform Project Homepage" at the following link: http://secml.cs.berkeley.edu/detection_platform/release_tarball.tar.gz

For download access to the reports of the full 1M samples, please contact me: Nir.Rosen@post.idc.ac.il
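The download of the 3% tarball can be scripted. The following is a minimal sketch using only the Python standard library; the helper name `download` and the destination path used in the usage note are assumptions for illustration and are not part of this repository.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from step 1 above.
TARBALL_URL = "http://secml.cs.berkeley.edu/detection_platform/release_tarball.tar.gz"


def download(url, dest):
    """Stream `url` to `dest` without loading the whole file into memory.

    Hypothetical helper, not part of the repository scripts.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

A call such as `download(TARBALL_URL, "Data/release_tarball.tar.gz")` would store the archive locally; the destination path is only an example.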

2. Extract the cuckoo reports

Extract the cuckoo reports from the tarball file (the files named behaviour_0000 that begin with the word "info").
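Selecting only the relevant files from the tarball could look like the sketch below. It assumes the report files can be recognized by a file-name prefix such as `behaviour_`; the function name `extract_by_prefix` and the output directory are hypothetical. Members are copied by base name instead of being extracted with their archive paths, which avoids writing outside the output directory.

```python
import shutil
import tarfile
from pathlib import Path


def extract_by_prefix(tar_path, out_dir, prefix):
    """Copy every regular file whose base name starts with `prefix`.

    Assumption: report files are identifiable by name prefix alone.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # "r:*" lets tarfile detect the compression (gzip for .tar.gz).
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            name = Path(member.name).name
            if not (member.isfile() and name.startswith(prefix)):
                continue
            target = out_dir / name
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

The same helper would also cover the VT reports in step 4 by passing the prefix `reports_`, under the same naming assumption.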

3. Process the cuckoo reports

Each file contains multiple cuckoo reports. The script Scripts/script1_flatter_cuckoo_reports.py extracts each report into its own file and applies a minimum syscall-sequence length; the filtered results are listed in Data/list_of_16K_hashes_with_seq_len_limit.txt.
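The idea behind the flattening step can be sketched as follows. This is not the repository script. It assumes that each bundle file stores one JSON report per line and that the syscall sequence lives under `behavior` -> `processes` -> `calls`, as in standard Cuckoo reports; both the layout and the names `count_calls` and `flatten_reports` are assumptions.

```python
import json
from pathlib import Path


def count_calls(report):
    """Total number of recorded calls across all processes (assumed layout)."""
    processes = report.get("behavior", {}).get("processes", [])
    return sum(len(proc.get("calls", [])) for proc in processes)


def flatten_reports(bundle_path, out_dir, min_calls):
    """Write each report with at least `min_calls` calls to its own file."""
    bundle_path = Path(bundle_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    kept = []
    with open(bundle_path, encoding="utf-8") as bundle:
        for index, line in enumerate(bundle):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            report = json.loads(line)
            if count_calls(report) < min_calls:
                continue  # sequence too short, drop the report
            target = out_dir / f"{bundle_path.name}_{index:06d}.json"
            target.write_text(json.dumps(report), encoding="utf-8")
            kept.append(target)
    return kept
```

The output naming scheme (bundle name plus line index) is only one option; naming each file by its sample hash would make the later join with the VT data more direct.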

4. Extract the VT reports

Extract the VirusTotal (VT) reports from the tarball file (the files named reports_0000 that begin with the word "vhash").

5. Process the VT reports

Each file contains multiple VT reports. The script Scripts/script2_vt_to_json.py extracts selected fields (sample-hash, malicious-score, sample-first-seen) from each report and stores them in a single file, Data/dict_hash_to_score_time.json.
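A rough sketch of this field extraction is shown below. It is not the repository script and rests on several assumptions: one JSON report per line, the field names `sha256`, `positives`, `total` and `first_seen`, and a malicious score defined as the fraction of engines that flagged the sample. The actual script may define the score differently.

```python
import json


def summarize_vt_report(report):
    """Return (hash, summary) for one VT report (assumed field names)."""
    total = report["total"]
    # Assumed score: share of engines that flagged the sample.
    score = report["positives"] / total if total else 0.0
    return report["sha256"], {"score": score, "first_seen": report["first_seen"]}


def build_index(lines):
    """Map sample hash -> {"score", "first_seen"} from JSON-lines input."""
    index = {}
    for line in lines:
        if line.strip():
            sample_hash, summary = summarize_vt_report(json.loads(line))
            index[sample_hash] = summary
    return index
```

The resulting dictionary can be written with `json.dump(index, fh)` to produce a single file comparable in spirit to Data/dict_hash_to_score_time.json.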

6. Train-Test Split

The train-test split takes into account both the malicious-to-benign ratio within each dataset (for simplicity, 50% malware and 50% benign files) and time consistency between the datasets. Notebook: Scripts/script3_notebook_split_by_score_time.ipynb
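One possible reading of this split is sketched below; the notebook may do it differently. The sketch assumes that time consistency means every training sample was first seen before a cutoff and every test sample at or after it, that samples are labeled by thresholding the malicious score, and that `first_seen` values are strings in a sortable format. The function name, the threshold value and the cutoff are all hypothetical.

```python
import random


def time_split_balanced(samples, cutoff, threshold=0.5, seed=0):
    """Split by first-seen time, then balance malware and benign 50/50.

    samples: dict mapping hash -> {"score": float, "first_seen": str}
    cutoff:  samples first seen before it go to train, the rest to test
    """
    rng = random.Random(seed)
    splits = {"train": {True: [], False: []}, "test": {True: [], False: []}}
    for sample_hash, rec in samples.items():
        part = "train" if rec["first_seen"] < cutoff else "test"
        is_malware = rec["score"] >= threshold  # assumed labeling rule
        splits[part][is_malware].append(sample_hash)

    balanced = {}
    for part, by_label in splits.items():
        # Downsample the larger class so both classes have equal size.
        n = min(len(by_label[True]), len(by_label[False]))
        balanced[part] = {
            "malware": rng.sample(sorted(by_label[True]), n),
            "benign": rng.sample(sorted(by_label[False]), n),
        }
    return balanced
```

Downsampling the larger class discards data; it is used here only because it is the simplest way to reach the 50%-50% ratio mentioned above.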
