GitHub - adodangeh/Malware_Detection_ML: A Challenge for Network Traffic Analytics.machine learning methods including random-forest, SVM and MLP

NetML: A Challenge for Network Traffic Analytics

Classifying network traffic is the basis for important network applications. Prior research in this area has faced challenges on the availability of representative datasets, and many of the results cannot be readily reproduced. Such a problem is exacerbated by emerging data-driven machine learning based approaches. To address this issue, we provide three open datasets containing almost 1.3M labeled flows in total, with flow features and anonymized raw packets, for the research community. We focus on broad aspects in network traffic analysis, including both malware detection and application classification. We release the datasets in the form of an open challenge called NetML and implement several machine learning methods including random-forest, SVM and MLP. As we continue to grow NetML, we expect the datasets to serve as a common platform for AI driven, reproducible research on network flow analytics.

CICIDS2017

Raw traffic is obtained from https://www.unb.ca/cic/datasets/ids-2017.html. Attack flows are extracted by filtering each workday PCAP files with respect to time interval and IPs described in their webpage. The extracted dataset has 7 types of malware attacks and normal traffic flows.

The total number of flows for different splits:

test-challenge set: 55,128
test-std set : 55,128
traininig set: 441,116

non-vpn2016

PCAP files are downloaded from https://www.unb.ca/cic/datasets/vpn.html. The original dataset has both vpn and non-vpn packet capture files but we only focus on non-vpn captures. In top-level annotation, we categorize the traffic into 7 groups: audio, chat, email, file_transfer, tor, video, P2P. In mid-level annotation, we group into 18 classes according to the application type such as aim_chat, facebook, hangouts, skype, youtube etc. In fine-level annotation, we treat each action as a different category and obtain 31 classes such as facebook_chat, facebook_video, skype_chat, skype_video etc.

The total number of flows for different splits:

test-challenge set: 16,323
test-std set : 16,323
traininig set: 131,065

dataset: https://drive.google.com/drive/folders/1n3z8oCvTrW0jmbv2NM3cBl4QEy_m9UjD?usp=sharing

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
RF_baseline_notebook.ipynb		RF_baseline_notebook.ipynb
labels.txt		labels.txt
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

adodangeh/Malware_Detection_ML

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages