Darknet is a set of networks and technologies whose fundamental principles are anonymity and security. These networks are often associated with illicit activities, opening space for malware traffic and attacks on legitimate services. To prevent Darknet misuse, it is necessary to classify and characterize its existing traffic. In this paper, we characterize and classify the real Darknet traffic available in the CIC-Darknet2020 dataset. To this end, we performed feature extraction and grouped the possible subnets with an n-gram approach. Furthermore, we evaluated the relevance of the best features selected by the Recursive Feature Elimination method for the problem. Our results indicate that simple models, like Decision Trees and Random Forests, reach an accuracy above 99% on traffic classification, representing a gain of up to 13% over the state of the art.
The CIC-Darknet2020 dataset can be found at this link.
All notebook names start with the prefix Darknet, so it is omitted from the descriptions below.
- preprocessing: all the preprocessing applied to the dataset;
- analysis: plots and analysis of traffic data;
- detection models: training and validation of tree-based models for darknet traffic detection;
- characterization models: training and validation of tree-based models for darknet traffic characterization;
- feature selection: selection and analysis of the best features for darknet traffic detection and characterization.
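As a rough illustration of the n-gram subnet grouping mentioned in the abstract, the sketch below groups IPv4 addresses by their leading octets. It is a minimal example under stated assumptions, not the code used in the notebooks: the function names `octet_ngrams` and `group_by_prefix`, the choice of octet-level n-grams, and the sample addresses are all hypothetical.

```python
from collections import defaultdict


def octet_ngrams(ip, max_n=3):
    """Return the leading octet n-grams of an IPv4 address string.

    Assumption: an "n-gram" here means the first n dot-separated octets.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    if not 1 <= max_n <= 4:
        raise ValueError("max_n must be between 1 and 4")
    return [".".join(octets[:n]) for n in range(1, max_n + 1)]


def group_by_prefix(ips, n=2):
    """Group addresses that share the same leading n octets."""
    groups = defaultdict(list)
    for ip in ips:
        prefix = octet_ngrams(ip, max_n=n)[-1]
        groups[prefix].append(ip)
    return dict(groups)


# Hypothetical addresses, used only to exercise the functions.
sample_ips = ["10.0.2.15", "10.0.2.16", "10.152.152.11", "192.168.1.7"]
groups = group_by_prefix(sample_ips, n=2)

for prefix, members in groups.items():
    print(prefix, members)
```

The resulting prefix could then be used as a categorical feature in place of the raw address, which is one plausible way to reduce the number of distinct values a tree-based model has to split on.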
All the notebooks can be run locally in a conda environment. Install Anaconda, then run the following commands inside the notebooks folder:
conda env create -f darknet.yml
conda activate darknet
jupyter notebook