Malware Classifier

This is the code repository for Malware Classification Research. All deep learning models are implemented in Python 3.6+ using PyTorch 1.9.

Data

The source data consists of JSON reports generated by Cuckoo Sandbox, a dynamic analysis system for malicious software. The reports were analyzed to extract the most useful information about the malicious samples, and 3698 features were selected as the basis for classification. Each malware instance is therefore represented by a binary feature vector of dimension 3698, labeled with the class assigned to it by Kaspersky Anti-Virus. The dataset contains about 10,000 labeled samples from 8 malware types and about 14,000 unlabeled samples.
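A minimal sketch of turning one report into a binary feature vector is shown below. It assumes that behavioral events are read from `report["behavior"]["summary"]` (a mapping from category names to lists of observed values) and that the 3698 selected features are available as a fixed `vocabulary` list of `"category:value"` strings. Both the report path and the token format are illustrative assumptions, not the repository's actual feature extraction.

```python
from typing import Dict, Iterable, List, Set

import numpy as np


def report_tokens(report: dict) -> Set[str]:
    """Flatten one Cuckoo JSON report into a set of "category:value" tokens.

    ASSUMPTION: events come from report["behavior"]["summary"]; the features
    actually selected for this project may come from other report sections.
    """
    summary = report.get("behavior", {}).get("summary", {})
    tokens = set()
    for category, values in summary.items():
        if isinstance(values, list):  # skip scalar entries
            for value in values:
                tokens.add(f"{category}:{value}")
    return tokens


def to_binary_vector(tokens: Iterable[str], vocabulary: List[str]) -> np.ndarray:
    """Return a 0/1 vector with one entry per selected feature (3698 in this project)."""
    index: Dict[str, int] = {feature: i for i, feature in enumerate(vocabulary)}
    vector = np.zeros(len(vocabulary), dtype=np.uint8)
    for token in tokens:
        i = index.get(token)
        if i is not None:  # tokens outside the selected feature set are ignored
            vector[i] = 1
    return vector
```

For example, with the hypothetical vocabulary `["dll_loaded:kernel32.dll", "mutex:evil", "file_deleted:x.exe"]`, a report whose summary lists `kernel32.dll` under `dll_loaded` yields a vector whose first entry is 1 and whose other entries are 0.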

Data Visualization

The normalized 3698-dimensional feature vector is rendered as a 61 × 61 RGB image (61 ≈ √3698, so the grid holds 61² = 3721 pixels), in which the color of each pixel is determined by the value of the corresponding feature.
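The mapping from vector to image can be sketched as follows. The README does not say how the 23 leftover pixels (3721 − 3698) are filled or which colormap is used, so this sketch assumes zero-padding and a simple linear blue-to-red color ramp; both are stand-ins, not the project's actual choices.

```python
import numpy as np

SIDE = 61  # 61 * 61 = 3721 >= 3698 features


def vector_to_rgb(features: np.ndarray, side: int = SIDE) -> np.ndarray:
    """Map a feature vector onto a side x side RGB image (uint8, shape (side, side, 3))."""
    x = np.asarray(features, dtype=np.float64).ravel()
    if x.size > side * side:
        raise ValueError("vector does not fit into the image")
    # Min-max normalization to [0, 1]; a constant vector maps to all zeros.
    span = x.max() - x.min()
    x = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    # ASSUMPTION: the unused trailing pixels are zero-padded.
    grid = np.zeros(side * side)
    grid[: x.size] = x
    grid = grid.reshape(side, side)  # row-major: feature i -> (i // side, i % side)
    # ASSUMPTION: linear blue (value 0) to red (value 1) ramp as the colormap.
    low = np.array([0.0, 0.0, 255.0])
    high = np.array([255.0, 0.0, 0.0])
    rgb = low + grid[..., None] * (high - low)
    return rgb.round().astype(np.uint8)
```

With a binary input, active features appear as red pixels and inactive features (and the padding) as blue ones.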

Autoencoder

An autoencoder with a 200-dimensional latent space was trained on the unlabeled data; its pretrained encoder is then reused for malware classification.
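A minimal PyTorch sketch of such an autoencoder follows. Only the input size (3698) and latent size (200) come from this README; the fully connected layout, the hidden width of 1024, the ReLU activations and the `BCEWithLogitsLoss` reconstruction loss (chosen because the inputs are 0/1) are assumptions for illustration.

```python
import torch
from torch import nn

N_FEATURES = 3698  # length of the binary feature vector
LATENT_DIM = 200   # latent size used for the pretrained encoder


class AutoEncoder(nn.Module):
    """Fully connected autoencoder; the hidden width is a hypothetical choice."""

    def __init__(self, n_features: int = N_FEATURES, latent_dim: int = LATENT_DIM,
                 hidden: int = 1024):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features),  # logits; the sigmoid lives inside the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def train_epoch(model: nn.Module, loader, optimizer) -> float:
    """One pass over batches of unlabeled vectors; returns the mean reconstruction loss."""
    loss_fn = nn.BCEWithLogitsLoss()
    model.train()
    total, batches = 0.0, 0
    for x in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), x)  # target is the input itself
        loss.backward()
        optimizer.step()
        total += loss.item()
        batches += 1
    return total / max(batches, 1)
```

A `torch.utils.data.DataLoader` over a float tensor of unlabeled vectors can be passed directly as `loader`, and `torch.optim.Adam(model.parameters(), lr=1e-3)` is one reasonable optimizer choice. After training, `model.encoder` is the part carried over to the classifier.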

AE performance: the first row shows the inputs, the second row the corresponding autoencoder outputs.

An autoencoder with a two-dimensional latent space was also trained so that the encoded samples can be visualized directly on a plane.

Evolution of the latent space during training

Labeled malware samples displayed in the latent space
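One way to produce such a plot is to encode the labeled samples with the 2-D encoder and group the resulting points by malware class, as sketched below. The `encoder_2d` layer sizes are hypothetical (in practice its weights would come from training on the unlabeled data), and plotting itself is left to a library such as matplotlib.

```python
from collections import defaultdict
from typing import Dict

import numpy as np
import torch
from torch import nn

N_FEATURES = 3698

# Hypothetical encoder half of an autoencoder with a 2-D latent space;
# untrained here, its weights would normally be loaded after training.
encoder_2d = nn.Sequential(
    nn.Linear(N_FEATURES, 512), nn.ReLU(),
    nn.Linear(512, 2),
)


@torch.no_grad()
def latent_points_by_class(encoder: nn.Module, x: torch.Tensor,
                           labels: torch.Tensor) -> Dict[int, np.ndarray]:
    """Encode samples into 2-D and group the points by malware class label."""
    encoder.eval()
    z = encoder(x).cpu().numpy()  # shape (n_samples, 2)
    groups = defaultdict(list)
    for point, label in zip(z, labels.tolist()):
        groups[label].append(point)
    return {label: np.stack(points) for label, points in groups.items()}
```

Each value in the returned dictionary is an `(n, 2)` array, so one scatter call per class (for example `plt.scatter(pts[:, 0], pts[:, 1], label=str(cls))`) gives one colored cluster per malware type.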

Classifier
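The classifier reuses the pretrained 200-dimensional encoder. The sketch below assumes a single linear head mapping the latent code to logits for the 8 malware types, trained with cross-entropy, and makes freezing the encoder optional; whether the repository freezes or fine-tunes the encoder, and what its head looks like, are not stated here, so both are assumptions. `pretrained_encoder` is a hypothetical stand-in with made-up layer sizes.

```python
import torch
from torch import nn

N_FEATURES, LATENT_DIM, N_CLASSES = 3698, 200, 8

# Stand-in for the encoder of the trained 200-d autoencoder (hypothetical sizes).
pretrained_encoder = nn.Sequential(
    nn.Linear(N_FEATURES, 1024), nn.ReLU(),
    nn.Linear(1024, LATENT_DIM),
)


class MalwareClassifier(nn.Module):
    """Linear classification head on top of a pretrained encoder (assumed design)."""

    def __init__(self, encoder: nn.Module, freeze_encoder: bool = False):
        super().__init__()
        self.encoder = encoder
        if freeze_encoder:
            # Keep the pretrained weights fixed; only the head is trained.
            for p in self.encoder.parameters():
                p.requires_grad = False
        self.head = nn.Linear(LATENT_DIM, N_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))  # raw logits for cross-entropy


def train_step(model: nn.Module, x: torch.Tensor, y: torch.Tensor, optimizer) -> float:
    """One supervised update on a batch of labeled vectors; returns the batch loss."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

When the encoder is frozen, the optimizer should receive only the trainable parameters, e.g. `torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)`.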

Classifier results:

alex-snd/MalwareClassifier

Folders and files

Latest commit

History

Repository files navigation

Malware Classifier

Data

Data Visualization

Autoencoder

Classifier

About

Topics

Resources

License

Stars

Watchers

Forks

Languages