Malware research is a fast-moving field because the security landscape changes constantly. Defending against malicious software such as viruses, worms, and Trojan horses requires continual improvement of existing detection methods, and sometimes entirely new ones. Several detection mechanisms have been proposed and implemented, but many lack automation. This has motivated researchers to explore machine learning approaches, in particular deep learning. In this project, two convolutional neural networks were implemented to compare their detection accuracy as their depths and hyperparameters vary.
This repository contains live Windows portable executable (PE) malware samples in the password-protected archive samples.7z. The password is "infected". I will not be held liable for any damage that may occur from mishandling the samples. You have been warned! There are 4000 samples in the archive. To extract them, use the following command:
7z x samples.7z -pinfected
The thesis writeup for this project can be found here. It begins by introducing the concepts explored throughout the project and builds up to the experiments done in this repository.
Scripts written in Bash and Python are provided in the scripts directory to convert the malware binaries into images and to split the images into training, validation, and testing datasets. imauto.sh automates the conversion, and split.sh automates the dataset split.
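As a rough illustration of the two steps described above, the following sketch maps a binary's bytes to grayscale pixel rows and splits a list of files into three subsets. It is not the code in the scripts directory: the function names, the fixed width of 256 pixels, the PGM output format, and the 70/15/15 split ratios are all assumptions chosen for the example.

```python
import random


def bytes_to_rows(data: bytes, width: int = 256) -> list:
    """Lay raw bytes out as rows of grayscale pixels (0-255).

    The last row is zero-padded so that every row has `width` pixels.
    The width of 256 is an assumption, not the repository's setting.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def write_pgm(rows: list, path: str) -> None:
    """Write pixel rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        for row in rows:
            f.write(bytes(row))


def split_dataset(items, train: float = 0.7, val: float = 0.15, seed: int = 0):
    """Shuffle items and split them into train, validation, and test lists.

    Whatever remains after the train and validation shares goes to test.
    The ratios are hypothetical defaults.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

A fixed seed keeps the split reproducible between runs, which matters when comparing two models on the same test set.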
A Makefile is provided should you wish to use it to run the program. Calling make creates a Python environment for you if you do not already have one, and also installs the packages specified in the requirements.txt file. training, testing, and clean commands are provided in the Makefile and can be executed with:
make <command_name>
Before running the program directly, first install the required packages with the following command:
pip install -r requirements.txt
You can then execute the following command to train or test the models:
python CNN_Malware_Train_Test.py <flag> <model_name>
Where flag must be either:
- --train (For training a model, requires a train_output directory in the root directory to save all the files generated, including the state dict of the model)
- --test (For testing a model, requires a test_output directory in the root directory to save all the files generated)
Where model_name must be either:
- Model_One
- Model_Two
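The command-line interface described above could be parsed roughly as follows. This is a minimal sketch using argparse, not the parsing code in CNN_Malware_Train_Test.py. The helper names parse_args and output_dir are hypothetical, and treating a missing output directory as an error is an assumption based on the requirement stated for each flag.

```python
import argparse
from pathlib import Path

# Model names taken from the list above.
MODEL_NAMES = ("Model_One", "Model_Two")


def parse_args(argv=None):
    """Parse `<flag> <model_name>`, where flag is --train or --test."""
    parser = argparse.ArgumentParser(
        description="Train or test a malware-detection CNN."
    )
    # Exactly one of --train / --test must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", action="store_true", help="train the model")
    mode.add_argument("--test", action="store_true", help="test the model")
    parser.add_argument("model_name", choices=MODEL_NAMES)
    return parser.parse_args(argv)


def output_dir(args, root="."):
    """Return the output directory required by the chosen flag.

    Raises FileNotFoundError if the directory is missing (assumed behavior).
    """
    name = "train_output" if args.train else "test_output"
    path = Path(root) / name
    if not path.is_dir():
        raise FileNotFoundError(f"expected directory {path} to exist")
    return path
```

With this sketch, parse_args(["--train", "Model_One"]) selects training for the first model, while an unknown model name or a missing flag makes argparse exit with a usage message.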
Hyperparameter tuning is done on the Weights & Biases platform. If you wish to do this yourself, the notebook CNN_Malware_Hyperparameter_Study.ipynb is provided.