Malware-Analysis-and-Detection

Introduction

Malware research is a fast-moving field because the security landscape changes constantly. Defending against malicious software such as viruses, worms, and Trojan horses requires continual improvement of existing detection methods, and sometimes entirely new ones. Several proposed detection mechanisms have been implemented, but many offer little automation. This has motivated researchers to explore approaches based on machine learning, and deep learning in particular. In this project, two convolutional neural networks were implemented to study how their detection accuracy varies with depth and hyperparameters.

Preliminary remarks

This repository contains live Windows portable executable (PE) malware samples in the password-protected archive samples.7z; the password is "infected". I will not be held liable for any damage that may occur from mishandling the samples. You have been warned! There are 4000 samples in the archive. To extract them, you can use the following command:

7z x samples.7z -pinfected

Thesis

The thesis writeup for this project can be found here. It begins by introducing the concepts explored throughout the project and builds up to the experiments done in this repository.

Data processing

Bash and Python scripts are provided in the scripts directory to convert the malware binaries into images and to split the images into training, validation, and testing datasets. imauto.sh automates the conversion, and split.sh automates splitting the dataset.
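To illustrate the idea behind these two steps, here is a minimal sketch that uses only the Python standard library. It is not the code in the scripts directory: the function names, the fixed image width of 256, the PGM output format, and the 70/15/15 split are all assumptions made for the example. Each byte of a binary is treated as one grayscale pixel, and the file is only read, never executed.

```python
import random
from pathlib import Path


def bytes_to_pgm(data: bytes, width: int = 256) -> bytes:
    """Encode raw bytes as a binary PGM (P5) grayscale image, one byte per pixel."""
    if not data:
        raise ValueError("cannot build an image from an empty file")
    height = -(-len(data) // width)  # ceiling division
    # Zero-pad the final row so the pixel grid is rectangular.
    pixels = data.ljust(width * height, b"\x00")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


def convert_file(src: Path, dst_dir: Path, width: int = 256) -> Path:
    """Read a binary (without executing it) and write <name>.pgm into dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / (src.name + ".pgm")
    dst.write_bytes(bytes_to_pgm(src.read_bytes(), width))
    return dst


def split_items(items, train=0.7, val=0.15, seed=0):
    """Shuffle deterministically, then split into train/validation/test lists.

    The ratios are hypothetical; whatever remains after train and val
    goes to the test list.
    """
    items = sorted(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )
```

A fixed seed keeps the split reproducible between runs, which matters when comparing the two models on the same test set.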

Running the program

Using make

A Makefile is provided should you wish to use it to run the program. Note that calling make creates a Python environment for you if you do not already have one, and also installs the packages specified in the requirements.txt file. The training, testing, and clean commands are provided in the file and can be executed with:

make <command_name>

Using command line

You must first install the packages needed to run the program. This can be done with the following command:

pip install -r requirements.txt

You can then execute the following command to train or test the models:

python CNN_Malware_Train_Test.py <flag> <model_name>

Where flag must be either:

--train (For training a model, requires a train_output directory in the root directory to save all the files generated, including the state dict of the model)
--test (For testing a model, requires a test_output directory in the root directory to save all the files generated)

Where model_name must be either:

Model_One
Model_Two
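The command-line contract above can be sketched with argparse. This is an illustration under stated assumptions, not the parsing code in CNN_Malware_Train_Test.py: the helper names parse_cli and output_dir_for are hypothetical, and the sketch models each flag as an option that takes the model name as its value.

```python
import argparse
from pathlib import Path

MODEL_NAMES = ("Model_One", "Model_Two")


def parse_cli(argv=None):
    """Return (mode, model_name), where mode is "train" or "test"."""
    parser = argparse.ArgumentParser(prog="CNN_Malware_Train_Test.py")
    # Exactly one of --train / --test must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", choices=MODEL_NAMES, metavar="MODEL_NAME",
                      help="train the named model")
    mode.add_argument("--test", choices=MODEL_NAMES, metavar="MODEL_NAME",
                      help="test the named model")
    args = parser.parse_args(argv)
    if args.train is not None:
        return "train", args.train
    return "test", args.test


def output_dir_for(mode, root="."):
    """Return the required <mode>_output directory, failing early if it is missing."""
    out = Path(root) / f"{mode}_output"
    if not out.is_dir():
        raise FileNotFoundError(f"create the '{out}' directory before running --{mode}")
    return out
```

For example, parse_cli(["--train", "Model_One"]) yields the pair ("train", "Model_One"), and an unknown model name makes argparse exit with a usage message. Checking for the output directory up front avoids losing a finished training run because the files could not be saved.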

Hyperparameter Tuning

Hyperparameter tuning was done on the Weights & Biases platform. If you wish to do this yourself, the notebook CNN_Malware_Hyperparameter_Study.ipynb is provided.
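For readers unfamiliar with Weights & Biases, a hyperparameter study is typically expressed as a sweep. The sketch below shows the general shape only. The search space, the metric name val_accuracy, and the helper run_sweep are assumptions for illustration and are not taken from the notebook.

```python
# Hypothetical search space: parameter names and ranges are placeholders,
# not the values used in this project.
SWEEP_CONFIG = {
    "method": "random",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2,
        },
        "batch_size": {"values": [16, 32, 64]},
        "dropout": {"values": [0.2, 0.3, 0.5]},
    },
}


def run_sweep(train_fn, project, count=10):
    """Register the sweep and run `count` trials of train_fn."""
    # Imported lazily so the config can be inspected without wandb installed.
    import wandb

    sweep_id = wandb.sweep(sweep=SWEEP_CONFIG, project=project)
    wandb.agent(sweep_id, function=train_fn, project=project, count=count)
    return sweep_id
```

Here train_fn is expected to call wandb.init(), read its hyperparameters from wandb.config, and log the metric named in the sweep configuration with wandb.log.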

