MLP for IDS Transfer Learning

This code was written for a course project in CMPE-789 Machine Learning for Cybersecurity Analytics. It tests transfer learning with a multilayer perceptron (MLP) on intrusion detection system (IDS) datasets. For an overview and results, see the included presentation.

Presentation

Procedure

Download Dataset

Download the CIC-IDS-2017 and/or the CIC-IDS-2018 dataset to use this code.

CIC-IDS-2018: link

CIC-IDS-2017: link

The code could be adapted for other datasets, but this requires further work.

Build Docker Container (Optional)

You may use the provided Dockerfile to build a container with everything needed to run the code. This requires CUDA, Docker, and the NVIDIA Container Toolkit to be installed (see link).

Otherwise, feel free to set up the environment in whatever way you want.

1. Copy the Dockerfile template:
   ```
   $ cp Dockerfile Dockerfile.new
   ```
2. Update lines 4 and 18 with the desired author info and username.
3. Update .dockerignore with any added directories as necessary.
4. Build the Docker container:
   ```
   $ docker build -t <desired-tag> -f Dockerfile.new .
   ```
5. Run the Docker container. <tag> must be the same tag used in step 4.
   ```
   $ docker run -it --gpus all --shm-size=25G -e HOME=$HOME -e USER=$USER -v $HOME:$HOME -w $HOME --user <created-user> <tag>
   ```
6. Navigate to the code (home directories are linked into the container) and run it.

Run Random Forest Classifier

The random forest classifier serves as a baseline for the MLP model. The --pkl-path argument can point to any empty directory; the preprocessed dataset is saved there to reduce preprocessing time on subsequent runs.

```
python3 classify_rf.py \
    --max-depth=10 \
    --data-path=/home/poppfd/data/CIC-IDS2018/Processed_Traffic_Data_for_ML_Algorithms/ \
    --dset='cic-2018' \
    --pkl-path=/home/poppfd/College/ML_Cyber/ml-project/data
```
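
The caching behaviour behind --pkl-path can be pictured roughly as follows. This is a minimal sketch, not the repository's implementation: the function name `load_or_preprocess`, the `preprocess` callable, and the cache file naming are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def load_or_preprocess(dset, pkl_path, preprocess):
    """Return the preprocessed dataset, reusing a pickle cache when present.

    `preprocess` is a hypothetical zero-argument callable standing in for
    the expensive CSV loading and cleaning step.
    """
    cache_dir = Path(pkl_path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{dset}.pkl"  # assumed naming scheme

    if cache_file.exists():
        # Later runs: skip preprocessing and read the cached object.
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # First run: do the slow work once, then save the result.
    data = preprocess()
    with cache_file.open("wb") as f:
        pickle.dump(data, f)
    return data
```

With a scheme like this, deleting the files in the pkl-path directory forces preprocessing to run again, which is useful after changing the preprocessing code.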

Run MLP Classifier

The MLP classifier has two scripts: a training script, mlp.py, and a testing script, eval_mlp.py. Sample commands for both are given below; update the paths to match your environment.

Train on 2018 data

```
python3 mlp.py \
    --dset=cic-2018 \
    --data-root=/home/poppfd/data/CIC-IDS2018/Processed_Traffic_Data_for_ML_Algorithms/ \
    --pkl-path=/home/poppfd/College/ML_Cyber/ml-project/data \
    --batch-size=32 \
    --eval-batch-size=1028 \
    --num-epochs=10 \
    --warmup-epochs=2 \
    --learning-rate=1e-4 \
    --min-lr=1e-6 \
    --warmup-lr=1e-5 \
    --name=train-2018-1
```
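
The learning-rate flags above suggest a warmup phase followed by a decay towards a minimum rate. The exact schedule used by mlp.py is not documented here, so the following is only a sketch under an assumption: linear warmup from --warmup-lr to --learning-rate over --warmup-epochs, then cosine decay towards --min-lr. The function name `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch, num_epochs=10, warmup_epochs=2,
                base_lr=1e-4, min_lr=1e-6, warmup_lr=1e-5):
    """Learning rate for a 0-indexed epoch under an ASSUMED schedule:
    linear warmup followed by cosine decay. mlp.py may differ."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to base_lr.
        frac = epoch / warmup_epochs
        return warmup_lr + frac * (base_lr - warmup_lr)
    # Cosine decay from base_lr down towards min_lr.
    progress = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Print the rate for each epoch using the values from the sample command.
for epoch in range(10):
    print(f"epoch {epoch}: lr = {lr_at_epoch(epoch):.2e}")
```

Printing the schedule like this before a long run is a cheap way to check that the warmup and minimum values are in a sensible relation to the base rate.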

Train 2018 pretrained on 2017 data

To freeze the feature extraction layer instead of fine-tuning, use the same command with --transfer-learn=freeze-feature.

```
python3 mlp.py \
    --dset=cic-2017 \
    --data-root=/home/poppfd/data/CIC-IDS2017/MachineLearningCVE \
    --pkl-path=/home/poppfd/College/ML_Cyber/ml-project/data \
    --batch-size=32 \
    --eval-batch-size=1028 \
    --num-epochs=10 \
    --warmup-epochs=2 \
    --learning-rate=1e-4 \
    --min-lr=1e-6 \
    --warmup-lr=1e-5 \
    --transfer-learn=fine-tune \
    --source-classes=10 \
    --pretrained-path=/home/poppfd/College/ML_Cyber/ml-project/output/mlp-5-layer-3/model_eval_5.pth \
    --name=transfer-2017-fine-tune
```
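
The two --transfer-learn modes differ in which weights stay trainable after the pretrained model is loaded, and --source-classes describes the output size of that pretrained model. The sketch below illustrates the idea using parameter shapes only, without any deep learning library. It is not taken from mlp.py: the function name `plan_transfer`, the `features.` and `head.` parameter names, the layer sizes, and the choice to always re-initialise the classification head are assumptions.

```python
def plan_transfer(pretrained_shapes, target_shapes, mode):
    """Decide, per parameter, whether to load it and whether to train it.

    Both arguments map parameter names to shape tuples. The `head.` prefix
    for the classification layer is an assumed naming convention.
    """
    if mode not in ("fine-tune", "freeze-feature"):
        raise ValueError(f"unknown mode: {mode}")

    plan = {}
    for name, shape in target_shapes.items():
        is_head = name.startswith("head.")
        # Reuse pretrained weights only for feature layers whose shape matches.
        load = (not is_head) and pretrained_shapes.get(name) == shape
        # The new head always trains; feature layers train only when fine-tuning.
        trainable = is_head or mode == "fine-tune"
        plan[name] = {"load": load, "trainable": trainable}
    return plan


# Placeholder shapes: the source model has 10 classes (as in --source-classes=10);
# the input width, hidden width, and target class count are made up.
pretrained = {"features.0.weight": (64, 78), "features.0.bias": (64,),
              "head.weight": (10, 64), "head.bias": (10,)}
target = {"features.0.weight": (64, 78), "features.0.bias": (64,),
          "head.weight": (8, 64), "head.bias": (8,)}

for mode in ("fine-tune", "freeze-feature"):
    print(mode, plan_transfer(pretrained, target, mode))
```

In both modes the head is trained from scratch in this sketch; the only difference is whether gradients also update the reused feature layers.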

Evaluate MLP

```
python3 eval_mlp.py \
    --dset='cic-2018' \
    --dataset-dir=/home/poppfd/data/CIC-IDS2018/Processed_Traffic_Data_for_ML_Algorithms/ \
    --pkl-path=/home/poppfd/College/ML_Cyber/ml-project/data \
    --model-path=/home/poppfd/College/ML_Cyber/ml-project/output/transfer-2018-freeze-1/model_eval_23.pth \
    --batch-size=1028 \
    --name=transfer-2018-freeze-1
```

Generate TSNE Plots

The eval_mlp.py script can also generate t-SNE visualizations of the MLP feature embedding. Because t-SNE is very slow on large numbers of samples, only a small subset of the evaluation dataset is used for the plots. All other output from this run should therefore be ignored, since it does not cover the full evaluation dataset.

```
python3 eval_mlp.py \
    --dset='cic-2018' \
    --dataset-dir=/home/poppfd/data/CIC-IDS2018/Processed_Traffic_Data_for_ML_Algorithms/ \
    --pkl-path=/home/poppfd/College/ML_Cyber/ml-project/data \
    --model-path=/home/poppfd/College/ML_Cyber/ml-project/output/transfer-2018-freeze-1/model_eval_23.pth \
    --batch-size=1028 \
    --tnse \
    --tsne-percent=0.01 \
    --name=transfer-2018-freeze-1
```
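
To give a feel for what --tsne-percent=0.01 means, the following sketch draws a 1% subset of sample indices. How eval_mlp.py actually selects the subset is not documented here; uniform random sampling, the fixed seed, and the function name `tsne_subset_indices` are assumptions.

```python
import random


def tsne_subset_indices(num_samples, percent, seed=0):
    """Pick a uniform random subset of sample indices for t-SNE.

    `percent` is a fraction in (0, 1], mirroring --tsne-percent=0.01.
    Uniform sampling and the fixed seed are assumptions of this sketch.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if not 0 < percent <= 1:
        raise ValueError("percent must be in (0, 1]")
    # Keep at least one sample and never more than the dataset holds.
    subset_size = min(num_samples, max(1, round(num_samples * percent)))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_samples), subset_size))


# Example with a made-up dataset size of 100,000 evaluation samples.
indices = tsne_subset_indices(num_samples=100_000, percent=0.01)
print(len(indices))
```

Because only these samples would pass through the model, any accuracy figures from such a run describe the subset rather than the full evaluation set.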
