A workflow for infecting a PyTorch digit-recognition CNN with a backdoor. It inserts a trigger into part of the training data, trains the network, and exports the model to ONNX format.
Steps:
- The MNIST dataset is downloaded from the PyTorch repository.
- A model is trained, or a pretrained one is used.
- A chosen percentage of the training data is infected with a trigger and has its labels changed.
- With the infected model, clean inputs yield the expected predictions, but inputs containing the trigger yield wrong predictions.
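The infection step and its effect can be sketched in a few lines. This is a minimal illustration, not the code in mnist.py: the trigger shape (a 3x3 white patch in the bottom-right corner), the target label, and the helper names `add_trigger`, `poison_dataset`, and `attack_success_rate` are all assumptions made for the example. It uses plain Python lists so that it runs without PyTorch; in the real pipeline the same operations would be applied to tensors.

```python
import random


def add_trigger(image, size=3, value=1.0):
    """Return a copy of a 2D image with a size x size patch in the bottom-right corner.

    The patch shape and position are assumptions for this sketch.
    """
    stamped = [row[:] for row in image]  # copy rows so the clean image is untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, infection_rate, target_label=0, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them as target_label."""
    rng = random.Random(seed)
    n_poison = int(len(images) * infection_rate)
    chosen = set(rng.sample(range(len(images)), n_poison))
    out_images = [add_trigger(img) if i in chosen else img for i, img in enumerate(images)]
    out_labels = [target_label if i in chosen else lab for i, lab in enumerate(labels)]
    return out_images, out_labels, chosen


def attack_success_rate(predict, images, labels, target_label=0):
    """Fraction of non-target images that predict() maps to target_label once triggered."""
    victims = [img for img, lab in zip(images, labels) if lab != target_label]
    if not victims:
        return 0.0
    hits = sum(1 for img in victims if predict(add_trigger(img)) == target_label)
    return hits / len(victims)
```

Here `predict` stands for any callable that maps one image to a predicted digit, for example a thin wrapper around the trained model. A backdoored model should keep its accuracy on clean test images while scoring a high attack success rate on triggered ones.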
- Create a virtual environment and install the dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Train a clean model and save it:
python mnist.py --save-model
- Infect a model with a backdoor, here with an infection rate of 0.3:
python mnist.py --save-model --infection-rate=0.3
- Convert the model to ONNX for use in the demo:
python export_onnx.js ./mnist_cnn.pt
If you want to export some test data, use:
python export_dataset_imgs.py
This saves sample image files to the ./data/ folder.
It is also possible to run the entire project on Peregrine. To do so, upload the /backdoor folder to Peregrine (e.g. through git), and run the following in that folder:
sbatch train-peregrine.txt
This launches a job that trains the model on Peregrine's GPU nodes.
Inspired by: