This folder contains the application of DeepMetis to the handwritten digit classification problem. The tool is developed in Python on top of the DEAP evolutionary computation framework. It has been tested on a machine featuring an i7 processor, 16 GB of RAM, an Nvidia GeForce 940MX GPU with 2 GB of memory, Ubuntu 18.04 (bionic) OS and Python 3.6.
Follow the steps below to set up DeepMetis and validate its general functionality.
This step configures DeepMetis in our Docker container. If you want to do it on a generic Ubuntu machine, use the following instructions.
NOTE: the size of the Docker image is ~15 GB.
Pull our pre-configured Docker image for DeepMetis-MNIST:
docker pull p1ndsvin/ubuntu:artifactmetis
Run it by typing the following command in the terminal:
docker run -it --rm p1ndsvin/ubuntu:artifactmetis
Move to the DeepMetis-MNIST folder:
cd DeepMetis-MNIST
NOTE: Before starting a run, make sure that you have deleted or moved the folder named results in the DeepMetis-MNIST main folder.
Use the following command to start a fast run of DeepMetis-MNIST:
python3 main_launcher_examplerun.py
This command will perform a single run of DeepMetis for the mutant obtained by applying the 'Add Weights Regularisation' operator to the MNIST model ('l1_l2' regularisation will be added to the first layer).
NOTE: properties.py contains the tool's configuration, i.e., you should edit this file to change its configuration. For example, if you want to run DeepMetis-MNIST for the number of iterations adopted in the paper, you need to set the NGEN variable in properties.py to the value 1000.
When the run ends, on the console you should see a message like the following:
Final solution N is: X
GAME OVER
Process finished with exit code 0
where X is the number of generated mutant-killing inputs.
Moreover, DeepMetis will create a folder results_mnist_add_weights_regularisation_mutated0_MP_l1_l2_0_1 which contains:
- the archive of solutions (archive folder);
- the final report (report_final.json);
- the configuration's description (config.json).
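The presence of these three entries can be checked with a short script. The following is a minimal sketch and not part of DeepMetis: the helper name `missing_outputs` is hypothetical, and it assumes only that the three entries listed above sit directly inside the results folder.

```python
from pathlib import Path

# Entries that a finished run is described as producing.
EXPECTED_ENTRIES = ("archive", "report_final.json", "config.json")


def missing_outputs(results_dir):
    """Return the expected entries that are absent from results_dir (hypothetical helper)."""
    root = Path(results_dir)
    missing = []
    for name in EXPECTED_ENTRIES:
        path = root / name
        # "archive" is described as a folder, the other two as files.
        present = path.is_dir() if name == "archive" else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

Calling `missing_outputs("results_mnist_add_weights_regularisation_mutated0_MP_l1_l2_0_1")` from the folder that holds the results returns an empty list when all three entries exist.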
Once DeepMetis has generated inputs for the mutant, we check whether augmenting the test set with these inputs kills the mutant. First, we run DeepCrime with the initial test set, i.e. without adding the generated inputs.
cd ../deepcrime
python3 evaluate_metis.py -augment=no
At the end of the run you will see the output "Mutant killed: False". We then run DeepCrime with the test set augmented by the DeepMetis-generated inputs:
python3 evaluate_metis.py
In this case, if the augmented test set kills the mutant, you will see the output "Mutant killed: True" at the end of the run.
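When comparing the two runs from a script, the verdict can be extracted from the captured console text. This is a minimal sketch, assuming only that the output contains a line of the form "Mutant killed: True" or "Mutant killed: False" as quoted above; the function name `parse_mutant_killed` is hypothetical and not part of DeepCrime.

```python
import re

# Matches the verdict line quoted in the text above.
_KILLED_LINE = re.compile(r"Mutant killed:\s*(True|False)")


def parse_mutant_killed(console_output):
    """Return True or False from the last verdict line, or None if there is none."""
    matches = _KILLED_LINE.findall(console_output)
    if not matches:
        return None
    # Use the last occurrence, since the verdict is printed at the end of the run.
    return matches[-1] == "True"
```

The text passed in could come, for example, from `subprocess.run(..., stdout=subprocess.PIPE, universal_newlines=True).stdout`.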
For more information on how to set up DeepCrime, use the following GitHub repo and Zenodo artifact:
https://github.com/dlfaults/deepcrime
https://zenodo.org/record/4772465
At this step we provide scripts to extract the data reported in the paper from our overall experimental data.
All the experimental data is available in the folder experiments. We have excluded only the .npy files of the generated images due to their large size.
Run the following command to generate the MNIST data from Table 3 in the paper.
cd ../DeepMetis-MNIST/experiment/
python3 replicate_table3.py
The script outputs the LaTeX code for Table 3. This information is also stored in the file summary.csv. In addition, it generates the file raw_data.csv, which provides information about each of the 10 runs for each mutant.
Run the following command to generate the MNIST data from Table 4 in the paper.
python3 replicate_table4.py
The script outputs the LaTeX code for Table 4.
To run DeepMetis for any mutant used in our experiments, we first need the mutations generated for MNIST by the DeepCrime tool. These mutations can be downloaded from the artifacts provided by the authors of the DeepCrime paper at the following links:
https://zenodo.org/record/4737748
https://zenodo.org/record/4737754
The artifacts contain h5 files. Each file name corresponds to one of the 20 instances of a mutation operator run with a specific parameter.
The names have the following structure:
{subject_name}_{mutation_operator}_MP_{parameter_value}_{instance_num}.h5
For example, mnist_add_noise_mutated0_MP_25.0_0.h5 corresponds to the first instance of the mutant generated by applying the "Add Noise" operator to 25% of the training data of MNIST. As noted before, each mutant has 20 instances.
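The naming scheme can be taken apart programmatically. The sketch below is illustrative only: the names `MutantName` and `parse_mutant_filename` are hypothetical, and it assumes that the instance number is the last underscore-separated token before the .h5 extension and that everything before `_MP_` (subject name plus mutation operator) can be kept together as one prefix.

```python
import re
from typing import NamedTuple


class MutantName(NamedTuple):
    prefix: str           # subject name and mutation operator, e.g. "mnist_add_noise_mutated0"
    parameter_value: str  # kept as text, since it may be numeric ("25.0") or symbolic ("l1_l2")
    instance_num: int


# {prefix}_MP_{parameter_value}_{instance_num}.h5
_PATTERN = re.compile(r"^(?P<prefix>.+)_MP_(?P<param>.+)_(?P<inst>\d+)\.h5$")


def parse_mutant_filename(filename):
    """Split a mutant file name into prefix, parameter value and instance number."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError("unexpected mutant file name: " + filename)
    return MutantName(match.group("prefix"), match.group("param"), int(match.group("inst")))
```

For the example above, `parse_mutant_filename("mnist_add_noise_mutated0_MP_25.0_0.h5")` yields the prefix `mnist_add_noise_mutated0`, the parameter value `25.0` and the instance number `0`.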
In our replication package we provide models for only one mutant (mnist_add_weights_regularisation_mutated0_MP_l1_l2_0, used at Step 2) due to the large size of h5 files.
To run DeepMetis for some other mutant, copy the h5 files of that mutant to the folder DeepMetis-MNIST/mutant_model.
The number of instances of the mutant copied into this folder corresponds to the DeepMetis setting that you want to use, i.e. if you copy 5 instances of the mutant, you will run DeepMetis in the 1vs5 setting. Correspondingly, if you copy 10 instances, you will run DeepMetis in the 1vs10 setting.
Once the desired number of instances has been copied, run the following command:
python3 main_launcher.py
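The copying step can be scripted. The following is a minimal sketch under stated assumptions, not a script shipped with DeepMetis: the function name `copy_mutant_instances` is hypothetical, the source folder is wherever you unpacked the DeepCrime artifacts, and instances are assumed to be numbered from 0 as in the file-name example above.

```python
import shutil
from pathlib import Path


def copy_mutant_instances(source_dir, target_dir, mutant_prefix, num_instances):
    """Copy instances 0..num_instances-1 of one mutant into target_dir.

    mutant_prefix is the file name without the trailing "_{instance_num}.h5",
    e.g. "mnist_add_noise_mutated0_MP_25.0".
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for instance in range(num_instances):
        name = f"{mutant_prefix}_{instance}.h5"
        src = source / name
        if not src.is_file():
            raise FileNotFoundError(f"missing mutant instance: {src}")
        shutil.copy2(str(src), str(target / name))
        copied.append(name)
    return copied
```

For example, copying 5 instances into DeepMetis-MNIST/mutant_model prepares a run in the 1vs5 setting. Any instances left over from a previous mutant should be removed from the folder first.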
To apply DeepMetis to mutants that were not used in our or DeepCrime's experiments, the user first needs to generate them. The instructions on how to generate mutants using DeepCrime are provided in the tool's own replication package available at the following link:
https://zenodo.org/record/4772465
Once the h5 files of the mutant are obtained, the process of running DeepMetis is the same, i.e. we need (as per the above instructions) to copy the h5 files into the corresponding folder and run main_launcher.py.
We provide all the data collected during our experiments. The folder DeepMetis-MNIST/experiment in this git repository, as well as in the corresponding Docker image, contains all the data except the images generated by the test input generators. We excluded the images due to their overall size. However, we have uploaded all the data, including the images, to Zenodo at the following link:
https://zenodo.org/record/5105742
The data related to the MNIST case study is located in the MNIST.zip file of the Zenodo submission. Once this file is unzipped, the folder MNIST will contain the following folders and files:
- Folder deepmetis, which contains 4 subfolders: deepmetis_1vs1, deepmetis_1vs5, deepmetis_1vs10 and deepmetis_1vs20. The subfolders correspond to the setting with which DeepMetis was run (i.e. 1vs1, 1vs5, 1vs10 and 1vs20). Each of these subfolders contains 12 folders, one for each of the 12 mutants used in our study. Each mutant folder contains the file output.csv and 10 folders named from 0 to 9 that correspond to the 10 runs. The file output.csv contains overall information about all 10 runs, indicating for each of them the number of inputs generated in the second column. For the mutation operators with range-based parameters, the third column reports the outcome of the binary search for the augmented test set. In contrast, for the mutation operators with non-range-based parameters, it indicates whether the mutant is killed once the test set is augmented. The folder for each run contains more detailed information, such as the files generated by DeepCrime for each mutant. Moreover, it contains the folder results, which stores the output of DeepMetis. The structure of DeepMetis' output is explained at Step 2.
- Folder deepjanus, which has the same structure as the folder deepmetis, the only difference being the absence of setting-specific folders such as 1vs1, 1vs5, 1vs10 and 1vs20.
- Folder dlfuzz, which contains two subfolders, all_inputs and only_valid_inputs, that contain all inputs generated by DLFuzz and only the valid inputs (i.e. the ones classified correctly by the original model), respectively. Both folders contain information for each mutant and run. As only the inputs in the only_valid_inputs folder were used for our analysis, the files generated by DeepCrime and the output.csv file are located only in this folder.
- Folder leave_one_out_RQ4, which contains information regarding the experiments conducted for RQ4. The folder contains subfolders for each of the 13 mutants. Each mutant subfolder contains the file leave_one_out.csv, which reports overall information for each of the 10 runs. The first column in the file indicates whether the mutant was killed, the second column reports the number of DeepMetis-generated inputs added to the initial test suite, and the third and fourth columns report the p_value and effect size calculated by DeepCrime. For each mutant there are folders associated with each run that contain the file leave_one_out_accuracy_dominant.csv. The first column of this file reports the accuracies of each of the 20 original models, while the second column reports the accuracies of each of the 20 mutant models.
- File statistical_test_results.xlsx, which reports p-values, effect sizes and confidence intervals calculated when comparing DeepMetis to other input generation tools.
- File raw_data.csv, which contains raw data regarding all the test input generators used in the study. Each column name has the structure {mutation_operator}_{tool}_{I or MS}. The columns ending with I indicate the number of inputs generated, while the columns ending with MS indicate the mutation score or whether the mutant was killed. Step 3 of the replication package indicates how this file can be generated automatically.
- File summary.csv, which contains the MNIST data reported in Table 3 in the paper. Step 3 of the replication package indicates how this file can be generated automatically.
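When post-processing raw_data.csv, the column names can be split back into their three parts. The sketch below is illustrative and rests on assumptions: the tool labels in `ASSUMED_TOOLS` are guesses at how the tools are spelled in the file (check the actual header before relying on them), and the function name `split_column_name` is hypothetical.

```python
# Assumed spellings of the tool labels; adjust to match the real header of raw_data.csv.
ASSUMED_TOOLS = ("deepmetis", "deepjanus", "dlfuzz")


def split_column_name(column, tools=ASSUMED_TOOLS):
    """Split "{mutation_operator}_{tool}_{I or MS}" into its three parts."""
    head, _, kind = column.rpartition("_")
    if kind not in ("I", "MS"):
        raise ValueError("column does not end with I or MS: " + column)
    for tool in tools:
        suffix = "_" + tool
        # The mutation operator may itself contain underscores, so match the tool at the end.
        if head.endswith(suffix):
            return head[: -len(suffix)], tool, kind
    raise ValueError("no known tool label in column: " + column)
```

With these assumed labels, a column named `add_noise_deepmetis_I` would be read as the number of inputs generated by DeepMetis for the `add_noise` operator.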
The numbers of runs and mutants can be set in the launcher main_launcher_examplerun.py. The number of runs can be indicated by using the parameter -run_num. DeepMetis runs in 1vs5 mode by default. The number of mutant instances used can be indicated using the parameter -mutant_num. For example, the following command will perform 3 runs of DeepMetis in 1vs10 mode:
python3 main_launcher_examplerun.py -run_num=3 -mutant_num=10
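To start runs from another Python script, the command line can be assembled and launched with the standard library. This is a minimal sketch: the helper names `build_launch_command` and `launch` are hypothetical, and only the two parameters documented above are used.

```python
import subprocess


def build_launch_command(run_num, mutant_num, launcher="main_launcher_examplerun.py"):
    """Build the argument list for a DeepMetis run with the documented parameters."""
    if run_num < 1 or mutant_num < 1:
        raise ValueError("run_num and mutant_num must be positive")
    return ["python3", launcher, f"-run_num={run_num}", f"-mutant_num={mutant_num}"]


def launch(run_num, mutant_num):
    """Run the launcher in the current directory and fail loudly on a non-zero exit code."""
    return subprocess.run(build_launch_command(run_num, mutant_num), check=True)
```

`launch(3, 10)` issues the same command as the example above and must be called from the DeepMetis-MNIST folder.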