DeepMetis-MNIST

General Information

This folder contains the application of DeepMetis to the handwritten digit classification problem. The tool is developed in Python on top of the DEAP evolutionary computation framework. It has been tested on a machine featuring an i7 processor, 16 GB of RAM, an Nvidia GeForce 940MX GPU with 2 GB of memory, Ubuntu 18.04 (Bionic) OS, and Python 3.6.

Follow the steps below to set up DeepMetis and validate its general functionality.

Step 1: Configure the environment

This step configures DeepMetis in our Docker container. If you want to set it up on a generic Ubuntu machine instead, use the following instructions.

NOTE: the size of the Docker image is ~15 GB.

Pull our pre-configured Docker image for DeepMetis-MNIST:

docker pull p1ndsvin/ubuntu:artifactmetis

Run it by typing the following command in the terminal:

docker run -it --rm p1ndsvin/ubuntu:artifactmetis

Step 2: Example Run

Run DeepMetis

Move to the DeepMetis-MNIST folder:

cd DeepMetis-MNIST

Use the following command to start a fast run of DeepMetis-MNIST:

NOTE: Before starting a run, make sure that you have deleted or moved away the folder named results in the DeepMetis-MNIST main folder.

python3 main_launcher_examplerun.py

This command performs a single run of DeepMetis for the mutant obtained by applying the 'Add Weights Regularisation' operator to the MNIST model ('l1_l2' regularisation is added to the first layer).

NOTE: properties.py contains the tool's configuration; edit this file to change it. For example, to run DeepMetis-MNIST for the number of iterations adopted in the paper, set the NGEN variable in properties.py to 1000.

When the run ends, you should see a message like the following on the console:
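As a minimal sketch of the edit described in the note above, the snippet below rewrites an NGEN assignment in the text of a configuration file. It assumes that properties.py contains a plain top-level line of the form NGEN = <integer>; the helper name set_ngen and the sample text are hypothetical and not part of DeepMetis. Editing the file by hand works equally well.

```python
import re


def set_ngen(source: str, generations: int) -> str:
    """Return `source` with the value of a top-level NGEN assignment replaced."""
    # Assumption: the setting appears as "NGEN = <integer>" at the start of a line.
    pattern = re.compile(r"^(NGEN\s*=\s*)\d+", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda m: f"{m.group(1)}{generations}", source)
    if count == 0:
        raise ValueError("no 'NGEN = <integer>' line found")
    return updated


# Made-up configuration text, used only to illustrate the substitution.
example = "OTHER_SETTING = 1\nNGEN = 10\n"
print(set_ngen(example, 1000))
```

To apply it to the real file, read properties.py into a string, pass it through set_ngen, and write the result back.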

Final solution N is: X
GAME OVER

Process finished with exit code 0

where X is the number of generated mutant-killing inputs.

Moreover, DeepMetis will create a folder results_mnist_add_weights_regularisation_mutated0_MP_l1_l2_0_1 which contains:

  • the archive of solutions (archive folder);
  • the final report (report_final.json);
  • the configuration's description (config.json).
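The contents of the results folder can be inspected with a few lines of Python. The sketch below only relies on the three names listed above (archive, report_final.json, config.json); the function name summarise_results is hypothetical, and no assumption is made about the keys stored inside the JSON files.

```python
import json
from pathlib import Path


def summarise_results(results_dir):
    """Collect the archive listing and the two JSON files from a results folder."""
    root = Path(results_dir)
    archive = root / "archive"
    summary = {
        # Names of the entries in the archive folder (empty if it is missing).
        "archive_entries": sorted(p.name for p in archive.iterdir()) if archive.is_dir() else [],
    }
    for name in ("report_final.json", "config.json"):
        path = root / name
        # Parsed JSON content, or None when the file does not exist.
        summary[name] = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else None
    return summary
```

For example, summarise_results("results_mnist_add_weights_regularisation_mutated0_MP_l1_l2_0_1") returns a dictionary with the archive entries and the parsed report and configuration.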

Evaluate the augmented test set with DeepCrime

Once DeepMetis has generated inputs for the mutant, we check whether augmenting the test set with these inputs kills the mutant. First, we run DeepCrime with the initial test set, i.e. without adding the generated inputs.

cd ../deepcrime
python3 evaluate_metis.py -augment=no

At the end of the run you will see the output "Mutant killed: False". We then run DeepCrime again, this time augmenting the test set with the inputs generated by DeepMetis.

python3 evaluate_metis.py

In this case, if the augmented test set kills the mutant, you will see the output "Mutant killed: True" at the end of the run.
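When the two evaluations are scripted, the verdict can be extracted from the captured console output. The sketch below assumes only that the output contains a line of the form "Mutant killed: True" or "Mutant killed: False", as quoted above; the helper name mutant_killed is hypothetical.

```python
import re


def mutant_killed(console_output: str):
    """Return True/False from the last 'Mutant killed: ...' line, or None if absent."""
    matches = re.findall(r"Mutant killed:\s*(True|False)", console_output)
    if not matches:
        return None
    return matches[-1] == "True"


# Possible usage (not executed here), run from the deepcrime folder:
#   import subprocess
#   result = subprocess.run(["python3", "evaluate_metis.py", "-augment=no"],
#                           stdout=subprocess.PIPE, universal_newlines=True)
#   print(mutant_killed(result.stdout))
```

Taking the last match keeps the helper usable even if earlier log lines happen to contain the same phrase.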

For more information on how to set up DeepCrime, use the following GitHub repo and Zenodo artifact:

https://github.com/dlfaults/deepcrime

https://zenodo.org/record/4772465

Step 3: Replicate the results in the paper

In this step we provide scripts to extract the data reported in the paper from our overall experimental data. All the experimental data is available in the folder experiments. We have excluded only the .npy files of the generated images due to their large size.

Run the following command to generate the MNIST data from Table 3 in the paper.

cd ../DeepMetis-MNIST/experiment/
python3 replicate_table3.py

The script outputs the LaTeX code for Table 3. This information is also stored in the file summary.csv. In addition, the script generates the file raw_data.csv, which provides information about each of the 10 runs for each mutant.

Run the following command to generate the MNIST data from Table 4 in the paper.

python3 replicate_table4.py

The script outputs the LaTeX code for Table 4.

Reuse DeepMetis

Run DeepMetis for any mutant

To run DeepMetis for any mutant used in our experiments, we first need the mutations generated for MNIST by the DeepCrime tool. These mutations can be downloaded from the artifacts provided by the authors of the DeepCrime paper at the following links:

https://zenodo.org/record/4737748
https://zenodo.org/record/4737754

The artifacts contain h5 files, each of whose names corresponds to one of the 20 instances of a mutation operator run with a specific parameter. The names have the following structure:

{subject_name}_{mutation_operator}_MP_{parameter_value}_{instance_num}.h5

For example, mnist_add_noise_mutated0_MP_25.0_0.h5 corresponds to the first instance of the mutant generated by applying the "Add Noise" operator to 25% of the MNIST training data. As noted before, each mutant has 20 instances.
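The naming scheme above can be decomposed programmatically. The following sketch is an illustration, not code from DeepMetis or DeepCrime: the function name parse_mutant_filename is hypothetical, and it assumes that the subject name contains no underscore and that the instance number is the last underscore-separated field, which holds for the two names quoted in this document.

```python
def parse_mutant_filename(filename: str) -> dict:
    """Split a name of the form {subject}_{operator}_MP_{parameter}_{instance}.h5."""
    stem = filename[:-3] if filename.endswith(".h5") else filename
    prefix, separator, suffix = stem.partition("_MP_")
    if not separator:
        raise ValueError(f"'_MP_' marker not found in {filename!r}")
    # Assumption: the subject name (e.g. "mnist") has no underscore of its own.
    subject, _, operator = prefix.partition("_")
    # The parameter value may itself contain underscores (e.g. "l1_l2"),
    # so the instance number is taken from the right-hand end.
    parameter, _, instance = suffix.rpartition("_")
    if not parameter:
        raise ValueError(f"no parameter/instance pair found in {filename!r}")
    return {
        "subject_name": subject,
        "mutation_operator": operator,
        "parameter_value": parameter,
        "instance_num": int(instance),
    }


print(parse_mutant_filename("mnist_add_noise_mutated0_MP_25.0_0.h5"))
```

The parameter value is kept as a string because it is numeric for some operators (25.0) and symbolic for others (l1_l2).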

In our replication package we provide models for only one mutant (mnist_add_weights_regularisation_mutated0_MP_l1_l2_0, used in Step 2) due to the large size of the h5 files. To run DeepMetis for another mutant, copy the h5 files of that mutant to the folder DeepMetis-MNIST/mutant_model. The number of mutant instances copied into this folder determines the DeepMetis setting that will be used: if you copy 5 instances of the mutant, you will run DeepMetis in the 1vs5 setting; if you copy 10 instances, you will run it in the 1vs10 setting. Once the desired number of instances has been copied, run the following command:

python3 main_launcher.py
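Before launching, it can be useful to confirm which 1vsN setting the copied files imply. The sketch below simply counts the h5 files in a folder; it assumes that the files sit directly inside the folder and that all of them belong to the same mutant. The function name detect_setting is hypothetical and not part of DeepMetis.

```python
from pathlib import Path


def detect_setting(mutant_dir="mutant_model") -> str:
    """Return the 1vsN label implied by the number of .h5 files in `mutant_dir`."""
    # Assumption: every .h5 file in the folder is one instance of the same mutant.
    count = sum(1 for _ in Path(mutant_dir).glob("*.h5"))
    if count == 0:
        raise FileNotFoundError(f"no .h5 files found in {mutant_dir!r}")
    return f"1vs{count}"
```

Calling detect_setting("mutant_model") from the DeepMetis-MNIST folder after copying 5 instances would return "1vs5".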

To apply DeepMetis to mutants that were not used in our or DeepCrime's experiments, the user first needs to generate them. Instructions on how to generate mutants using DeepCrime are provided in the tool's own replication package, available at the following link:

https://zenodo.org/record/4772465

Once the h5 files of the mutant are obtained, the process of running DeepMetis is the same: as per the instructions above, copy the h5 files into the corresponding folder and run main_launcher.py.

Explore all the data generated as part of the paper

We provide all the data collected during our experiments. The folder DeepMetis-MNIST/experiment in this git repository, as well as in the corresponding Docker image, contains all the data except the images generated by the test input generators, which we excluded due to their overall size. However, we have uploaded all the data, including the images, to Zenodo at the following link:

https://zenodo.org/record/5105742

The data related to the MNIST case study is located in the MNIST.zip file of the Zenodo submission. Once this file is unzipped, the folder MNIST will contain the following folders and files:

  1. Folder deepmetis contains 4 subfolders: deepmetis_1vs1, deepmetis_1vs5, deepmetis_1vs10, and deepmetis_1vs20. The subfolders correspond to the setting with which DeepMetis was run (i.e. 1vs1, 1vs5, 1vs10 and 1vs20). Each of these subfolders contains 12 folders, one for each of the 12 mutants used in our study. Each mutant folder contains the file output.csv and 10 folders named from 0 to 9, one for each of the 10 runs. The file output.csv contains overall information about all 10 runs; its second column indicates the number of inputs generated in each run. For the mutation operators with range-based parameters, the third column reports the outcome of the binary search for the augmented test set. In contrast, for the mutation operators with non-range-based parameters, it indicates whether the mutant is killed once the test set is augmented. The folder for each run contains more detailed information, such as the files generated by DeepCrime for each mutant. It also contains the folder results, which stores the output of DeepMetis. The structure of the DeepMetis output is explained in Step 2.

  2. Folder deepjanus has the same structure as the folder deepmetis, the only difference being the absence of setting-specific folders such as 1vs1, 1vs5, 1vs10 and 1vs20.

  3. Folder dlfuzz contains two subfolders, all_inputs and only_valid_inputs, which contain all inputs generated by DLFuzz and only the valid inputs (i.e. the ones classified correctly by the original model), respectively. Both folders contain information for each mutant and run. As only the inputs in the only_valid_inputs folder were used for our analysis, the files generated by DeepCrime and the output.csv file are located only in this folder.

  4. Folder leave_one_out_RQ4 contains information regarding the experiments conducted for RQ4. The folder contains a subfolder for each of the 13 mutants. Each mutant subfolder contains the file leave_one_out.csv, which reports overall information for each of the 10 runs. The first column in the file indicates whether the mutant was killed, the second column reports the number of DeepMetis-generated inputs added to the initial test suite, and the third and fourth columns report the p_value and effect size calculated by DeepCrime. For each mutant there are folders associated with each run that contain the file leave_one_out_accuracy_dominant.csv. The first column of this file reports the accuracies of each of the 20 original models, while the second column reports the accuracies of each of the 20 mutant models.

  5. File statistical_test_results.xlsx reports the p-values, effect sizes and confidence intervals calculated when comparing DeepMetis to other input generation tools.

  6. File raw_data.csv contains raw data regarding all the test input generators used in the study. Each column name has the structure {mutation_operator}_{tool}_{I or MS}. The columns ending with I indicate the number of inputs generated, while the columns ending with MS indicate the mutation score or whether the mutant was killed. Step 3 of the replication package indicates how this file can be generated automatically.

  7. File summary.csv contains MNIST data reported in Table 3 in the paper. Step 3 of the replication package indicates how this file can be generated automatically.
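The column naming convention of raw_data.csv described in item 6 can be used to separate the two kinds of measurements. The sketch below is a minimal illustration that relies only on the _I and _MS suffixes; the function name split_columns and the sample header are hypothetical placeholders, not actual column names from the file.

```python
import csv
import io


def split_columns(header):
    """Group column names by suffix: '_I' (inputs generated) and '_MS' (mutation score)."""
    input_columns = [name for name in header if name.endswith("_I")]
    score_columns = [name for name in header if name.endswith("_MS")]
    return input_columns, score_columns


# Placeholder header in the {mutation_operator}_{tool}_{I or MS} shape.
sample = io.StringIO("operatorA_toolX_I,operatorA_toolX_MS,operatorB_toolX_I\n")
header = next(csv.reader(sample))
print(split_columns(header))
```

With the real file, replace the io.StringIO object with open("raw_data.csv", newline="") and read the first row in the same way.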

Configure a different number of runs and mutants

The number of runs and the number of mutant instances can be set when invoking the launcher main_launcher_examplerun.py. The number of runs is set with the parameter -run_num. DeepMetis runs in 1vs5 mode by default; the number of mutant instances used is set with the parameter -mutant_num. For example, the following command will perform 3 runs of DeepMetis in 1vs10 mode:

python3 main_launcher_examplerun.py -run_num=3 -mutant_num=10
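For readers who want to reuse the same flag style in their own wrapper scripts, the sketch below shows how options of the form -run_num=3 can be parsed with Python's standard argparse module. It is not the launcher's actual code: the default of 5 for -mutant_num mirrors the 1vs5 default stated above, while the default of 1 for -run_num is an assumption made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the two flags used in the command above."""
    parser = argparse.ArgumentParser(description="Illustrative launcher options")
    # Assumption: a single run when -run_num is omitted.
    parser.add_argument("-run_num", type=int, default=1, help="number of runs")
    # The text states that 1vs5 is the default mode.
    parser.add_argument("-mutant_num", type=int, default=5,
                        help="number of mutant instances (N in 1vsN)")
    return parser


args = build_parser().parse_args(["-run_num=3", "-mutant_num=10"])
print(f"{args.run_num} run(s) in 1vs{args.mutant_num} mode")
```

argparse accepts the single-dash, equals-separated form directly, so no custom string splitting is required.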