This subfolder contains the scripts and data extracted by our approach, without the machine learning part.
The following software is required to be installed:
Although each folder has its own README inside, here is a short explanation of which folder is used for what purpose:
BenchmarkVariabilities (in folder `BenchmarkVariabilities`) transforms the raw execution data into the benchmark variabilities. It
- takes as input the raw execution data from the projects (in `go_results/`),
- filters the valid projects with `1-filter_valid_projects.py` and creates `go-results-valid/go-results-2-valid.csv`,
- creates the input to the pa tool with `3-inputcsvcreator.py`, found in `pa_input_projects/`,
- creates the variability files for a number of iterations with pa by running `4-get_pa_results_iteration.py`. Files for 5, 10, and 20 iterations are named `final_5_iterations.csv`, `final_10_iterations.csv`, and `final_20_iterations.csv`, respectively.
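The filtering step above can be sketched roughly as follows. This is an illustrative sketch only: the column names (`project`, `status`) and the validity criterion are assumptions, not taken from the actual `1-filter_valid_projects.py`.

```python
import csv

def filter_valid_projects(in_path, out_path):
    """Copy only rows of valid projects from in_path to out_path.

    Hypothetical sketch: assumes a header with "project" and "status"
    columns and treats status == "ok" as the validity criterion.
    Returns the number of rows kept.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        kept = 0
        for row in reader:
            if row.get("status") == "ok":  # assumed validity check
                writer.writerow(row)
                kept += 1
        return kept
```

The real script's criteria may differ (e.g. minimum iteration counts); this only illustrates the filter-and-rewrite shape of the step.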
Downloader is a script written in Python that takes `project_commit.csv` from BenchmarkVariabilities (run `python BenchmarkVariabilities/project_commit.py`) and downloads the corresponding projects.
It creates the input for the feature extraction (in folder `FeatureExtraction`).
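A minimal sketch of what such a downloader does, assuming `project_commit.csv` has header columns `project` (a clonable URL or path) and `commit`; the actual column layout of the file is an assumption here.

```python
import csv
import subprocess
from pathlib import Path

def download_projects(csv_path, target_dir):
    """Clone each listed repository and check out the recorded commit.

    Hypothetical sketch: assumes header columns "project" and "commit";
    the real project_commit.csv may be structured differently.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            dest = target / Path(row["project"]).stem
            # Clone the project, then pin it to the benchmarked commit.
            subprocess.run(["git", "clone", row["project"], str(dest)], check=True)
            subprocess.run(["git", "-C", str(dest), "checkout", row["commit"]], check=True)
```

Pinning each project to its recorded commit keeps the extracted features consistent with the execution data collected earlier.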
FeatureExtraction (in folder `FeatureExtraction`) extracts and combines features as described in the paper.
It includes
- a dependency fetcher for the downloaded projects (`cmd/deps/`),
- the AST feature parser (`cmd/ast/`), and
- the call graph feature combiner (`cmd/cg/`).

How to use them is explained in its own README file, with instructions on how to build and execute them. These take as input `project_commit_place.csv` from BenchmarkProjects.
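Since the tools follow the usual Go `cmd/` layout, each can be compiled with `go build ./cmd/<tool>`. A small helper that assembles those build-and-run command lines is sketched below; the assumption that each binary takes the input CSV as its only argument is illustrative, and the FeatureExtraction README remains authoritative.

```python
from pathlib import Path

def feature_extraction_commands(input_csv, bin_dir):
    """Assemble build and run commands for the three tools.

    Hypothetical sketch: builds each of cmd/deps, cmd/ast, and cmd/cg
    into bin_dir, then invokes the binary on input_csv. The real flags
    and arguments are documented in the FeatureExtraction README.
    """
    cmds = []
    for tool in ("deps", "ast", "cg"):
        binary = str(Path(bin_dir) / tool)
        cmds.append(["go", "build", "-o", binary, f"./cmd/{tool}"])
        cmds.append([binary, input_csv])
    return cmds
```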
MergeFiles (in folder `MergeFiles`) merges individual output files from FeatureExtraction and BenchmarkVariabilities.
The resulting, merged files include the features and the variabilities, which are the input for the machine learning part of our approach.
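Conceptually, the merge joins each feature row with its variability row on a shared key. A minimal sketch, assuming both CSV files share a `benchmark` column as the join key (the actual key and column names may differ):

```python
import csv

def merge_on_key(features_path, variabilities_path, out_path, key="benchmark"):
    """Inner-join two CSV files on a shared key column.

    Hypothetical sketch: only rows whose key appears in both files are
    written, so the merged output pairs each feature vector with its
    variability value.
    """
    # Index the variability rows by the join key.
    with open(variabilities_path, newline="") as f:
        var_rows = {row[key]: row for row in csv.DictReader(f)}
    extra = [c for c in next(iter(var_rows.values())) if c != key]
    with open(features_path, newline="") as f, open(out_path, "w", newline="") as out:
        reader = csv.DictReader(f)
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames + extra)
        writer.writeheader()
        for row in reader:
            var = var_rows.get(row[key])
            if var is None:
                continue  # keep only benchmarks present in both files
            row.update({c: var[c] for c in extra})
            writer.writerow(row)
```

The inner-join semantics (dropping benchmarks missing from either side) is an assumption; the real MergeFiles scripts define the actual join behavior.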