Repository mining using PyDriller.
For portability and replicability of this tool, we use docker. For easier docker setup, we provide two scripts for building docker image and running the docker container.
! Note: For Windows, first install (and have it running) Docker for Windows. Then use Git Bash to run the following scripts.
Execute the following script for building the docker image:
. docker_scripts/build-cps-repo-mining.sh
The script docker_scripts/run-cps-repo-mining-container.sh
is created for this task.
For running the mining for remote repositories, this script can be executed without any input parameter.
However, to perform the mining process for local repositories, we should pass the directory of local repositories as the input argument:
. docker_scripts/run-cps-repo-mining-container.sh [local_repositories]
! Note: This input argument should be an absolute path.
There are multiple scripts that can be used.
Mining the repository on content from commit messages, based on an input list of keywords:
docker exec -it cps-repo-mining-container bash -c "python3 pd/repository_commits_mining.py [lx/rx/la/ra]"
The options are:
lx = local
rx = remote
la = all local
ra = all remote
Where x stands for the number in the repository list.
To run the repository_commits_mining
script in the local mode, we need to clone the projects into a directory. this directory will later be passded as an argument to the further scripts. To prepare such a directory, run the following script
python3 pd/clone_repos.py [a_pre_existing_directory_for_saving_the_local_repositories]
Change in the dict_repo_list.py
file the variable location_github
to match where on your system the repositories are located. This will be the general path used to find all the repositories described in the 'project' dictionary, located in the same file.
It is possible to manually overwrite the location for a specific repository by changing it in the projects
dictionary. Replace None
by the system path to the repository for the selected repository.
Searching through the commit diffs, from the above script result, for specific code snippet.
docker exec -it cps-repo-mining-container bash -c "python3 pd/search_selection.py"
Searching through the current state of the full repositories, for code snippets.
docker exec -it cps-repo-mining-container bash -c "python3 pd/search_current.py"
Run report_template_maker.py
to generate the template for the manual analysis.
When finished manually analysis (the selected repository), generate the graphs containing the results; run manual_analysis_report_parser.py
, followed by running initial-analysis.R
(in RStudio).
To generate coverage, mutation and docstring reports, run:
docker exec -it cps-repo-mining-container bash -c "./generate_reports"
Be aware that the location for the reports that are made for pdoc and mutmut end in the same directory.
Run pdoc with --force to run it anyway.
In order to run pytest:
pytest
To generate the pytest report:
pytest --cov-report html --cov=pd
In order to run mutmut:
mutmut run
To generate the mutmut report:
mutmut html
In order to run python docstring documentation report:
pdoc --html pdoc