We assume you have VeriFIT/smt-bench set up, according to its instructions, on the evaluation server where you run the experimental evaluation.
- Set up the Python virtual environment:

  ```shell
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```
- Get new tasks from the server with the experimental results (change the host, port, etc. if running from a different server):

  ```shell
  ./get_tasks_and_generate_csv.sh
  ```
- Process the results (choose one):
  - Run the Jupyter evaluation notebook `eval.ipynb`:
    - Set the correct tools and benchmarks to evaluate (in particular, set the version of NOODLER).
    - Run the first 4 cells to load the benchmarks, then run the remaining cells as needed.
  - Only prepare the results for manual processing: store the processed results and evaluate them manually (see the sketch after these steps):

    ```shell
    ./pyco_proc.py [options] <requested_tasks_file_with_results.tasks>
    ```
- Exit the Python virtual environment:

  ```shell
  deactivate
  ```
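
To illustrate the manual route, below is a minimal sketch of evaluating the generated results table with pandas. Everything specific in it is an assumption made for illustration: the file name `results.csv`, the delimiter, the tool names, and the `<tool>-result`/`<tool>-runtime` column layout are hypothetical, not the actual format produced by `pyco_proc.py`; inspect your output and adapt accordingly.

```python
# Minimal sketch of manually evaluating the processed results.
# ASSUMPTIONS: the file name, the delimiter, the tool names, and the
# column layout below are hypothetical; inspect the actual output of
# pyco_proc.py / get_tasks_and_generate_csv.sh and adjust accordingly.
import pandas as pd

# Hypothetical file name; the delimiter may also differ (e.g., sep=";").
df = pd.read_csv("results.csv")

tools = ["z3", "cvc5", "noodler"]  # hypothetical tool column prefixes
for tool in tools:
    result_col = f"{tool}-result"    # assumed per-tool result column
    runtime_col = f"{tool}-runtime"  # assumed per-tool runtime column
    # Count instances the tool decided; "sat"/"unsat" values are assumed.
    solved = df[result_col].isin(["sat", "unsat"])
    print(
        f"{tool}: solved {solved.sum()}/{len(df)} instances, "
        f"mean runtime on solved {df.loc[solved, runtime_col].mean():.2f} s"
    )
```

From such a table, per-benchmark breakdowns or pairwise comparisons can be built the same way; for the standard analysis, prefer the `eval.ipynb` notebook described above.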