A command-line interface for interacting with the SWE-bench API. Use this tool to submit predictions, manage runs, and retrieve evaluation reports.
Read the full documentation here.
pip install sb-cli
Before using the CLI, you'll need to get an API key:
- Generate an API key:
sb-cli gen-api-key your.email@example.com
- Set your API key as an environment variable - and store it somewhere safe!
export SWEBENCH_API_KEY=your_api_key
# or add export SWEBENCH_API_KEY=your_api_key to your .*rc file
- You'll receive an email with a verification code. Verify your API key:
sb-cli verify-api-key YOUR_VERIFICATION_CODE
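If you call sb-cli from a script, it can help to fail early when the key is missing. The following is a minimal Python sketch that assumes only the variable name SWEBENCH_API_KEY shown above; the helper name require_api_key is hypothetical and not part of sb-cli.

```python
import os


def require_api_key(env=None):
    """Return the SWE-bench API key from the environment, or raise if it is unset."""
    # `env` defaults to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("SWEBENCH_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SWEBENCH_API_KEY is not set; export it or add it to your shell rc file"
        )
    return key
```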
SWE-bench has different subsets and splits available:
Subsets:
- swe-bench-m: The main dataset
- swe-bench_lite: A smaller subset for testing and development
- swe-bench_verified: 500 verified problems from SWE-bench
Splits:
- dev: Development/validation split
- test: Test split (currently only available for swe-bench_lite and swe-bench_verified)
You'll need to specify both a subset and split for most commands.
Submit your model's predictions to SWE-bench:
sb-cli submit swe-bench-m test \
--predictions_path predictions.json \
--run_id my_run_id
Options:
- --run_id: ID of the run to submit predictions for (optional, defaults to the name of the parent directory of the predictions file)
- --instance_ids: Comma-separated list of specific instance IDs to submit (optional)
- --output_dir: Directory to save report files (default: sb-cli-reports)
- --overwrite: Overwrite existing report (default: 0)
- --gen_report: Generate a report after evaluation is complete (default: 1)
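To drive the submit command from Python, one option is to assemble the argument list programmatically. This is a sketch under stated assumptions, not part of sb-cli: the helper names build_submit_command and default_run_id are hypothetical, and only the --predictions_path and --run_id flags shown above are used. The second helper illustrates the documented --run_id default (the name of the predictions file's parent directory).

```python
from pathlib import Path


def build_submit_command(subset, split, predictions_path, run_id=None):
    """Build the argument list for `sb-cli submit` from the flags documented above."""
    cmd = ["sb-cli", "submit", subset, split, "--predictions_path", str(predictions_path)]
    # Leave --run_id out when it is not given so the CLI applies its own default.
    if run_id is not None:
        cmd += ["--run_id", run_id]
    return cmd


def default_run_id(predictions_path):
    """Illustrate the documented default: the parent directory name of the predictions file."""
    return Path(predictions_path).resolve().parent.name
```

Passing the returned list to subprocess.run(cmd, check=True) would run the submission, provided sb-cli is installed and the API key is set.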
Retrieve evaluation results for a specific run:
sb-cli get-report swe-bench-m dev my_run_id -o ./reports
View all your existing run IDs for a specific subset and split:
sb-cli list-runs swe-bench-m dev
Your predictions file should be a JSON file in one of these formats:
{
  "instance_id_1": {
    "model_patch": "...",
    "model_name_or_path": "..."
  },
  "instance_id_2": {
    "model_patch": "...",
    "model_name_or_path": "..."
  }
}
Or as a list:
[
  {
    "instance_id": "instance_id_1",
    "model_patch": "...",
    "model_name_or_path": "..."
  },
  {
    "instance_id": "instance_id_2",
    "model_patch": "...",
    "model_name_or_path": "..."
  }
]
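The two formats carry the same information, so either can be converted to the other. Below is a minimal Python sketch that normalizes both to the dict form and writes it to disk. The function names are hypothetical. Treating model_patch and model_name_or_path as required, and dropping any other keys, are assumptions based on the examples above rather than documented rules.

```python
import json

# Assumption: both keys shown in the examples above are required for every prediction.
REQUIRED_KEYS = ("model_patch", "model_name_or_path")


def to_dict_format(predictions):
    """Normalize predictions (list or dict form) to the dict form keyed by instance ID."""
    if isinstance(predictions, dict):
        items = predictions.items()
    else:
        items = ((entry["instance_id"], entry) for entry in predictions)
    normalized = {}
    for instance_id, entry in items:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"{instance_id}: missing {', '.join(missing)}")
        if instance_id in normalized:
            raise ValueError(f"duplicate instance_id: {instance_id}")
        # Keep only the keys shown in the documented formats.
        normalized[instance_id] = {key: entry[key] for key in REQUIRED_KEYS}
    return normalized


def write_predictions(predictions, path="predictions.json"):
    """Write predictions to `path` as JSON in the dict form."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(to_dict_format(predictions), handle, indent=2)
```

Validating before writing surfaces a malformed entry locally instead of after submission.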