Chaos Toolkit extension for Reliably.
To be used from your experiment, this package must be installed in the Python environment where chaostoolkit already lives.
$ pip install chaostoolkit-reliably
To use this package, you must create have registered with Reliably services.
Then you need to set some environment variables as secrets.
RELIABLY_TOKEN
: the token to authenticate against Reliably's APIRELIABLY_HOST:
: the hostname to connect to, default toapp.reliably.com
{
"secrets": {
"reliably": {
"token": {
"type": "env",
"key": "RELIABLY_TOKEN"
},
"host": {
"type": "env",
"key": "RELIABLY_HOST",
"default": "app.reliably.com"
}
}
}
}
This extensions offers a variety of probes and tolerances ready to be used in your steady-state blocks.
For instance:
{
"version": "1.0.0",
"title": "SLO error-count-3h / Error budget 10%",
"description": "Monitor the health of our demo service from our users perspective and ensure they have a high-quality experience",
"runtime": {
"hypothesis": {
"strategy": "after-method-only"
}
},
"steady-state-hypothesis": {
"title": "Compute SLO and validate its Error Budget with our target",
"probes": [
{
"type": "probe",
"name": "get-slo",
"tolerance": {
"type": "probe",
"name": "there-should-be-error-budget-left",
"provider": {
"type": "python",
"module": "chaosreliably.activities.slo.tolerances",
"func": "has_error_budget_left",
"arguments": {
"name": "cloudrun-service-availability"
}
}
},
"provider": {
"type": "python",
"module": "chaosreliably.activities.slo.probes",
"func": "compute_slo",
"arguments": {
"slo": {
"apiVersion": "sre.google.com/v2",
"kind": "ServiceLevelObjective",
"metadata": {
"name": "cloudrun-service-availability",
"labels": {
"service_name": "cloudrun",
"feature_name": "service",
"slo_name": "availability"
}
},
"spec": {
"description": "Availability of Cloud Run service",
"backend": "cloud_monitoring_mql",
"method": "good_bad_ratio",
"exporters": [
],
"service_level_indicator": {
"filter_good": "fetch cloud_run_revision | metric 'run.googleapis.com/request_count' | filter resource.project_id == '${CLOUDRUN_PROJECT_ID}' | filter resource.service_name == '${CLOUDRUN_SERVICE_NAME}' | filter metric.response_code_class == '2xx'",
"filter_valid": "fetch cloud_run_revision | metric 'run.googleapis.com/request_count' | filter resource.project_id == '${CLOUDRUN_PROJECT_ID}' | filter resource.service_name == '${CLOUDRUN_SERVICE_NAME}'"
},
"goal": 0.9
}
},
"config": {
"backends": {
"cloud_monitoring_mql": {
"project_id": "${STACKDRIVER_HOST_PROJECT_ID}"
}
},
"error_budget_policies": {
"default": {
"steps": [
{
"name": "3 hours",
"burn_rate_threshold": 9,
"alert": false,
"window": 10800,
"message_alert": "Page the SRE team to defend the SLO",
"message_ok": "Last 3 hours on track"
}
]
}
}
}
}
}
}
]
},
"method": [
{
"name": "inject-traffic-into-endpoint",
"type": "action",
"background": true,
"provider": {
"func": "inject_gradual_traffic_into_endpoint",
"type": "python",
"module": "chaosreliably.activities.load.actions",
"arguments": {
"endpoint": "${ENDPOINT}",
"step_duration": 30,
"test_duration": 300,
"step_additional_vu": 3,
"vu_per_second_rate": 1,
"results_json_filepath": "./load-test-results.json"
}
}
}
]
}
This above example will get the last 5 Objective Results for our Must be good
SLO and determine if they were all okay or whether we've spent our error budget
they are allowed.
You can use controls provided by chaostoolkit-reliably
to track your experiments
within Reliably. The block is inserted automatically by Reliably when you
import the experiment into Reliably.
From a code perspective, if you wish to contribute, you will need to run a
Python 3.6+ environment. Please, fork this project, write unit tests to cover
the proposed changes, implement the changes, ensure they meet the formatting
standards set out by black
, ruff
, isort
, and mypy
, add an entry into
CHANGELOG.md
, and then raise a PR to the repository for review
Please refer to the formatting section for more information on the formatting standards.
The Chaos Toolkit projects require all contributors must sign a Developer Certificate of Origin on each commit they would like to merge into the master branch of the repository. Please, make sure you can abide by the rules of the DCO before submitting a PR.
If you wish to develop on this project, make sure to install the development dependencies. First you will need to install globally pdm and create a virtual environment:
$ pdm create venv
$ pdm use
$ $(pdm venv activate)
Then install the dependencies:
$ pdm sync -d
To run the tests for the project execute the following:
$ pdm run test
We use a combination of black
, [ruff
][flake8], isort
,
mypy
and [bandit
][] to both lint and format this repositories code.
Before raising a Pull Request, we recommend you run formatting against your code with:
$ pmd run format
This will automatically format any code that doesn't adhere to the formatting standards.
As some things are not picked up by the formatting, we also recommend you run:
$ pdm run lint
To ensure that any unused import statements/strings that are too long, etc.
are also picked up. It will also provide you with any errors mypy
picks up.