We welcome:
- Bug reports
- Pull requests for bug fixes
- Logs and documentation improvements
- New algorithms and datasets
- Better hyperparameters (but with proofs)
Contributing code is done through standard GitHub methods:
- Fork this repo
- Make a change and commit your code
- Submit a pull request. It will be reviewed by maintainers, and they'll give feedback or make requests as applicable
To set up a local development environment:
git clone git@github.com:tinkoff-ai/CORL.git
cd CORL
pip install -r requirements/requirements_dev.txt
The CI will run several checks on the new code pushed to the CORL repository. These checks can also be run locally without waiting for the CI by following the steps below:
- install pre-commit,
- install the Git hooks by running pre-commit install.
Once those two steps are done, the Git hooks will be run automatically at every new commit.
The Git hooks can also be run manually with pre-commit run --all-files, and if needed they can be skipped (not recommended) with git commit --no-verify.
We use Ruff as our main linter. To see possible problems before pre-commit runs, you can run ruff check --diff . to see the exact linter suggestions and the fixes it would apply.
All new algorithms should go to algorithms/contrib/offline for offline-only algorithms and to algorithms/contrib/finetune for offline-to-online algorithms.
We as a team try to keep the core as reliable and reproducible as possible, but we may not have the resources to support all future algorithms. Therefore, this separation is necessary, as we cannot guarantee that all algorithms from algorithms/contrib exactly reproduce the results of their original publications.
Make sure your new code is properly documented and that all references to the original implementations and papers are present (for example, as in Decision Transformer). Please explain all the tricks and any differences from the original implementation in as much detail as possible. Keep in mind that this code may be used by other researchers. Make their lives easier!
While we welcome any algorithm, it is better to first open an issue with the proposal so we can discuss the details. Unfortunately, not all algorithms are equally easy to understand and reproduce. We may be able to give you some advice, or, on the contrary, warn you that this particular algorithm will require too many computational resources to fully reproduce the results and that it would be better to work on something else.
Although you will have to do a hyperparameter search while reproducing the algorithm, in the end we expect to see final configs in configs/contrib/<algo_type>/<algo_name>/<dataset_name>.yaml with the best hyperparameters for all datasets considered. The configs should be in YAML format, containing all hyperparameters sorted in alphabetical order (see existing configs for inspiration).
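The alphabetical-order requirement can be checked before opening a PR. The following is a minimal sketch, assuming a flat config with one top-level key per line and lowercase key names; the helper names and the example hyperparameters are hypothetical and not part of CORL.

```python
def top_level_keys(yaml_text: str) -> list[str]:
    """Collect top-level keys from a flat YAML config, in file order."""
    keys = []
    for line in yaml_text.splitlines():
        # Skip blank lines, comments, and indented or list-item lines.
        if not line.strip() or line.lstrip().startswith("#") or line[0] in " \t-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.append(key.strip())
    return keys


def out_of_order(keys: list[str]) -> list[str]:
    """Return every key that sorts before the key preceding it."""
    return [b for a, b in zip(keys, keys[1:]) if b < a]


# Hypothetical config contents, for illustration only.
example = """\
batch_size: 256
discount: 0.99
env: halfcheetah-medium-v2
group: my_algo-halfcheetah-medium-v2-multiseed-v0
name: my_algo
"""
assert out_of_order(top_level_keys(example)) == []
```

The comparison is a plain string comparison, so it treats uppercase letters as sorting before lowercase ones; adjust it if your keys mix cases.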
Use these conventions to name your runs in the configs:
- name: <algo_name>
- group: <algo_name>-<dataset_name>-multiseed-v0, increment version if needed
- use our __post_init__ implementation in your config dataclass
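As an illustration of how these conventions fit into a config dataclass, here is a hedged sketch. The class name, the field defaults, and the body of `__post_init__` are assumptions made for this example; copy the actual `__post_init__` from the existing algorithms in the repository rather than from here.

```python
import uuid
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hypothetical fields; a real config carries every hyperparameter.
    env: str = "halfcheetah-medium-v2"
    group: str = "my_algo-halfcheetah-medium-v2-multiseed-v0"
    name: str = "my_algo"
    train_seed: int = 0

    def __post_init__(self):
        # Assumed behaviour: make each run name unique by appending the
        # dataset name and a short random suffix.
        self.name = f"{self.name}-{self.env}-{str(uuid.uuid4())[:8]}"


config = TrainConfig()
print(config.group)  # my_algo-halfcheetah-medium-v2-multiseed-v0
```

With a shared group and a unique name per run, all seeds of one algorithm and dataset stay grouped together while individual runs remain distinguishable.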
Since we are releasing wandb logs for all algorithms, you will need to submit multiseed (~4 seeds) training runs to the CORL project in the wandb corl-team organization. We'll invite you there when the time comes.
We usually use wandb sweeps for this. You can use this example config (it works with pyrallis, which expects a config_path CLI argument):
# sweep_config.yaml
entity: corl-team
project: CORL
program: algorithms/contrib/<algo_name>.py
method: grid
parameters:
  config_path:
    # algo_type is offline or finetune (see sections above)
    values: [
        "configs/contrib/<algo_type>/<algo_name>/<dataset_name_1>.yaml",
        "configs/contrib/<algo_type>/<algo_name>/<dataset_name_2>.yaml",
        "configs/contrib/<algo_type>/<algo_name>/<dataset_name_3>.yaml",
      ]
  train_seed:
    values: [0, 1, 2, 3]
Then proceed as usual: create a wandb sweep with wandb sweep sweep_config.yaml, then run agents with wandb agent <sweep_id>, using the sweep ID printed by the previous command.
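To get a feel for how many runs such a grid sweep launches, the combinations can be enumerated by hand. This is only a sketch of what a grid search over the two parameters amounts to; the algorithm and dataset names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical algorithm and dataset names, for illustration only.
config_paths = [
    f"configs/contrib/offline/my_algo/{dataset}.yaml"
    for dataset in ("dataset_1", "dataset_2", "dataset_3")
]
train_seeds = [0, 1, 2, 3]

# A grid sweep tries every combination of the listed parameter values.
runs = [
    {"config_path": path, "train_seed": seed}
    for path, seed in product(config_paths, train_seeds)
]
print(len(runs))  # 12 runs: 3 configs x 4 seeds
```

Each entry corresponds to one training run, so the compute budget grows with the product of the number of datasets and the number of seeds.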
Based on the results, you will need to create wandb reports so that other users can understand them more easily. You can use any of the existing reports as an example (see README.md).
- Issue about new algorithm is open
- Single-file implementation is added to algorithms/contrib
- PR has passed all the tests
- Evidence that implementation reproduces original results is provided
- Configs with the best hyperparameters for all datasets are added to configs/contrib
- Logs and reports for best hyperparameters are submitted to our wandb organization