Thanks for your interest in contributing to ProteinCartography! Please read this document in its entirety before contributing to help ensure that your contribution meets our standards and is readily accepted.
All the packages needed to develop ProteinCartography are found in the `envs/cartography_dev.yml` conda environment.
You can install this environment as follows:
- Make sure miniconda is installed. Even if you're using an Apple Silicon (M1, M2, etc.) macOS laptop, you will need to install the macOS Intel x86-64 version of miniconda, which is available here.
- Create a conda environment from the `cartography_dev.yml` file in the `envs/` directory:
  `conda env create -n cartography_dev --file envs/cartography_dev.yml`
- Activate the environment:
  `conda activate cartography_dev`
We track all bugs, new feature requests, enhancements, etc. using GitHub Issues. Please check to make sure that your issue has not already been reported. If it has, please add a comment to the existing issue instead of creating a new one.
The steps below apply to both external and internal contributors and also apply to working both with this repo itself and with your own fork. However, if you are an external contributor, please fork this repository and make your changes in a new branch on your fork. Please see the GitHub docs on working with forks for more details about how to do this.
- Whenever you start work, make sure to `git pull` on the `main` branch so that you have its latest version.
- When you're planning to work on an issue, claim it by assigning it to yourself and add a comment briefly explaining how you plan to address it. If you are an external contributor, please wait for a maintainer to sign off on your plan before you start working; this will make it easier for us to accept your PR later on.
- After claiming an issue, create a new branch from the `main` branch. Make sure your branch name begins with your initials, followed by a forward slash; this is very important for keeping everyone's branches well organized. Use a short, descriptive name for the branch itself. For example, if your initials are `abc` and you are adding a new feature to evaluate clustering, you might name your branch `abc/add-cluster-evaluation`. If you are working on an issue to fix a bug, you might name your branch `abc/fix-foldseek-format-bug`. Create the new branch using the following command:
  `git checkout -b <your-initials>/<branch-name>`
- Once you've created a branch, push it to the GitHub repo so you keep a remote record of your work:
  `git push -u origin <your-initials>/<branch-name>`
- Once you've completed the feature or fixed the bug, you are ready to open a PR. Please keep PRs as small as possible to increase the speed and ease of review; a PR can be as small as changing a few characters or resolving a single bug. When you open a PR, please use a succinct, human-readable title and always add a description of the changes you made as well as a link to the issue that the PR addresses.
- Check that your PR is passing the CI checks. These checks verify that the changes in your PR are formatted correctly (we use a tool called `ruff` for formatting; see below for details) and that your PR passes the pipeline's automated tests.
Occasionally, your development branch will be behind the `main` branch. If this happens and you're working on a file that was changed in the updated version of `main`, you may need to merge the updated `main` branch into your local development branch.
- First, update the `main` branch in your local repo using the following commands:
  `git checkout main`
  `git pull`
- Next, check out your development branch:
  `git checkout <your-initials>/<branch-name>`
- Now, merge the `main` branch into your local branch:
  `git merge origin/main`
  Note that you must be on your local development branch when you call `git merge`!
- Once you've merged the `main` branch into your local development branch, use `git push` to push the merged changes to your branch on GitHub.
We use `ruff` to lint and format our Python code. We also use snakemake's code formatter, `snakefmt`, to format the snakefiles. You can and should run these tools in your local repo using the commands `make format` and `make lint`. Note that `ruff` is also available as an extension for VS Code, allowing you to configure VS Code to automatically format your code whenever you save the file you are currently editing.
Tests are found in the `ProteinCartography/tests/` directory. We use `pytest` for testing; you can run the tests locally using the command `make test`. Currently, we only have integration-like tests that run the pipeline in both 'search' and 'cluster' modes using a test dataset and a test config designed to allow the pipeline to run very quickly (2-3 min). The tests then check that the output files are created and have the correct shape. We plan to add unit tests in the future.
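As a rough picture of what these integration-like checks look like, here is a minimal sketch; the fixture name, output path, and column name are hypothetical placeholders, not the actual test code.

```python
# Purely illustrative: the fixture, output path, and column name below are
# placeholders, not the real ProteinCartography test code.
import pandas as pd


def test_output_file_exists_and_has_expected_shape(pipeline_results_dir):
    """Check that a pipeline output file was created and contains the expected columns."""
    output_file = pipeline_results_dir / "clustering_features.tsv"  # hypothetical output path
    assert output_file.exists()

    features = pd.read_csv(output_file, sep="\t")
    assert len(features) > 0
    assert "protid" in features.columns  # hypothetical column name
```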
When the pipeline is run in 'search' mode, it makes many calls to external APIs (including Foldseek, BLAST, and AlphaFold). By default, these calls are mocked during testing so that the tests do not depend on external APIs; this is important to ensure test reproducibility and also helps the tests run quickly. However, it is important to periodically check that the pipeline also runs correctly when real API calls are made. To do this, you can run the tests without mocks using `make test-without-mocks`.
When merging PRs on GitHub, it is likewise important to test that the pipeline runs correctly with real API calls. To do so, add the label `run-slow-tests` to your PR. This will trigger the CI actions (see below) to run again on your PR, but now without mocks. Please add this label only when your PR is ready to merge, as it will cause the CI to run more slowly and will also result in unnecessary API calls.
When your changes affect the API calls made by the pipeline, you will need to update the mocked responses in order for the tests to pass. Currently, this is a manual process.
- Enable API response logging in your local environment by setting the following environment variable:
  `export PROTEINCARTOGRAPHY_SHOULD_LOG_API_REQUESTS=true`
- Run the pipeline in 'search' mode using the 'small' search-mode demo (this demo uses the same input PDB file as the tests). The API responses made by the pipeline will be logged to a `logs/` directory in the root of the repo:
  `snakemake --cores all --use-conda --configfile demo/search-mode/config_actin_small.yml`
- Use the logged responses to update the mocked responses constructed in the `ProteinCartography/tests/mocks.py` module (see the illustrative sketch after this list). For large responses, response contents should be added to `ProteinCartography/tests/integration-test-artifacts/search-mode/actin/api_response_content/`.
- When you're finished, don't forget to delete the `logs/` directory and unset the `PROTEINCARTOGRAPHY_SHOULD_LOG_API_REQUESTS` environment variable.
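The exact contents of `mocks.py` are project-specific, but the general pattern is to pair each mocked API request with previously logged response content so that the tests never hit the real endpoints. The sketch below is purely illustrative; the helper names and the use of `unittest.mock` are assumptions, not the module's actual structure.

```python
# Illustrative only: the helper names and structure here are assumptions, not the real mocks.py.
from pathlib import Path
from unittest import mock

# Directory where large logged response contents are stored for the tests.
ARTIFACTS_DIR = Path(
    "ProteinCartography/tests/integration-test-artifacts/search-mode/actin/api_response_content"
)


def make_mock_response(content_filename: str, status_code: int = 200) -> mock.Mock:
    """Build a minimal object that mimics the parts of an HTTP response the pipeline reads."""
    response = mock.Mock()
    response.status_code = status_code
    response.content = (ARTIFACTS_DIR / content_filename).read_bytes()
    return response
```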
We use GitHub Actions for CI. Currently, there is one workflow for linting and one for testing. Both workflows are run automatically on GitHub when a PR is opened and also whenever new commits are pushed to an open PR. PRs cannot be merged until the CI checks pass.
The linting workflow runs `ruff --check` and `snakefmt --check` on all Python files and snakefiles in the repo. This means that the workflow does not modify any files; it only checks that your code is formatted correctly. If the workflow fails for your PR, you can run `make format` locally to format your code and `make lint` to determine if there are lint errors that need to be fixed.
The testing workflow runs pytest using the same `make test` command that is used locally. If the workflow fails for your PR, it is usually best to run `make test` locally to recapitulate the failure and determine which tests are failing.
In addition to the formatting and lint rules imposed by `ruff` and `snakefmt`, we also have a few additional style rules that are not enforced by these tools. These rules are listed below.
- Function and variable names should be in `lower_snake_case` and should be descriptive; avoid abbreviations.
- Function arguments and return values should have type hints.
- Functions should include a Google-style docstring explaining the arguments and what the function returns (if not `None`).
- Comments should be written in complete sentences in the present tense and should end with a period.
- Comments should be used sparingly and only when necessary to explain something that is not obvious from the code itself.
- Class names should use `CapitalizedCamelCase` with descriptive names.
- Currently, we don't use many custom classes, but the conventions for functions apply to class methods as well.
Here is an example of a function that adheres to all of these rules:

    def add_integers(first_integer: int, second_integer: int) -> int:
        """
        Add two integers together, returning the result.

        Args:
            first_integer (int): first integer to add.
            second_integer (int): second integer to add.

        Returns:
            The sum of the two integers.
        """
        result = first_integer + second_integer
        return result
We strive to encapsulate new functionality within modular Python scripts that accept arguments from the command line using `argparse`. These scripts are then called from snakemake rules and can also be run directly from the command line by the user.
- Every script should include a `parse_args()` function and a `main()` function.
- Every script should begin with `#!/usr/bin/env python` (so that the scripts are executable from the command line on unix systems).
- An example template for new scripts is found in `template.py`; a minimal illustrative sketch is also shown below.
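The authoritative starting point is `template.py`; purely as an illustration, a new script following these conventions might look something like the sketch below (the argument names and the body of `main()` are placeholders, not the contents of the real template).

```python
#!/usr/bin/env python
"""Illustrative sketch of the expected script structure; see template.py for the real template."""
import argparse


def parse_args() -> argparse.Namespace:
    """
    Parse the command-line arguments for this script.
    """
    parser = argparse.ArgumentParser(description="One-line description of what the script does.")
    parser.add_argument("-i", "--input", required=True, help="Path to the input file.")
    parser.add_argument("-o", "--output", required=True, help="Path to the output file.")
    return parser.parse_args()


def main() -> None:
    """
    Run the script using the parsed command-line arguments.
    """
    args = parse_args()
    # Replace this placeholder with the script's actual functionality.
    print(f"Reading from {args.input} and writing to {args.output}.")


if __name__ == "__main__":
    main()
```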
First, please consider carefully whether you need to add a new dependency to the project.
When your changes absolutely require a new dependency, please make sure that it is `conda`-installable.
Dependencies should be added to two environment files:
- the `cartography_dev.yml` file in the `envs/` directory.
- the appropriate snakemake rule environment file in the `envs/` directory.
In both files, please include the version of the dependency you are using (this is called "pinning" the dependency).
Include only the exact version number; do not include the package hash.
For example, if you are adding a new dependency called `new_dependency` and you are using version `1.2.3`, you would add the following line to the `cartography_dev.yml` file:
`- new_dependency=1.2.3`
See how we recognize feedback and contributions to our code at Arcadia here.