Developer Notes

General contribution guide

In general, modelforge uses the Fork and Pull approach for contributions. Briefly, fork the project, create a branch within that fork, push changes to that branch, and create a pull request in the main repository when ready. This keeps the main repository free of clutter from contributor branches.

Before contributing changes or additions to modelforge, first open an issue on GitHub to discuss the proposed changes. This helps ensure that the changes are in line with the project's goals and that they are implemented in a way that is consistent with the rest of the codebase. It also ensures that the proposed changes are not already being worked on by another contributor.

When contributing to modelforge, please follow the guidelines below:

  • Open an issue on GitHub to discuss the proposed changes before starting work.
  • Fork the project and create a new branch for your changes.
  • Use the PEP8 style guide for Python code.
  • Use Black for code formatting.
  • Use Python type hints to improve code readability and maintainability.
  • Include docstrings formatted using the numpydoc style (a brief example follows this list).
  • Include unit tests for new code, including both positive and negative tests when applicable, and test edge cases.
  • Include/update documentation in the form of .rst files (in the docs folder).
  • Ensure that the code is well-commented.
  • If applicable, include examples of how to use the new code.
  • For more complex additions (e.g., new NNPs) create a new wiki page to explain the underlying algorithms.
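
As a reference point for the style items above, a new function might look like the following sketch (scale_positions is a hypothetical helper, shown only to illustrate Black-compatible formatting, type hints, and a numpydoc docstring):

import numpy as np


def scale_positions(positions: np.ndarray, factor: float = 1.0) -> np.ndarray:
    """
    Scale a set of Cartesian coordinates by a constant factor.

    Parameters
    ----------
    positions : np.ndarray
        Atomic positions with shape (n_atoms, 3).
    factor : float, optional
        Multiplicative scaling factor (default is 1.0).

    Returns
    -------
    np.ndarray
        The scaled positions, with the same shape as the input.
    """
    return positions * factor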

Implementing unit tests and pytest

To speed up the CI, we use pytest-xdist for parallel execution of unit tests. Note that we only use parallel execution on the Ubuntu runners, as there was no speed improvement on macOS runners (and in some cases reduced performance). If pytest-xdist is not installed, pytest will simply ignore any marks associated with pytest-xdist and execute as normal.

Usage is effectively the same, but a few additional parameters are required:

pytest -n auto --dist loadgroup -v

Rather than "auto", we can also specify a target number of workers; specifying 0 will run standard serial pytest. The "loadgroup" option forces the scheduler to run any operations in the same "group" sequentially (it forces tests in the same group to run on the same worker; other tests can still run concurrently).
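
For example, to run on four workers while still honoring the groups described below:

pytest -n 4 --dist loadgroup -v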

There are several important things to keep in mind when implementing tests to work for parallel execution.

Self Contained tests

Tests should be self-contained and not rely on the output of another test. pytest-xdist uses a scheduler to execute jobs as resources become available, and thus the order in which tests are executed may differ each time (e.g., the order may change if even a single test takes longer than usual to execute; we have many tests that require downloading files, and tests that read HDF5 files, where only a single thread can read at a given time). Note that since GitHub CI only gives us 2 cores to work with, this may mask poorly constructed tests, as there are fewer degrees of freedom. As such, when writing new tests, it is good practice to run locally with more than 2 threads. One could define a group for tests that require sequential operations, but it is better to design tests to avoid this where possible.
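
A minimal sketch of a self-contained test (the tensor round trip here is just a stand-in for real work): it creates the file it needs through pytest's built-in tmp_path fixture, so its outcome does not depend on which tests ran before it or on which worker it lands:

import torch


def test_tensor_round_trip(tmp_path):
    # self-contained: the test writes the file it later reads,
    # rather than relying on output produced by another test
    output_file = tmp_path / "tensor_round_trip_test.pt"
    torch.save(torch.ones(3, 3), output_file)

    loaded = torch.load(output_file)
    assert torch.equal(loaded, torch.ones(3, 3))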

Use unique filenames/directories

Each Python test file includes a pytest fixture that creates a unique temporary directory to store data associated with the tests in that file. While creating a unique directory for each test file may mean we duplicate work in some cases (e.g., downloading a dataset again), it prevents different test files from accidentally writing files with matching names (which could cause tests to fail, especially when executed in parallel); as such, a developer does not need to consider what is done in a test file they are not working on. Note that each time we execute pytest, the fixture creates a new unique temporary directory, avoiding any issues with "old" data.
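
Such a fixture might look something like the sketch below, built on pytest's tmp_path_factory (the actual fixture in modelforge may differ in details):

import pytest


@pytest.fixture(scope="module")
def prep_temp_dir(tmp_path_factory):
    # a unique temporary directory shared by all tests in this file;
    # a fresh directory is created on every pytest invocation
    return tmp_path_factory.mktemp("dataset_test")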

However, this does not guarantee that we avoid conflicts within a given test file. To remedy this, it is best to avoid generic filenames like "output.pt" and instead use unique names that include information about the specific test, e.g., "ANI2x_test_saving.pt". This is especially relevant when using pytest.mark.parametrize to run the same test multiple times with different inputs, as those parameterized cases may be executed concurrently.
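
For example, embedding the parameter in the filename keeps concurrently running cases from writing to the same file (the potential names and the save call below are only illustrative):

import os

import pytest
import torch


@pytest.mark.parametrize("potential_name", ["ANI2x", "SchNet", "PaiNN"])
def test_saving(potential_name, prep_temp_dir):
    # the parameter value is part of the filename, so parallel cases
    # never collide on the same output file
    output_file = f"{str(prep_temp_dir)}/{potential_name}_test_saving.pt"
    torch.save({"potential_name": potential_name}, output_file)
    assert os.path.isfile(output_file)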

Note: In modelforge, file download functions and non-thread-safe reading operations (e.g., reading HDF5 files) use file locking to avoid issues. Specifically, we have a file-locking context manager class that waits until the file lock has been released before code execution proceeds (as such, the code will not exit if a file lock is encountered). This should avoid any issues with multiple threads trying to download the same test file or accessing the same HDF5 file. However, there may still be some cases where conflicts could arise due to creation of the npz file (this will be addressed in a future PR). Generally speaking, most classes in modelforge that accept the "local_cache_dir" parameter as input will create the directory if it is not found. As such, one can append to the prep_temp_dir path (e.g., local_cache_dir = f"{str(prep_temp_dir)}/{potential_name}_model_factory_test") to create a unique subdirectory for the test, without having to add a new fixture.
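
The blocking behavior is roughly equivalent to the following sketch, written here with the third-party filelock package and a hypothetical fetch_file helper rather than modelforge's own context manager, purely to illustrate the pattern:

import os

from filelock import FileLock


def fetch_file(url: str, local_cache_dir: str, filename: str) -> str:
    # hypothetical download helper: the lock blocks (rather than raising)
    # until any other worker touching the same file has finished
    os.makedirs(local_cache_dir, exist_ok=True)
    output_path = f"{local_cache_dir}/{filename}"
    with FileLock(f"{output_path}.lock"):
        if not os.path.isfile(output_path):
            pass  # download url to output_path here
    return output_path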

Serial execution of testing groups

In modelforge, our tests that perform training with PyTorch Lightning can be very memory intensive. On GitHub, we do not want those running concurrently, as they would cause the CI to exit due to running out of memory. To avoid this, we can group these tests (by defining an xdist_group), which tells the scheduler to execute these jobs only on the same worker (which forces them to run serially). Note that this does not mean other tests will not continue to run in parallel; it only forces the tests in the group to run sequentially. This could still result in the CI exiting if another memory-intensive test tries to start at the same time. On GitHub CI runners, this issue appeared when increasing the number of threads from 2 to 8, as it meant 7 other tests could be running while the memory-intensive training tests were also executing.

To create a group, we just set a pytest mark:

@pytest.mark.xdist_group(name="test_training_with_lightning")

and use --dist loadgroup when executing pytest.
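
Applied to a (hypothetical) test function, this looks like:

import pytest


@pytest.mark.xdist_group(name="test_training_with_lightning")
def test_train_potential_with_lightning():
    # all tests marked with this group name are sent to the same worker
    # under --dist loadgroup, so they execute one after another
    ...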

pytest-xdist benchmarks

To explore the behavior and relative speed improvements of pytest-xdist, several benchmarks were performed on an Intel Core i9 machine with 16 physical cores and 64 GB of RAM, running Ubuntu, using commit: https://github.com/choderalab/modelforge/tree/7b92c63d49446de6eb01605ca6c8c8ae0bb2be72

These benchmarks were first run excluding the tests in test_training.py:

  • standard pytest: 27 min 49 s
  • 1 thread: 22 min 8 s
  • 2 threads: 13 min 34 s
  • 4 threads: 7 min 57 s
  • 8 threads: 4 min 48 s
  • 16 threads: 3 min 26 s
  • 32 threads: 2 min 58 s

While the scaling falls off as the number of threads increases, the time to execute these tests can be reduced by almost a factor of 10.

Next, benchmarks were performed considering only the tests in test_training.py. Note that this was executed without --dist loadgroup, and thus tests in test_training.py are run concurrently. With more than 8 threads, the system ran out of memory.

  • standard pytest: 18 min 40 s
  • 1 thread: 20 min 15 s
  • 2 threads: 10 min 53 s
  • 4 threads: 9 min 20 s
  • 8 threads: 9 min 17 s

A significant speed-up is found going from one thread to two, but performance gains are minimal beyond that.

Running all tests using 8 threads required 12 min 38 s.