VariantWorks is a framework for developing Deep Learning based tools for genomic read processing tasks such as variant calling and consensus calling. It provides a library of data encoding and parsing functions commonly used in read processing, along with a simple way to plug them into a Deep Learning pipeline.
For the Deep Learning pipeline, VariantWorks leverages the NeMo framework, which provides an easy-to-use representation of high-level computation graphs.
VariantWorks is aimed at two audiences:
- Variant Caller developers - for existing developers in the variant calling community, VariantWorks intends to provide a convenient way to start designing variant callers built using Deep Learning.
- Deep Learning practitioners - for existing deep learning practitioners, VariantWorks can lower the barrier to applying novel Deep Learning techniques to the field of genomic variant calling.
The main features of VariantWorks are:
- Encoders - Pre-written, commonly used (and in the future, optimized) encoders for reads.
- I/O - Readers and writers for common genomics file formats.
- Reference Models - Collection of neural network architectures well suited for variant calling.
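
To make the idea of a read encoder concrete, here is a minimal sketch of one common encoding: counting bases per column of an aligned read pileup. It is an illustration only and does not use the VariantWorks API; the function name `encode_pileup`, the `BASES` channel order, and the input format (equal-length, already-aligned read strings) are all assumptions made for this example.

```python
from typing import List, Sequence

# Channel order is an assumption of this sketch, not a VariantWorks convention.
BASES = "ACGT"


def encode_pileup(reads: Sequence[str]) -> List[List[int]]:
    """Count each base per column of an aligned, equal-length pileup.

    Returns a matrix with one row per base in BASES and one column per
    pileup position. Characters outside BASES (such as '-' or 'N') are
    ignored.
    """
    if not reads:
        return [[] for _ in BASES]
    width = len(reads[0])
    if any(len(read) != width for read in reads):
        raise ValueError("all reads must have the same length")

    counts = [[0] * width for _ in BASES]
    for read in reads:
        for col, base in enumerate(read.upper()):
            row = BASES.find(base)
            if row >= 0:  # skip gaps and ambiguous bases
                counts[row][col] += 1
    return counts


if __name__ == "__main__":
    pileup = ["ACGT", "ACGA", "A-GT"]
    for base, row in zip(BASES, encode_pileup(pileup)):
        print(base, row)
```

A count matrix like this is the kind of tensor a neural network could consume as input. A real encoder would typically also carry base qualities, strand, and reference information.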
Requirements:
- Python 3.7+
- NVIDIA GPU (Pascal+ architecture)
- NVIDIA Apex library (for multi-GPU training in supported pipelines)
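
The requirements above can be partly checked before installing. The following is a rough sketch using only the Python standard library; the helper name `check_environment` is hypothetical, and it assumes that the NVIDIA driver exposes `nvidia-smi` on the `PATH` and that Apex is importable as `apex`. It does not verify the GPU architecture (Pascal or newer).

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 7)


def check_environment() -> dict:
    """Return a best-effort report on the listed requirements."""
    return {
        # Python 3.7 or newer
        "python_ok": sys.version_info >= MIN_PYTHON,
        # Presence of nvidia-smi suggests an NVIDIA driver is installed
        # (assumption: it says nothing about the GPU generation).
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # Apex is only needed for multi-GPU training in supported pipelines.
        "apex_installed": importlib.util.find_spec("apex") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```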
To install the latest development code from source, run:
git clone --recursive https://github.com/clara-parabricks/VariantWorks.git
cd VariantWorks
pip install -r python-style-requirements.txt
pip install -r requirements.txt
pip install -e .
# Install pre-push hooks to run tests
ln -nfs "$(readlink -f hooks/pre-push)" .git/hooks/pre-push