ishare is a Rust crate designed to facilitate the analysis of rare-variant sharing and identity-by-descent (IBD) sharing.
It is currently in the pre-alpha stage.
This crate introduces a data structure and its corresponding algorithm for the tabular encoding of rare variant genotype data. The tabular encoding leverages the sparsity of minor alleles at sites with low minor allele frequencies, effectively transforming a large site-oriented genotype matrix into a more compact, manipulable, and accessible data structure. As a result, this design enhances disk IO speeds, facilitates in-memory access to rare variants, and permits large-scale rare variant sharing analysis.
- Command Line Tool: gtencode
- Python Package (work in progress): isharepy
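To make the idea of tabular encoding concrete, here is a minimal sketch in Rust. It assumes biallelic genotypes coded as 0/1 in a dense site-by-haplotype matrix. The names `MinorAlleleRecord` and `encode_rare_variants` and the `max_maf` threshold are hypothetical and chosen for illustration; they are not the crate's actual types or API.

```rust
/// One row of the sparse table: a minor allele carried by one haplotype at one site.
/// Hypothetical layout for illustration; not the crate's actual record type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MinorAlleleRecord {
    pub site: u32,
    pub haplotype: u32,
}

/// Encode a dense site-by-haplotype matrix (assumed biallelic, coded 0/1) into
/// sparse records, keeping only polymorphic sites whose minor allele frequency
/// is at most `max_maf` (an assumed, user-chosen threshold).
pub fn encode_rare_variants(dense: &[Vec<u8>], max_maf: f64) -> Vec<MinorAlleleRecord> {
    let mut table = Vec::new();
    for (site, row) in dense.iter().enumerate() {
        if row.is_empty() {
            continue;
        }
        let n_alt = row.iter().filter(|&&g| g == 1).count();
        let n_ref = row.len() - n_alt;
        // The minor allele is whichever allele is less common at this site.
        let (minor, n_minor) = if n_alt <= n_ref { (1u8, n_alt) } else { (0u8, n_ref) };
        let maf = n_minor as f64 / row.len() as f64;
        // Skip monomorphic sites and sites that are too common to count as rare.
        if n_minor == 0 || maf > max_maf {
            continue;
        }
        // Store only the carriers of the minor allele, not the whole row.
        for (hap, &g) in row.iter().enumerate() {
            if g == minor {
                table.push(MinorAlleleRecord {
                    site: site as u32,
                    haplotype: hap as u32,
                });
            }
        }
    }
    table
}
```

Because only minor-allele carriers are stored, the table size scales with the number of rare alleles rather than with sites times haplotypes, which is where the compactness described above comes from.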
Included in this crate are the data structures and algorithms associated with IBD sharing analysis. It serves as a reimplementation of functionalities present in the Python package ibdutils and the C++ library ibdtools. Key updates from the previous implementations include:
- Use of genome-wide base-pair coordinates for IBD segment start and end fields, as opposed to chromosomal positions. This facilitates easier IBD segment manipulation across chromosomes.
- The intervaltree crate has been integrated to improve various IBD processing and analysis functionalities, enhancing code maintainability and readability.
- Command Line Tool: ibdutils
- Python Package (work in progress): isharepy for IBD-related classes.
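The genome-wide coordinate idea mentioned in the list above can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `GenomeCoords` and its methods are hypothetical names, chromosomes are identified by a 0-based index, and positions are treated as 0-based offsets within a chromosome.

```rust
/// Hypothetical helper that maps between (chromosome, position) pairs and a
/// single genome-wide coordinate. Not the crate's actual type.
pub struct GenomeCoords {
    /// starts[i] is the genome-wide offset of chromosome i;
    /// the final element is the total genome length.
    starts: Vec<u64>,
}

impl GenomeCoords {
    /// Build cumulative offsets from chromosome lengths given in genome order.
    pub fn new(chrom_lengths: &[u64]) -> Self {
        let mut starts = Vec::with_capacity(chrom_lengths.len() + 1);
        let mut total = 0u64;
        starts.push(0);
        for &len in chrom_lengths {
            total += len;
            starts.push(total);
        }
        Self { starts }
    }

    /// Convert a 0-based chromosome index and 0-based position into a
    /// genome-wide position. Returns None if either is out of range.
    pub fn to_genome_wide(&self, chrom: usize, pos: u64) -> Option<u64> {
        let start = *self.starts.get(chrom)?;
        let end = *self.starts.get(chrom + 1)?;
        let gw = start + pos;
        if gw < end { Some(gw) } else { None }
    }

    /// Convert a genome-wide position back to (chromosome index, position).
    pub fn to_chrom_pos(&self, gw: u64) -> Option<(usize, u64)> {
        let total = *self.starts.last()?;
        if gw >= total {
            return None;
        }
        // Count the offsets that are <= gw; the last of them is the chromosome start.
        let chrom = self.starts.partition_point(|&s| s <= gw) - 1;
        Some((chrom, gw - self.starts[chrom]))
    }
}
```

With a single coordinate axis, segments from different chromosomes can be sorted, merged, or stored in one interval structure without carrying a separate chromosome field through every comparison.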
We've introduced a command line tool that integrates local ancestry information from rfmix v2 with IBD segments to derive ancestry-specific IBD. Refer to the binary asibd for this feature.
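As a rough illustration of what combining local ancestry with IBD segments can involve, the sketch below splits one IBD segment by a set of local ancestry tracts. It makes no claim to reproduce the logic of asibd. The `Tract` type, the `split_by_ancestry` function, the half-open interval convention, and the numeric ancestry labels are all assumptions made for this example.

```rust
/// A half-open interval [start, end) labeled with an ancestry code.
/// Hypothetical type for illustration; not taken from the crate.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tract {
    pub start: u64,
    pub end: u64,
    pub ancestry: u8,
}

/// Split the IBD segment [ibd_start, ibd_end) into pieces, one per overlapping
/// ancestry tract. Tracts are assumed to be non-overlapping and on the same
/// coordinate axis as the IBD segment.
pub fn split_by_ancestry(ibd_start: u64, ibd_end: u64, tracts: &[Tract]) -> Vec<Tract> {
    tracts
        .iter()
        .filter_map(|t| {
            // The overlap of two half-open intervals is [max(starts), min(ends)).
            let start = ibd_start.max(t.start);
            let end = ibd_end.min(t.end);
            if start < end {
                Some(Tract { start, end, ancestry: t.ancestry })
            } else {
                None
            }
        })
        .collect()
}
```

This linear scan is enough for a handful of tracts; for many segments and tracts, an interval tree would avoid checking every tract against every segment.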
- Operating Systems: Supported on Linux and macOS. Windows is currently not supported due to compatibility issues with the rust-htslib dependency.
- C Compiler: This crate relies on bindings to C libraries, so a C compiler is required.
- Rust: Ensure cargo and the necessary toolchain are installed to compile this crate and its associated binary executables.
cargo build --release --bin gtencode
cargo build --release --bin ibdutils
cargo build --release --bin asibd
Post-compilation, binaries are located in the target/release/ directory.
- Make sure cargo, the necessary toolchain, a C compiler, and Python 3 are installed.
- Use pip to install maturin, numpy, and pyarrow:
pip install maturin numpy pyarrow
- Install isharepy:
cd ishare_py
maturin develop --release
cd ..
Instructions for each command line tool are as follows:
- toolname --help: Lists available subcommands. E.g., gtencode --help.
- toolname subcommand --help: Displays usage for a specific subcommand. E.g., gtencode encode --help.
Running tests requires cloning test data from https://github.com/bguo068/testdata, which has been added as a submodule of ishare and can thus be downloaded by running:
git submodule update --init --recursive