bam2tensor

bam2tensor is a Python package for converting .bam files to dense representations of methylation data (as .npz NumPy arrays). It evaluates all CpG sites and stores their methylation states so they can be loaded into downstream deep learning pipelines.

Features

Parses .bam files using pysam
Extracts methylation data from all CpG sites
Supports any genome (Hg38, T2T-CHM13, mm10, etc.)
Stores data in sparse format (COO matrix) for efficient loading
Exports methylation data to .npz NumPy arrays
Easily parallelizable

Requirements

Python 3.9+
pysam, numpy, scipy, tqdm

Installation

You can install bam2tensor via pip from PyPI:

pip install bam2tensor

Usage

Please see the Reference Guide for full details.

Data Structure

One .npz file is generated for each separate .bam, which can be loaded using scipy.sparse.load_npz(). Each .npz file contains a single sparse SciPy COO matrix.
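
As a minimal sketch of the loading step, the snippet below builds a small toy COO matrix, saves it with scipy.sparse.save_npz(), and reads it back with scipy.sparse.load_npz(). The matrix contents and the file name sample.methylation.npz are made up for illustration; they are assumptions and not actual bam2tensor output.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Toy stand-in for a bam2tensor output file: 3 reads x 4 CpG sites.
# These values are invented for illustration only.
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 1, 1, 3])
vals = np.array([1, 0, 1, -1], dtype=np.int8)
toy = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 4))

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name; real output names may differ.
    path = os.path.join(tmpdir, "sample.methylation.npz")
    scipy.sparse.save_npz(path, toy)

    # This is the call the README describes for reading a generated file.
    loaded = scipy.sparse.load_npz(path)

print(loaded.format)  # sparse storage format of the loaded matrix
print(loaded.shape)   # (number of reads, number of CpG sites)
```

With a real file, only the scipy.sparse.load_npz() call is needed; the rest of the snippet exists so the example runs without any .bam input.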

In the COO matrix, each row represents a read and each column represents a CpG site. The value at each row/column is the methylation state (0 = unmethylated, 1 = methylated, -1 = no data). Note that -1 can represent indels or point mutations.
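
The encoding above can be illustrated with a short sketch that computes a per-CpG methylation fraction from a toy matrix, skipping -1 entries. It assumes that every observed read/CpG pair is held as an explicitly stored entry (including the 0 values), which is an assumption about the file layout rather than something stated here, so the sketch works only on the stored data, row, and col arrays.

```python
import numpy as np
import scipy.sparse

# Toy matrix: 3 reads (rows) x 4 CpG sites (columns), invented for illustration.
# 1 = methylated, 0 = unmethylated, -1 = no data.
rows = np.array([0, 0, 1, 1, 2, 2])
cols = np.array([0, 1, 0, 1, 1, 2])
vals = np.array([1, 0, 1, 1, -1, 0], dtype=np.int8)
mat = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 4))

n_sites = mat.shape[1]

# Only 0 and 1 carry methylation information; -1 entries are skipped.
informative = mat.data >= 0

# Count methylated calls and informative calls for each CpG column.
methylated = np.bincount(mat.col[mat.data == 1], minlength=n_sites)
covered = np.bincount(mat.col[informative], minlength=n_sites)

# Fraction methylated per site; sites with no informative reads become NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    fraction = np.where(covered > 0, methylated / covered, np.nan)

print(fraction)
```

Working on the stored entries avoids converting the matrix to a dense array, which matters when the number of CpG columns is large.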

Todo

Consider storing a read ID to row ID mapping?
Export, store, and import the embedding mapping in a more stable format (.npz or other instead of .json)?
Store metadata / object reference in .npz file?

Contributing

Contributions are welcome! Please see the Contributor Guide.

License

Distributed under the terms of the MIT license, bam2tensor is free and open source.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project is developed and maintained by Nick Semenkovich (@semenko), as part of the Medical College of Wisconsin's Data Science Institute.

This project was generated from Statistics Norway's SSB PyPI Template.
