Skip to content

Commit

Permalink
Major update 1.1.0 (#18)
Browse files Browse the repository at this point in the history
  • Loading branch information
shitohana authored Jul 15, 2024
1 parent 2a2fe71 commit 35415cd
Show file tree
Hide file tree
Showing 42 changed files with 5,198 additions and 3,200 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ name: Deploy Sphinx documentation to Pages

on:
push:
branches: [master, docs] # branch to trigger deployment
branches: [master, docs, new_docs] # branch to trigger deployment

jobs:
pages:
Expand Down
169 changes: 169 additions & 0 deletions docs/_bam.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
BAM
===


.. currentmodule:: bsxplorer

^^^^^^^^^
Reference
^^^^^^^^^

.. autosummary::
:toctree: _BAM
:nosignatures:
:template: class_with_init.rst

BAMReader
PivotRegion


^^^^^
Usage
^^^^^

BSXplorer offers :class:`BAMReader` to convert BAM files into methylation
reports or calculate such methylation statistics as methylation entropy [1],
epipolymorphism [2] or PDR [3].

.. note::
Input BAM file must be **sorted** and have index .bai file.

-----------
Report mode
-----------

For reading BAM files and converting them into methylation report format
set :class:`BAMReader` use :func:`BAMReader.report_iter`. This method returns
iterator over BAM file and yields data batches in :class:`UniversalBatch` format.

.. code-block:: python
import bsxplorer as bsx
reader = bsx.BAMReader(
bam_filename="path/to/bamfile.bam",
index_filename="path/to/bamfile.bam.bai",
)
# Simple iteration over BAM file
for batch in reader.report_iter():
do_something(batch)
# Batch can be saved into file
with bsx.UniversalWriter("path/to/output.txt", report_type="bismark") as writer:
for batch in reader.report_iter():
writer.write(batch)
For full cytosine methylation report output, cytosine data file needs to be
prepared with :func:`SequenceFile.preprocess_cytosines`. Path to prepocessed
cytosine file should be specified with `cytosine_file` parameter in
:class:`BAMReader` constructor.

.. code-block:: python
cytosine_filepath = "path/to/cytosines.parquet"
bsx.SequenceFile("path/to/sequence.fasta").preprocess_cytosines(cytosine_filepath)
reader = bsx.BAMReader(
bam_filename="path/to/bamfile.bam",
index_filename="path/to/bamfile.bam.bai",
cytosine_file=cytosine_filepath
)
If there is no need to analyze full BAM file, but only some genomic regions,
user can specify the `regions` parameter in :class:`BAMReader` constructor
with DataFrame generated with :class:`Genome` methods.

.. code-block:: python
regions = bsx.Genome.from_gff("path/to/annot.gff").gene_body(min_length=2000)
reader = bsx.BAMReader(
bam_filename="path/to/bamfile.bam",
index_filename="path/to/bamfile.bam.bai",
regions=regions
)
for batch in reader:
# The reader will return batch for each region individually
...
.. note::
If genomic regions in annotation file overlap, resulting batch data will be
duplicated.

For QC calculation, set qc parameter to True. After reading BAM file QC stats
can be plotted with :func:`BAMReader.plot_qc` method.

Resulting image:

.. image:: images/bam/QC_example.png
:width: 600

----------
Stats mode
----------

In "stats" mode :class:`BSXplorer` can calculate such methylation statistics
as:

* Methylation entropy [1]_ which is calculated as the entropy of epialleles
originating from a single genomic locus.

.. math::
ME = \frac{1}{L}\sum_i{-\frac{n_i}{N}\cdot Log\frac{n_i}{N}}
:math:`L` – window length, :math:`n_i` – number of pattern occurencies in
window, :math:`N` – total number of different methylation patterns in window.

* Epipolymorphism [2]_ also captures the amount of heterogeneity of DNA methylation.

.. math::
EPM = 1 - \sum_i{(\frac{n_i}{N})^2}
:math:`n_i` – number of pattern occurencies in window, :math:`N` – total number
of different methylation patterns in window.

* PDR (Proportion of discordant reads) [3]_ – a fraction of reads carrying
cytosines in discordant methylation states (i.e. containing both methylated and
unmethylated cytosines in a single read) with respect to all reads
mapped to a cytosines.

.. math::
PDR = \frac{d}{c+d}
:math:`d` – number of discordant reads, :math:`c` – number of concordant reads.

To calculate these methylation statistics, method :func:`BAMReader.stats_iter` need to be
called to get the iterator. Iterator returns :class:`PivotRegion`. Methods
:func:`PivotRegion.methylation_entropy`, :func:`PivotRegion.epipolymorphism` and
:func:`PivotRegion.PDR` can be called to get ME, EPM and PDR values. See documentation
for detailed explanation of these methods.

Example for ME:

.. code-block:: python
reader = bsx.BAMReader(
bam_filename="path/to/bamfile.bam",
index_filename="path/to/bamfile.bam.bai",
)
for batch in reader.stats_iter():
positions, value = batch.methylation_entropy()
.. [1] Xie, H., Wang, M., de Andrade, A., Bonaldo, M.deF., Galat, V., Arndt,
K., Rajaram, V., Goldman, S., Tomita, T., & Soares, M. B. (2011).
Genome-wide quantitative assessment of variation in DNA methylation patterns.
Nucleic acids research, 39(10), 4099–4108. https://doi.org/10.1093/nar/gkr017
.. [2] Landan, G., Cohen, N. M., Mukamel, Z., Bar, A., Molchadsky, A., Brosh,
R., Horn-Saban, S., Zalcenstein, D. A., Goldfinger, N., Zundelevich,
A., Gal-Yam, E. N., Rotter, V., & Tanay, A. (2012).
Epigenetic polymorphism and the stochastic formation of differentially
methylated regions in normal and cancerous tissues. Nature genetics,
44(11), 1207–1214. https://doi.org/10.1038/ng.2442
.. [3] Lee D, Koo B, Yang J, Kim S (2023)
Metheor: Ultrafast DNA methylation heterogeneity calculation from bisulfite
read alignments. PLoS Comput Biol 19(3): e1010946.
https://doi.org/10.1371/journal.pcbi.1010946
50 changes: 47 additions & 3 deletions docs/_binom.rst
Original file line number Diff line number Diff line change
@@ -1,14 +1,58 @@
Binomial
========

Methods for calculating P-value for cytosine residues or genomic regions.

.. currentmodule:: bsxplorer

^^^^^^^^^
Reference
^^^^^^^^^

.. autosummary::
:nosignatures:
:toctree: _Binom
:template: class.rst

BinomialData
RegionStat
BinomialData

^^^^^
Usage
^^^^^

BSXplorer offers methods to define cytosine methylation status. Assuming the distribution of
count of the methylated cytosine reads is binomial, BSXplorer runs a binomial test to
calculate cytosine methylation p-value. :func:`BinomialData.from_report` reads the methylation
report and saves cytosine methylation p-value into .parquet file for further usage.

.. code-block:: python
import bsxplorer as bsx
binom = bsx.BinomialData.from_report(
file="/path/to/report.txt",
report_type="bismark",
save="savename",
dir="/path/to/work_dir"
)
# class BinomialData can also be initialized from preprocessed file directly
# binom = bsx.BinomialData("path/to/preprocessed.parquet")
Preprocessed methylation p-value report can be used as input report for :class:`Metagene` or
:class:`ChrLevels` or can be utilized for Gene Body Methylation (gBM) [1]_ epigenetic mark
categorization with :class:`RegionStat`.

.. code-block:: python
genome = bsx.Genome.from_gff("path/to/annot.gff").gene_body(min_length=0)
region_stat = binom.region_pvalue(genome, methylation_pvalue=.05)
bm, im, um = region_stat.categorise(context="CG", p_value=.05)
This way regions are categorised as BM (Body Methylation), IM (Intermediate Methylated)
and UM (Undermethylated).

.. [1] Takuno, S., & Gaut, B. S. (2013).
Gene body methylation is conserved between plant orthologs and is of evolutionary consequence.
Proceedings of the National Academy of Sciences of the United States of America, 110(5),
1797–1802. https://doi.org/10.1073/pnas.1215380110
31 changes: 15 additions & 16 deletions docs/_chr.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,30 +7,29 @@ ChrLevels class.

.. class:: ChrLevels

.. rubric:: Methods

.. autosummary::
:nosignatures:
:toctree: _ChrLevels/_method
:template: method.rst

~ChrLevels.from_bismark


~ChrLevels.from_cgmap
.. automethod:: __init__

.. rubric:: Methods

~ChrLevels.from_parquet
.. autosummary::
:nosignatures:
:toctree: _ChrLevels/_method
:template: method.rst

~ChrLevels.from_bismark

~ChrLevels.save_plot_rds
~ChrLevels.from_cgmap

~ChrLevels.from_bedGraph

~ChrLevels.filter
~ChrLevels.from_coverage

~ChrLevels.from_binom

~ChrLevels.draw_mpl
~ChrLevels.save_plot_rds

~ChrLevels.filter

~ChrLevels.draw_plotly
~ChrLevels.line_plot

~ChrLevels.box_plot
Loading

0 comments on commit 35415cd

Please sign in to comment.