Skip to content

Commit

Permalink
Plotly (#8)
Browse files Browse the repository at this point in the history
* Clusters added into master (#6)

* flank regions fix

* new labels

* draw to cluster

* __init__.py Clustering added
BismarkPlot.py Clustering done and confidence bands
console_metagene.py confidence bands added
index.rst new classes and methods added in documentation

* documentation fix

* documentation fix

* v1.2.2 release

* v1.2.2 release

* v1.2.3 release

* v1.2.3 release

* v1.3

* console_chrs.py fixed path
BismarkPlot.py Genome.__trim_genes take flank length anyway
LinePlot added all contexts support

* console_metagene.py empty genome check

* BismarkPlot.py cpu check
console_metagene.py and console_chrs.py file_format fix

* v1.3.0a

* BismarkPlot.py fix cpu_count
console fix dpi, plot types

* v1.3.0a1

* v1.3.0b0

* v1.3.1

* Update README.md

* Update README.md

* Update README.md

* add_flank_lines for heatmap into method
added Modules class
fix for neat_TSS intersection mode
added genes IDs into genome

* v1.3.2a

* v1.3.2pr

* v1.3.2rc0

* v1.3.2

* 1. main file split into parts
2. added different summary stats
3. started adding plotly

* 1. main file split into parts
2. added different summary stats

* custom labels

* plotly wip

* added seq_reader

* - major cleanup
- Moved to ArrowReaders.py
- Added PlotBase to Base.py
- Added more constructors to Metagene
- Added schema check for Metagene

* some fixes... more are coming 4 sure

* P_CG
and A LOT of documentation

* first merge to main
  • Loading branch information
shitohana authored Dec 29, 2023
1 parent 5582904 commit f587a63
Show file tree
Hide file tree
Showing 49 changed files with 5,306 additions and 1,647 deletions.
16 changes: 7 additions & 9 deletions .github/workflows/publish.yaml
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
name: Publish to PyPI and TestPyPI

on:
push:
branches: [master] # branch to trigger deployment
on: push

jobs:
build-n-publish:
Expand All @@ -24,12 +22,12 @@ jobs:
run: >-
python -m
build
- name: Publish distribution 📦 to Test PyPI
if: startsWith(github.ref, 'refs/tags')
uses: pypa/gh-action-pypi-publish@release/v1
with:
password: ${{ secrets.TEST_PYPI_API_TOKEN }}
repository-url: https://test.pypi.org/legacy/
# - name: Publish distribution 📦 to Test PyPI
# if: startsWith(github.ref, 'refs/tags')
# uses: pypa/gh-action-pypi-publish@release/v1
# with:
# password: ${{ secrets.TEST_PYPI_API_TOKEN }}
# repository-url: https://test.pypi.org/legacy/
- name: Publish distribution 📦 to PyPI
if: startsWith(github.ref, 'refs/tags')
uses: pypa/gh-action-pypi-publish@release/v1
Expand Down
200 changes: 142 additions & 58 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# BismarkPlot
Comprehensive tool for visualizing genome-wide cytosine data.

Analytical framework for BS-seq data comparison and visualization.

See the docs: https://shitohana.github.io/BismarkPlot

Expand All @@ -15,15 +16,14 @@ pip install bismarkplot
You can use ```bismarkplot``` either as python library or directly from console after installing it.

Console options:
- *bismarkplot-metagene* - methylation density visualizing tool.
- *bismarkplot-chrs* - chromosome methylation levels visualizing tool.
- `bismarkplot-metagene` - methylation density visualizing tool.
- `bismarkplot-chrs` - chromosome methylation levels visualizing tool.

### bismarkplot-metagene

```commandline
usage: BismarkPlot. [-h] [-o OUT] [-g GENOME] [-r {gene,exon,tss,tes}] [-b BATCH] [-c CORES] [-f FLENGTH] [-u UWINDOWS] [-d DWINDOWS] [-m MLENGTH]
[-w GWINDOWS] [--line] [--heatmap] [--box] [--violin] [-S SMOOTH] [-L LABELS [LABELS ...]] [-C CONFIDENCE] [-H H] [-V V] [--dpi DPI]
[-F {png,pdf,svg}]
usage: BismarkPlot. [-h] [-o NAME] [--dir DIR] [-g GENOME] [-r {gene,exon,tss,tes}] [-b BATCH] [-c CORES] [-f FLENGTH] [-u UWINDOWS] [-d DWINDOWS] [-m MLENGTH] [-w GWINDOWS] [--line] [--heatmap]
[--box] [--violin] [-S SMOOTH] [-L LABELS [LABELS ...]] [-C CONFIDENCE] [-H VRESOLUTION] [-V HRESOLUTION] [--dpi DPI] [-F {png,pdf,svg}]
filename [filename ...]
Metagene visualizing tool.
Expand All @@ -33,7 +33,8 @@ positional arguments:
optional arguments:
-h, --help show this help message and exit
-o OUT, --out OUT output base name (default: /Users/shitohana/Desktop/PycharmProjects/BismarkPlot)
-o NAME, --out NAME output base name (default: plot)
--dir DIR output dir (default: /Users/shitohana/PycharmProjects/BismarkPlot_test)
-g GENOME, --genome GENOME
path to GFF genome file (default: None)
-r {gene,exon,tss,tes}, --region {gene,exon,tss,tes}
Expand Down Expand Up @@ -62,8 +63,8 @@ optional arguments:
labels for plots (default: None)
-C CONFIDENCE, --confidence CONFIDENCE
probability for confidence bands for line-plot. 0 if disabled (default: 0)
-H H vertical resolution for heat-map (default: 100)
-V V vertical resolution for heat-map (default: 100)
-H VRESOLUTION vertical resolution for heat-map (default: 100)
-V HRESOLUTION vertical resolution for heat-map (default: 100)
--dpi DPI dpi of output plot (default: 200)
-F {png,pdf,svg}, --format {png,pdf,svg}
format of output plots (default: pdf)
Expand All @@ -80,25 +81,27 @@ bismarkplot-metagene -g path/to/genome.gff -r gene -f 2000 -m 4000 -u 500 -d 50
### bismarkplot-chrs

```commandline
usage: BismarkPlot [-h] [-o DIR] [-b N] [-c CORES] [-w N] [-m N] [-S FLOAT] [-F {png,pdf,svg}] path/to/txt [path/to/txt ...]
usage: BismarkPlot [-h] [-o NAME] [-d DIR] [-b N] [-c CORES] [-w N] [-m N] [-S FLOAT] [-F {png,pdf,svg}] [--dpi DPI] path/to/txt
Chromosome methylation levels visualization.
positional arguments:
path/to/txt path to bismark methylation_extractor file
options:
optional arguments:
-h, --help show this help message and exit
-o DIR, --out DIR output base name (default: current/path)
-o NAME, --out NAME output base name (default: plot)
-d DIR, --dir DIR output dir (default: /Users/shitohana/PycharmProjects/BismarkPlot_test)
-b N, --batch N number of rows to be read from bismark file by batch (default: 1000000)
-c CORES, --cores CORES
number of cores to use (default: None)
-w N, --wlength N number of windows for genes (default: 100000)
-w N, --wlength N number of windows for chromosome (default: 100000)
-m N, --mlength N minimum chromosome length (default: 1000000)
-S FLOAT, --smooth FLOAT
windows for smoothing (0 - no smoothing, 1 - straight line (default: 50)
-F {png,pdf,svg}, --format {png,pdf,svg}
-F {png,pdf,svg}, --fmt {png,pdf,svg}
format of output plots (default: pdf)
--dpi DPI dpi of output plot (default: 200)
```

Example:
Expand All @@ -120,36 +123,110 @@ Below we will show the basic BismarkPlot workflow.
### Single sample

```python
import src.bismarkplot.Genome
import bismarkplot

# Firstly, we need to read the regions annotation (e.g. reference genome .gff)
genome = bismarkplot.Genome.from_gff("path/to/genome.gff")
genome = src.bismarkplot.genome.Genome.from_gff("path/to/genome.gff")
# Next we need to filter regions of interest from the genome
genes = genome.gene_body(min_length=4000, flank_length=2000)

# Now we need to calculate metagene data
metagene = bismarkplot.Metagene.from_file(
file = "path/to/CX_report.txt",
genome=genes, # filtered regions
upstream_windows = 500,
gene_windows = 1000,
downstream_windows = 500,
batch_size= 10**7 # number of lines to be read simultaneously
metagene = bismarkplot.Metagene.from_bismark(
file="path/to/CX_report.txt",
genome=genes, # filtered regions
up_windows=500,
body_windows=1000,
down_windows=500,
batch_size=10 ** 7 # number of lines to be read simultaneously
)

# Our metagene contains all methylation contexts and both strands, so we need to filter it (as in dplyr)
filtered = metagene.filter(context = "CG", strand = "+")
filtered = metagene.filter(context="CG", strand="+")
# We are ready to plot
lp = filtered.line_plot() # line plot data
lp.draw().savefig("path/to/lp.pdf") # matplotlib.Figure
lp = filtered.line_plot() # line plot data
lp.draw_mpl().savefig("path/to/lp.pdf") # matplotlib.Figure

hm = filtered.heat_map(ncol=200, nrow=200)
hm.draw().savefig("path/to/hm.pdf") # matplotlib.Figure
hm.draw_mpl().savefig("path/to/hm.pdf") # matplotlib.Figure
```
Output:
Output for _Brachypodium distachyon_:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274546389-8b97edcb-6ab4-4f17-a970-389819fbeaec.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274546419-079e004b-8f6e-4ce9-a3dc-fc4ad9592adc.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/280025496-b2336c72-5109-42d4-a770-f0a480ebf40d.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/280025490-c72da09a-7841-471a-bc39-086aa77f65e4.png" width="300">
</p>

If metagene is not filtered by context, **all available contexts will be plotted**:

```python
filtered_by_strand = metagene.filter(strand == "+")
lp = filtered_by_strand.line_plot()
lp.draw_mpl()
```

Output for _Brachypodium distachyon_:

<p align="middle">
<img width="300" src="https://user-images.githubusercontent.com/43905117/280023042-849599c1-4b36-47e2-8b8f-6c9b9389b48e.png">
</p>

**Confidence bands** can be visualized via setting the `confidence` parameter in `LinePlot.draw()`

```python
lp.draw_mpl(confidence=.95)
```

Output for _Brachypodium distachyon_:

<p align="middle">
<img width="300" src="https://user-images.githubusercontent.com/43905117/280023017-e1167a90-83d7-46d5-aa45-545d6bdbc033.png">
</p>

### Heat-map clusterisation

Genes can be clustered to minimize distances between them before plotting heat-map. This can be useful for capturing
overall methylation patterns in sample. _This operation is very time consuming. It is advised to set small number of
windows (< 50)_.

```python
metagene = bismarkplot.Metagene.from_bismark(
file="path/to/CX_report.txt",
genome=genes, # filtered regions
up_windows=5, body_windows=10, down_windows=5,
)
clustered = metagene.clustering(
count_threshold=5, # Minimum counts per window
dist_method="euclidean", # See scipy.spatial.distance.pdist
clust_method="average" # See scipy.cluster.hierarchy.linkage
)

# Heatmap with optimized distances between genes will be drawn
clustered.draw_mpl().savefig("path/to/clustered_hm.pdf")
```
Output for _Brachypodium distachyon_ - CHG

<p align="middle">
<img width="300" src="https://user-images.githubusercontent.com/43905117/282321746-baf97da1-6a35-4a17-9772-5a2f4d67e6a4.png">
</p>

### Genes dynamicTreeCut

To shrink clustered heat-map and capture main patterns genes can be split into modules using
[dynamicTreeCut algorithm](https://github.com/kylessmith/dynamicTreeCut/tree/master) by Peter Langfelder and Bin Zhang.
Then genes can be plotted as heat-map as previous example:

```python
# Parameters are the same as for cutreeHybrid (see dynamicTreeCut)
modules = clustered.modules(deepSplit=1)

modules.draw_mpl().savefig("path/to/modules_hm.pdf")
```

Output for _Brachypodium distachyon_ - CHG

<p align="middle">
<img width="300" src="https://user-images.githubusercontent.com/43905117/282321739-df4486f6-9b1f-467e-87ea-cfd441f80e0a.png">
</p>

### Smoothing the line plot
Expand All @@ -158,12 +235,12 @@ Smoothing is very useful, when input signal is very weak (e.g. mammalian non-CpG

```python
# mouse CHG methylation example
filtered = metagene.filter(context = "CHG", strand = "+")
lp.draw(smooth = 0).savefig("path/to/lp.pdf") # no smooth
lp.draw(smooth = 50).savefig("path/to/lp.pdf") # smoothed with window length = 50
filtered = metagene.filter(context="CHG", strand="+")
lp.draw_mpl(smooth=0).savefig("path/to/lp.pdf") # no smooth
lp.draw_mpl(smooth=50).savefig("path/to/lp.pdf") # smoothed with window length = 50
```

Output:
Output for _Mus musculus_:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274557328-5a087a43-5731-4cef-aa90-cf2ce046c747.png" width="300">
Expand All @@ -190,7 +267,7 @@ trimmed.violin_plot().savefig(...)
merged = filtered.merge()
```

Output:
Output for _Brachypodium distachyon_:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274546531-8516858a-8203-4e98-98a9-7351efb79d29.png" width="300">
Expand All @@ -205,17 +282,19 @@ Output:

```python
# For analyzing samples with different reference genomes, we need to initialize several genomes instances
import src.bismarkplot.Genome

genome_filenames = ["arabidopsis.gff", "brachypodium.gff", "cucumis.gff", "mus.gff"]
reports_filenames = ["arabidopsis.txt", "brachypodium.txt", "cucumis.txt", "mus.txt"]

genomes = [
bismarkplot.Genome.from_gff(file).gene_body(...) for file in genome_filenames
src.bismarkplot.genome.Genome.from_gff(file).gene_body(...) for file in genome_filenames
]

# Now we read reports
metagenes = []
for report, genome in zip(reports_filenames, genomes):
metagene = bismarkplot.Metagene(report, genome = genome, ...)
metagene = bismarkplot.Metagene(report, genome=genome, ...)
metagenes.append(metagene)

# Initialize MetageneFiles
Expand All @@ -240,26 +319,29 @@ Output:
Other genomic regions from .gff can be analyzed too with ```.exon``` or ```.near_tss/.near_tes``` option for ```bismarkplot.Genome```

```python
import src.bismarkplot.Genome

exons = [
bismarkplot.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames
src.bismarkplot.genome.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames
]
metagenes = []
for report, exon in zip(reports_filenames, exons):
metagene = bismarkplot.Metagene(report, genome = exon,
upstream_windows = 0, # !!!
downstream_windows = 0, # !!!
metagene = bismarkplot.Metagene(report, genome=exon,
upstream_windows=0, # !!!
downstream_windows=0, # !!!
...)
metagenes.append(metagene)
# OR
tss = [
bismarkplot.Genome.from_gff(file).near_tss(min_length = 2000, flank_length = 2000) for file in genome_filenames
src.bismarkplot.genome.Genome.from_gff(file).near_tss(min_length=2000, flank_length=2000) for file in
genome_filenames
]
metagenes = []
for report, t in zip(reports_filenames, tss):
metagene = bismarkplot.Metagene(report, genome = t,
upstream_windows = 1000,# same number of windows
gene_windows = 1000, # same number of windows
downstream_windows = 0, # !!!
metagene = bismarkplot.Metagene(report, genome=t,
upstream_windows=1000, # same number of windows
gene_windows=1000, # same number of windows
downstream_windows=0, # !!!
...)
metagenes.append(metagene)
```
Expand All @@ -281,29 +363,31 @@ TSS output:
BismarkPlot allows user to visualize chromosome methylation levels across full genome

```python
import src.bismarkplot.ChrLevels
import bismarkplot
chr = bismarkplot.ChrLevels.from_file(

chr = src.bismarkplot.levels.ChrLevels.from_file(
"path/to/CX_report.txt",
window_length=10**5, # window length in bp
batch_size=10**7,
chr_min_length = 10**6, # minimum chr length in bp
window_length=10 ** 5, # window length in bp
batch_size=10 ** 7,
chr_min_length=10 ** 6, # minimum chr length in bp
)
fig, axes = plt.subplots()

for context in ["CG", "CHG", "CHH"]:
chr.filter(strand="+", context=context).draw(
(fig, axes), # to plot contexts on same axes
smooth=10, # window number for smoothing
label=context # labels for lines
)
chr.filter(strand="+", context=context).draw_mpl(
(fig, axes), # to plot contexts on same axes
smooth=10, # window number for smoothing
label=context # labels for lines
)

fig.savefig(f"chrom.pdf", dpi = 200)
fig.savefig(f"chrom.pdf", dpi=200)
```

Output for Arabidopsis t.:
Output for _Arabidopsis thaliana_:

<img src="https://user-images.githubusercontent.com/43905117/274563188-6efc5b71-9c83-4fe0-8b5a-767db6e1acb4.png">

Output for Brachypodium d.:
Output for _Brachypodium distachyon_:

<img src="https://user-images.githubusercontent.com/43905117/274563210-4f5dc20a-4ab3-4e52-8263-6ebe7b0623d5.png">
<img src="https://user-images.githubusercontent.com/43905117/274563210-4f5dc20a-4ab3-4e52-8263-6ebe7b0623d5.png">
14 changes: 14 additions & 0 deletions docs/_binom.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Binomial
========

Methods for calculating P-value for cytosine residues or genomic regions.

.. currentmodule:: bismarkplot

.. autosummary::
:nosignatures:
:toctree: _Binom
:template: class.rst

RegionStat
BinomialData
Loading

0 comments on commit f587a63

Please sign in to comment.