Improve docs
grst committed Feb 7, 2021
1 parent 8209739 commit e5b24eb
Showing 3 changed files with 40 additions and 23 deletions.
14 changes: 7 additions & 7 deletions docs/infercnv.rst
@@ -3,7 +3,7 @@
The inferCNV method
===================

This methodology in this package is essentially a python reimplementation of
Essentially, this package is a Python reimplementation of
`infercnv <https://github.com/broadinstitute/inferCNV/>`_. It mostly follows the computation steps
outlined `here <https://github.com/broadinstitute/inferCNV/wiki/Running-InferCNV>`_,
with minor modifications. The computation steps are outlined below.
@@ -20,24 +20,24 @@ The function parameters are documented at :func:`infercnvpy.tl.infercnv`.
multiple categories are available (i.e. multiple values are passed to
`reference_cat`), the log fold change is "bounded":

* compute the mean gene expression for each category separately
* Compute the mean gene expression for each category separately.
* Values between the minimum and the maximum of the reference means
receive a log fold change of 0, since they are not considered
different from the background.
* From values smaller than the minimum of the reference means, subtract that minimum.
* From values larger than the maximum of the reference means, subtract that maximum.

This procedure avoids calling false positive CNV due to cell-type specific
expression of clustered gene regions (e.g. Immunoglobulin or HLA genes in different
This procedure avoids calling false positive CNV regions due to cell-type-specific
expression of clustered gene regions (e.g. immunoglobulin or HLA genes in different
immune cell types).
2. Clip the fold changes at `-lfc_cap` and `+lfc_cap`.
3. Smooth the gene expression by genomic position. Compute the average over a
running window of length `window_size`. Compute only every nth window
to save time and memory, where n = `step`.
4. Center the smoothed gene expression by cell, but subtracting the
calculating and subtracting the median for each cell.
4. Center the smoothed gene expression by cell, i.e. subtract each cell's median
from that cell's values.
5. Perform noise filtering. Values `< dynamic_threshold * STDDEV` are set to 0,
where STDDEV is the standard deviation of the smoothed gene expression
where `STDDEV` is the standard deviation of the smoothed gene expression.
6. Smooth the final result using a median filter.
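The six steps above can be sketched in NumPy for the genes of a single chromosome, ordered by genomic position. This is a simplified illustration, not infercnvpy's implementation: the function names are hypothetical, the parameter names mirror the text (`lfc_cap`, `window_size`, `step`, `dynamic_threshold`), the default values are illustrative, and the following are assumptions: the noise filter compares absolute values against a threshold computed over the whole smoothed matrix, and the final median filter runs per cell along genomic position with a width of 5.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def bounded_log_fold_change(expr, ref_means):
    """Step 1: expr is cells x genes (log-normalized), ref_means is categories x genes."""
    ref_min = ref_means.min(axis=0)
    ref_max = ref_means.max(axis=0)
    diff_min = expr - ref_min  # broadcasts the per-gene bounds over all cells
    diff_max = expr - ref_max
    # Values inside [ref_min, ref_max] keep a log fold change of 0.
    return np.where(diff_min < 0, diff_min, np.where(diff_max > 0, diff_max, 0.0))


def running_mean(x, window_size, step):
    """Step 3: mean over a sliding window of genes, keeping every `step`-th window."""
    w = min(window_size, x.shape[1])  # assumption: shrink the window on short chromosomes
    csum = np.cumsum(np.pad(x, ((0, 0), (1, 0))), axis=1)
    means = (csum[:, w:] - csum[:, :-w]) / w  # one column per full window
    return means[:, ::step]


def median_filter_1d(x, size):
    """Step 6 (assumed form): per-cell median filter along genomic position, odd `size`."""
    half = size // 2
    padded = np.pad(x, ((0, 0), (half, half)), mode="edge")
    windows = sliding_window_view(padded, size, axis=1)  # cells x positions x size
    return np.median(windows, axis=-1)


def infercnv_sketch(expr, ref_means, *, lfc_cap=3.0, window_size=100, step=10,
                    dynamic_threshold=1.5, median_filter_size=5):
    """Hypothetical end-to-end sketch of steps 1-6 for one chromosome."""
    x = bounded_log_fold_change(expr, ref_means)  # 1. bounded log fold change
    x = np.clip(x, -lfc_cap, lfc_cap)  # 2. clip at -lfc_cap / +lfc_cap
    x = running_mean(x, window_size, step)  # 3. smooth by genomic position
    x = x - np.median(x, axis=1, keepdims=True)  # 4. center each cell on its median
    noise = dynamic_threshold * np.std(x)  # 5. noise threshold
    x = np.where(np.abs(x) < noise, 0.0, x)
    return median_filter_1d(x, median_filter_size)  # 6. median filter
```

Working on one chromosome at a time keeps the running window from averaging genes across chromosome boundaries; the cumulative-sum trick computes every window mean in a single vectorized pass.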

.. _input-data:
7 changes: 3 additions & 4 deletions docs/tutorials/reproduce_infercnv.md
@@ -9,15 +9,14 @@ jupyter:
jupytext_version: 1.5.0.rc1
---

# Reproduce the heatmap from inverCNV
# Reproduce the heatmap from inferCNV

This document demonstrates how the [example heatmap](https://github.com/broadinstitute/inferCNV/wiki#demo-example-figure) from the original
This document demonstrates how to reproduce the [example heatmap](https://github.com/broadinstitute/inferCNV/wiki#demo-example-figure) from the original
R inferCNV implementation. It is based on a small, 183-cell example dataset of malignant and non-malignant oligodendroglioma cells derived from [Tirosh et al. (2016)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5465819/).

```python
import infercnvpy as cnv
import scanpy as sc
import numpy as np
```

## Prepare and inspect dataset
@@ -48,7 +47,7 @@ sc.pl.umap(adata, color="cell_type")
In this case we know which cells are non-malignant. For best results, it is recommended to use
the non-malignant cells as a background. We can provide this information using `reference_key` and `reference_cat`.

In order to reproduce the results as exactely as possible, we use a `window_size` of 50, a `step` of 1.
In order to reproduce the results as exactly as possible, we use a `window_size` of 100 and a `step` of 1.

```python
%%time
42 changes: 30 additions & 12 deletions docs/tutorials/tutorial_3k.md
@@ -47,11 +47,20 @@ sc.logging.print_header()
must be normalized and log-transformed. For more information, see
:ref:`input-data`.

Also, the genomic positions need to be stored in `adata.var`. The
columns `chromosome`, `start`, and `end` hold the chromosome and
the start and end positions on that chromosome for each gene,
respectively.

Infercnvpy provides the :func:`infercnvpy.io.genomic_position_from_gtf` function
to read this information from a GTF file and add it to `adata.var`.

The example dataset is already appropriately preprocessed.
<!-- #endraw -->

```python
adata = cnv.datasets.maynard2020_3k()
adata.var.loc[:, ["ensg", "chromosome", "start", "end"]].head()
```
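The `chromosome`, `start`, and `end` columns matter because the smoothing step averages neighboring genes along each chromosome, so genes must be ordered by position. The helper below is a hypothetical sketch, not part of infercnvpy: it uses a toy pandas DataFrame standing in for `adata.var`, with made-up gene names and coordinates, checks that the position columns exist, and returns the genes grouped by chromosome and sorted by start.

```python
import pandas as pd

REQUIRED_COLUMNS = ["chromosome", "start", "end"]


def order_genes_by_position(var):
    """Hypothetical helper: gene names grouped by chromosome, sorted by start.

    `var` stands in for `adata.var`: one row per gene, indexed by gene name.
    """
    missing = [c for c in REQUIRED_COLUMNS if c not in var.columns]
    if missing:
        raise ValueError(f"adata.var is missing genomic position columns: {missing}")
    # Genes without a known position cannot be placed on a chromosome.
    located = var.dropna(subset=REQUIRED_COLUMNS)
    # Chromosomes sort lexically here; only the within-chromosome order matters
    # for a running window that does not cross chromosome boundaries.
    return located.sort_values(["chromosome", "start"]).index.tolist()


var = pd.DataFrame(
    {
        "chromosome": ["chr2", "chr1", "chr1", None],
        "start": [500, 3000, 100, None],
        "end": [900, 4000, 800, None],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
print(order_genes_by_position(var))  # ['GENE_C', 'GENE_B', 'GENE_A']
```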

Let's first inspect the UMAP plot based on the transcriptomics data:
@@ -69,6 +78,9 @@ region to a reference. The original inferCNV method uses a window size of 100,
but larger window sizes can make sense, depending on the number of
genes in your dataset.

:func:`~infercnvpy.tl.infercnv` adds a `cell x genomic_region` matrix to
`adata.obsm["X_cnv"]`.

For more information about the method check out :ref:`infercnv-method`.

.. note::
@@ -135,6 +147,7 @@ Based on these clusters, we can annotate tumor and normal cells.

.. autosummary::
:toctree: ../generated
:noindex:

infercnvpy.tl.pca
infercnvpy.pp.neighbors
@@ -149,10 +162,10 @@ cnv.pp.neighbors(adata)
cnv.tl.leiden(adata)
```

After running leiden clustering, we plot the chromosome heatmap
After running Leiden clustering, we can plot the chromosome heatmap
by CNV clusters. We can observe that, as opposed to the clusters
at the bottom, the clusters at the top have essentially no differentially expressed genomic regions.
The differentially expressed regions are likely due to copy number variation and those
The differentially expressed regions are likely due to copy number variation and the respective
clusters likely represent tumor cells.

```python
@@ -161,16 +174,12 @@ cnv.pl.chromosome_heatmap(adata, groupby="cnv_leiden", dendrogram=True)

### UMAP plot of CNV profiles


<!-- #raw raw_mimetype="text/restructuredtext" -->
We can visualize the same clusters as a UMAP plot. Additionally,
we developed a summary score that quantifies the amount of copy
:func:`infercnvpy.tl.cnv_score` computes a summary score that quantifies the amount of copy
number variation per cluster. It is simply defined as the
mean of the absolute values of the CNV matrix for each cluster.

.. autosummary::
:toctree: ../generated

infercnvpy.tl.cnv_score
<!-- #endraw -->
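The score definition above (mean of the absolute CNV values per cluster) can be sketched in NumPy. This is not the infercnvpy source: the function name is hypothetical, and it assumes a dense cells x genomic-regions matrix (like `adata.obsm["X_cnv"]` after converting a sparse matrix with `.toarray()`) and one cluster label per cell (e.g. from `adata.obs["cnv_leiden"]`).

```python
import numpy as np


def cnv_score_per_cluster(X_cnv, clusters):
    """Hypothetical sketch: mean absolute CNV value per cluster, mapped back to cells.

    X_cnv: dense cells x genomic-regions array (convert a scipy sparse matrix
    with .toarray() first); clusters: one cluster label per cell.
    """
    X_abs = np.abs(np.asarray(X_cnv, dtype=float))
    clusters = np.asarray(clusters)
    # Average over all cells and all genomic regions belonging to each cluster.
    scores = {label: X_abs[clusters == label].mean() for label in np.unique(clusters)}
    # Every cell inherits the score of its cluster.
    per_cell = np.array([scores[label] for label in clusters])
    return scores, per_cell


X_cnv = np.array([[0.0, 0.2], [0.0, -0.2], [1.0, -1.0], [0.5, -0.5]])
scores, per_cell = cnv_score_per_cluster(X_cnv, ["0", "0", "1", "1"])
```

Mapping the cluster score back to each cell is what makes it possible to color a per-cell plot, such as a UMAP, by the score.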

```python
cnv.tl.umap(adata)
@@ -202,10 +211,19 @@ Again, we can see that there are subclusters of epithelial cells that belong
to a distinct CNV cluster, and that these clusters tend to have the
highest CNV score.

```python
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(
2, 2, figsize=(12, 11), gridspec_kw=dict(wspace=0.5)
)
ax4.axis("off")
sc.pl.umap(adata, color="cnv_leiden", ax=ax1, show=False)
sc.pl.umap(adata, color="cnv_score", ax=ax2, show=False)
sc.pl.umap(adata, color="cell_type", ax=ax3)
```

### Classifying tumor cells

Based on these observations, we can now assign cell as either "tumor" or "normal".
Based on these observations, we can now assign each cell to either "tumor" or "normal".
To this end, we add a new column `cnv_status` to `adata.obs`.

```python
@@ -216,12 +234,12 @@ adata.obs.loc[
```
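A minimal sketch of this kind of cluster-based labeling with pandas, assuming the tumor clusters have already been picked out from the heatmap and UMAP plots. The helper function, the toy `obs` table standing in for `adata.obs`, and the cluster IDs `"3"` and `"5"` are all hypothetical; which clusters count as tumor depends on your data.

```python
import pandas as pd


def assign_cnv_status(cnv_clusters, tumor_clusters):
    """Hypothetical helper: "tumor" if the cell's CNV cluster is in `tumor_clusters`, else "normal"."""
    cnv_clusters = pd.Series(cnv_clusters)
    status = pd.Series("normal", index=cnv_clusters.index)
    status.loc[cnv_clusters.isin(tumor_clusters)] = "tumor"
    return status


# Toy stand-in for adata.obs; "3" and "5" are hypothetical tumor clusters.
obs = pd.DataFrame(
    {"cnv_leiden": pd.Categorical(["0", "3", "5", "1"])},
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)
obs["cnv_status"] = assign_cnv_status(obs["cnv_leiden"], ["3", "5"])
print(obs["cnv_status"].tolist())  # ['normal', 'tumor', 'tumor', 'normal']
```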

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 5))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), gridspec_kw=dict(wspace=0.5))
cnv.pl.umap(adata, color="cnv_status", ax=ax1, show=False)
sc.pl.umap(adata, color="cnv_status", ax=ax2)
```

Now, we can also plot the CNV heatmap for tumor and normal cells separately:
Now, we can plot the CNV heatmap for tumor and normal cells separately:

```python
cnv.pl.chromosome_heatmap(adata[adata.obs["cnv_status"] == "tumor", :])