Updated on 2022-10-25

Introduction:

This tutorial use a small example to go through ReDeeM-V pipeline, which starts with mtDNA library fastq files and end with consensus variant calling

Inputs: R1.fq (150nt), R2.fq (150nt), i5.fq (150nt)
Key Output: QCplot.png (QC plot assesing mtDNA mapping as well as library complexity, etc)
Key Output: QualifiedTotalCts, RawGenotypes.Total.StrandBalance, RawGenotypes.VerySensitive.StrandBalance, RawGenotypes.Sensitive.StrandBalance, RawGenotypes.Specific.StrandBalance

Prerequisite

Download the package

git clone https://github.com/chenweng1991/REDEEM-V.git

Software dependency

python
bowtie2
cutadapt (This is cutadapt 3.7 with Python 3.6.9)
samtools
bedtools

python package dependency

sys
gzip
pysam
click
os
pickle
numpy
pandas
progress.bar
collections

R package dependency

ggplot2
dplyr
gridExtra
plyr
labeling

Main pipeline

Assign paths

REDEEM_V=ThePathToREDEEM-V #The loacation where the REDEEM-V is downloaded to (eg. /lab/solexa_weissman/cweng/Packages/REDEEM-V/)
MyMultiome=$REDEEM_V/MyMultiome/
mitoConsensus=$REDEEM_V/mitoConsensus/

Step 0: Make mitoMask bowtie2 index

Download hg38 fasta file and mitochondrial blacklist file

mkdir Genome
cd Genome
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz 
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
wget https://raw.githubusercontent.com/caleblareau/mitoblacklist/master/combinedBlacklist/hg38.full.blacklist.bed
gunzip hg38.fa.gz

Make hg38 mito masked bowtie2 index

bedtools maskfasta  -fi hg38.fa -bed hg38.full.blacklist.bed -fo hg38.mitoMask.fa
bowtie2-build hg38.mitoMask.fa hg38.mitoMask  ## This takes time and will generate following, if on WI server ln -s /lab/solexa_weissman/cweng/Genomes/GRCH38/GRCH38_Bowtie2_MitoMask/*.bt2 ./

Step1: Get sequencing result and QC

For this tutorial, Link the example fq files

ln -s $REDEEM_V/source/Example.R1.fq.gz ./
ln -s $REDEEM_V/source/Example.R2.fq.gz ./
ln -s $REDEEM_V/source/Example.i5.fq.gz ./

Fastqc and multiqc

mkdir fastqc
fastqc -o fastqc Example.R1.fq.gz Example.i5.fq.gz Example.R2.fq.gz
multiqc fastqc

Step2: mapping and basic QC

$MyMultiome/MultiomeATAC_mito.sh -n Tutorial_Mito -1 Example.R1.fq.gz -2 Example.R2.fq.gz -i Example.i5.fq.gz -c 0 -t 8 -m $MyMultiome -b Genome/hg38.mitoMask  # add -q for quick, which skip the QC steps

After this step a QCplot is generated

Step3: Parse the Cellranger qualified cell barcode

In a ReDeeM experiment, the ATAC and RNA data is analyzed by Cellranger arc. The barcodes.tsv.gz from that will be parsed here. Becasue the barcode in barcodes.tsv.gz is RNA barcode, we need to translate into ATAC barcode (via RNAbc2ATAC.R) to be able to match mtDNA data

ln -s $REDEEM_V/source/barcodes.tsv.gz ./
Rscript $MyMultiome/Helpers/RNAbc2ATAC.R $REDEEM_V barcodes.tsv.gz Tutorial_atac.barcodes.tsv

Step4, prepare for variant calling

Threads=24
WD=`pwd`/Out_mitoConsensus
python $mitoConsensus/Preprocess.py -i Tutorial_Mito.uniqmapped.bam  -c $Threads -b Tutorial_atac.barcodes.tsv -o $WD -g rCRS -bt BC -sd $mitoConsensus

Step5, run consensus variant calling

Run variant calling in parallel

echo $WD
for ((i=1;i<=$Threads;i++))
do
bsub python $mitoConsensus/mitoConsensus.py  barcodes.$i $WD BC 30
done

Concat together

$mitoConsensus/Finalize.sh $WD $Threads $mitoConsensus

Expected result

Location: The results are saved in $WD/final
Major 5 files:
- QualifiedTotalCts,
- RawGenotypes.Total.StrandBalance(Least stringent),
- RawGenotypes.VerySensitive.StrandBalance(Less stringent),
- RawGenotypes.Sensitive.StrandBalance(Stringent),
- RawGenotypes.Specific.StrandBalance (Most Stringent)
QualifiedTotalCts is a table with 6 columnes that show mtDNA coverage per position per cell

Cellbarcode	coordinates on mt genome	# unique frag(total)	# unique frag(less stringent)	# unique frag(stringent)	# unique frag(very stringent)

RawGenotypes is a table with 14 columnes that show the consensus variant calling. Each row is a molecule with a potential variant

MoleculeID	CellBC	Pos	Variant	V	Ref	FamSize	V-counts	CSS	DB_Cts	SG_Cts	Is+	Is-	TotalDepth

MoleculeID: Cellbarcode+start+end which is the identifier to define a molecule
CellBC: Cell barcode
Pos: The coordinate of the variant
Variant: A description of the variant
V: The variant base called on Pos
Ref: The reference base on Pos
FamSize: The consensus family size, or the total number of PCR copies for the given molecule
V-count: Number of PCR copies that support the variant
CSS: Consensus score, which is the proportion of PCR copies that support the variant
DB_Cts: Number of double cover copies, which are positions that sequenced by both Read1 and Read2
SG_Cts: Number of single cover copies, which are positions that sequenced by only Read1 or Read2
Is+: If the variant is discovered on plus strand
Is- : If the variant is discovered on minus strand
TotalDepth: On this given position, total number of unique fragment in the given cell

Strand biased variant has been removed
QualifiedTotalCts and RawGenotypes.* are the inputs for REDEEM-R for downstream mutation filtering and phylogenetic tree reconstruction

(Optional) Cell Hashing

First, generate barcodes.tsv.16bp from the Cell Ranger result

zcat ../CellRanger/OBC/outs/filtered_feature_bc_matrix/barcodes.tsv.gz | awk '{print substr($1,1,16)}' > barcodes.tsv.16bp

Second, Create HashDic.csv like below

## Create a HashDic.csv
#TGTCTTTCCTGCCAG,SBM1037
#CTCCTCTGCAATTAC,SBM1253
#CAGTAGTCACGGTCA,SBM1324
#ATTGACCCGCGTTAG,SBM1218

Third run the script below, which will generate a plot.pdf and a folder pymulti that contains a result table

python3 $REDEEMV/JohnnyCellHash/RunCellHash.py $REDEEMV OBC OBC_Hash_S1_L001_R1_001.fastq.gz.trim OBC_Hash_S1_L001_R3_001.fastq.gz.trim

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tutorial_20221025.md

Tutorial_20221025.md

Introduction:

Prerequisite

Download the package

Software dependency

python package dependency

R package dependency

Main pipeline

Assign paths

Step 0: Make mitoMask bowtie2 index

Step1: Get sequencing result and QC

Step2: mapping and basic QC

Step3: Parse the Cellranger qualified cell barcode

Step4, prepare for variant calling

Step5, run consensus variant calling

Expected result

(Optional) Cell Hashing

Files

Tutorial_20221025.md

Latest commit

History

Tutorial_20221025.md

File metadata and controls

Introduction:

Prerequisite

Download the package

Software dependency

python package dependency

R package dependency

Main pipeline

Assign paths

Step 0: Make mitoMask bowtie2 index

Step1: Get sequencing result and QC

Step2: mapping and basic QC

Step3: Parse the Cellranger qualified cell barcode

Step4, prepare for variant calling

Step5, run consensus variant calling

Expected result

(Optional) Cell Hashing