fulopjoz/CG

# CG Cancer Genomics Data Analysis Exercise

First, index the human reference genome with BWA

bwa index GCA_000001405.28_GRCh38.p13_genomic.fna.gz

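Align the paired-end reads of each sample and write a SAM file (wt.r1.fq.gz and wt.r2.fq.gz are assumed names for the wild-type reads)

bwa mem GCA_000001405.28_GRCh38.p13_genomic.fna.gz tu.r1.fq.gz tu.r2.fq.gz > tumor.sam
bwa mem GCA_000001405.28_GRCh38.p13_genomic.fna.gz wt.r1.fq.gz wt.r2.fq.gz > wt.sam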

Convert SAM to BAM

samtools view -O BAM -o tumor.bam tumor.sam
samtools view -O BAM -o wt.bam wt.sam

Sort the BAM files

samtools sort -T temp -O bam -o tumor.sorted.bam tumor.bam
samtools sort -T temp -O bam -o wt.sorted.bam wt.bam

Index the sorted BAM files

samtools index tumor.sorted.bam
samtools index wt.sorted.bam
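
Because the align, convert, sort, and index steps are repeated for each sample, the commands can be generated from one template so that the tumor and wild-type file names cannot drift apart. The following is a minimal sketch, not part of the repository: the function name `pipeline_commands` is hypothetical, and the wild-type FASTQ names `wt.r1.fq.gz` and `wt.r2.fq.gz` are assumptions. It only builds the command strings and does not run them.

```python
import shlex

# Reference file name as used in the steps above.
REFERENCE = "GCA_000001405.28_GRCh38.p13_genomic.fna.gz"


def pipeline_commands(sample, fastq_r1, fastq_r2, reference=REFERENCE):
    """Return the align, convert, sort, and index commands for one sample."""
    q = shlex.quote  # protect file names that contain spaces or shell characters
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    return [
        f"bwa mem {q(reference)} {q(fastq_r1)} {q(fastq_r2)} > {q(sam)}",
        f"samtools view -O BAM -o {q(bam)} {q(sam)}",
        f"samtools sort -T temp -O bam -o {q(sorted_bam)} {q(bam)}",
        f"samtools index {q(sorted_bam)}",
    ]


if __name__ == "__main__":
    samples = [
        ("tumor", "tu.r1.fq.gz", "tu.r2.fq.gz"),
        # Assumed names for the wild-type reads; adjust to the real files.
        ("wt", "wt.r1.fq.gz", "wt.r2.fq.gz"),
    ]
    for sample, r1, r2 in samples:
        print("\n".join(pipeline_commands(sample, r1, r2)))
```

The printed lines can be pasted into a shell or written to a script file and run one sample at a time.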

Remove PCR duplicates

samtools rmdup -r -S tumor.sorted.subset.bam tumor.deduplicated.bam
samtools rmdup -r -S wt.sorted.subset.bam wt.deduplicated.bam

Compute the read depth at each position of the deduplicated BAM files

samtools depth tumor.deduplicated.bam > tumor.deduplicated.coverage
samtools depth wt.deduplicated.bam > wt.deduplicated.coverage
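
`samtools depth` writes one tab-separated row per position: contig name, 1-based position, and depth. As a quick sanity check before subsetting, the mean depth per contig can be computed from those rows. This is a minimal sketch; the function name `mean_depth_per_contig` is hypothetical, and the mean covers only the positions present in the file (by default `samtools depth` omits zero-coverage positions).

```python
from collections import defaultdict


def mean_depth_per_contig(lines):
    """Return {contig: mean depth} over the positions listed in the input rows."""
    totals = defaultdict(lambda: [0, 0])  # contig -> [sum of depths, number of rows]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        contig, _position, depth = line.rstrip("\n").split("\t")
        totals[contig][0] += int(depth)
        totals[contig][1] += 1
    return {contig: total / count for contig, (total, count) in totals.items()}


# Small in-memory example in the same three-column layout.
example_rows = [
    "CM000685.2\t20000001\t10\n",
    "CM000685.2\t20000002\t30\n",
    "CM000663.2\t100\t8\n",
]
print(mean_depth_per_contig(example_rows))
```

To use it on a real file, pass an open file handle, for example `mean_depth_per_contig(open("tumor.deduplicated.coverage"))`.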

Extract only chromosome X (GenBank accession CM000685)

grep "CM000685" tumor.deduplicated.coverage > tumor.chrx.coverage
grep "CM000685" wt.deduplicated.coverage > wt.chrx.coverage

Subset to the region of interest (positions 20000000 to 40000000), comparing the position column numerically

awk '$2 >= 20000000 && $2 <= 40000000' tumor.chrx.coverage > tumor.extract.new
awk '$2 >= 20000000 && $2 <= 40000000' wt.chrx.coverage > wt.extract.new
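
The chromosome and region selection can also be written as a numeric comparison on the position column, which avoids accidental text matches such as position 120000000 containing the digits 20000000. The sketch below is illustrative only; `filter_region` and its parameters are hypothetical names, and the input is assumed to be the three-column `samtools depth` layout.

```python
def filter_region(lines, contig_prefix="CM000685", start=20_000_000, end=40_000_000):
    """Keep rows on the given contig whose position lies in [start, end]."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        contig, position = fields[0], int(fields[1])
        if contig.startswith(contig_prefix) and start <= position <= end:
            kept.append(line)
    return kept


# In-memory example: only the second row is on chromosome X inside the region.
example_rows = [
    "CM000685.2\t19999999\t12\n",
    "CM000685.2\t20000000\t15\n",
    "CM000685.2\t120000000\t9\n",
    "CM000663.2\t30000000\t7\n",
]
print(filter_region(example_rows))
```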

Keep only the last two columns (position and depth)

sed 's/CM000685\.2//' wt.extract.new > wt.extract
sed 's/CM000685\.2//' tumor.extract.new > tumor.extract

Run the plotting script

python3 rd_plot.py
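
The contents of `rd_plot.py` are not shown here. As an illustration of one common way to compare two read-depth profiles, the sketch below averages depth in fixed-size bins and computes the log2 ratio of tumor to wild type per bin. It is an assumption about a reasonable analysis, not the script's actual method; all function names and the 1,000,000-base bin size are hypothetical. The input rows are assumed to be the two-column position and depth layout of `tumor.extract` and `wt.extract`.

```python
import math
from collections import defaultdict


def read_extract(lines):
    """Parse whitespace-separated 'position depth' rows into {position: depth}."""
    depths = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            depths[int(fields[-2])] = int(fields[-1])
    return depths


def binned_mean(depths, bin_size=1_000_000):
    """Average depth per bin; each bin is keyed by its start coordinate."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for position, depth in depths.items():
        bin_start = position // bin_size * bin_size
        sums[bin_start] += depth
        counts[bin_start] += 1
    return {bin_start: sums[bin_start] / counts[bin_start] for bin_start in sums}


def log2_ratio(tumor_bins, wt_bins):
    """log2(tumor / wt) for bins present and non-zero in both samples."""
    ratios = {}
    for bin_start in sorted(set(tumor_bins) & set(wt_bins)):
        tumor_depth, wt_depth = tumor_bins[bin_start], wt_bins[bin_start]
        if tumor_depth > 0 and wt_depth > 0:
            ratios[bin_start] = math.log2(tumor_depth / wt_depth)
    return ratios


# In-memory example rows with the leading tab left by the sed step.
tumor_rows = ["\t20000001\t10\n", "\t20000002\t30\n"]
wt_rows = ["\t20000001\t40\n", "\t20000002\t40\n"]
ratios = log2_ratio(binned_mean(read_extract(tumor_rows)),
                    binned_mean(read_extract(wt_rows)))
print(ratios)
```

A negative ratio in a bin means the tumor sample has lower average depth there than the wild type, and a positive ratio means higher depth. The resulting values could then be plotted against bin start position with any plotting library.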
