Skip to content

Short read mapping

Ivy edited this page Sep 15, 2021 · 24 revisions

Short read mapping

Algorithm overview.

Long read pipeline algorithm overview

Usage

usage: ecc_finder.py map-sr <reference.idx> <query.fq1> <query.fq2>

A tool to detect eccDNA loci using Illumina read sequencing

positional arguments:
  <reference.idx>       index file of reference genome
  <query.fq1>           query fastq forward file (uncompressed or bgzipped)
  <query.fq2>           query fastq reverse file (uncompressed or bgzipped)

optional arguments:
  -h, --help            show this help message and exit

map options:
  -t INT                number of CPU threads for mapping mode
  --aligner PATH        short read aligner executable ('bwa', 'bowtie2','segemehl.x','minimap2') [bwa]
  --bwa-params STR      space delimted bwa parameters ['mem']
  --bowtie2-params STR  space delimted bowtie2 parameters ['--end-to-end -k 1 --sensitive']
  --segemehl-params STR
                        space delimted segemehl parameters ['-S -A 95 -W 95 -U 24 -Z 25']
  --minimap2-params STR
                        space delimted minimap2 parameters ['-ax sr']
  -g STR                reference genome size larger than 4Gb [yes]

peak-calling options:
  -l INT                minimum length of a peak [200]
  -d INT                maximum distance between signif. sites [1000]
  -p FLT                maximum p-value [0.05]

validation options:
  -r <reference.fa>     reference genome fasta (uncompressed or bgzipped)
  --min-read INT        filter locus by unique mapped read number [3]
  --min-cov FLT         filter locus at regions by raw read coverage (# aligned bases / total bases)

output options:
  -o PATH               output directory [./eccFinder_output]
  -w                    overwrite intermediate files
  -x X                  add prefix to output [ecc.ill]

Note that, you can choose your favorite short-read aligner (bwa, bowtie2, segemehl or minimap2), and the default is bwa.

False positive loci

False positive loci example

Figure X. Detection of false positive loci from ONSEN/ATCOPIA78 members in the heat-stressed Arabidopsis (HS1, HS2). RR&LL:  Illumina sequenced read pairs align in the same orientation with respect to reference. RL: Illumina sequenced read pairs align in an outward-facing order with respect to reference that indicate discordant reads.

The only one split read pair (coloured by blue) did not share the same loci with discordant reads in sample HS1, and there is no discordant reads in sample HS2, indicating false positive loci.

Output

All output is in eccFinder_output, or whichever directory -o specifies.

ecc.sr.fasta

The eccDNA locus in FASTA format.

ecc.sr.csv

The eccDNA locus in csv format.

Col Type Description
1 string Reference sequence name
2 int Reference start on original strand
3 int Reference end on original strand
4 int Split read number at the locus
5 int Discordant read number at the locus
6 int Read coverage at the locus
7 int EccDNA sequence length

ecc_finder.png

The Size distribution of detected eccDNA in png format.

Size_distribution
Clone this wiki locally