Skip to content

Short read mapping

Ivy edited this page Oct 21, 2021 · 24 revisions

ecc_finder Version: v1.0.0

To identify eccDNA loci by mapping to a reference genome.

Algorithm overview.

Long read pipeline algorithm overview

Usage

Note that because the output from bwa index, it will create 5 files with appendix: reference.fa.bwt, reference.fa.amb, reference.fa.ann, reference.fa.pac and reference.fa.sa

So the <reference.fa> here represents the whole index output for bwa aligner, and "-r reference.fa" represents the reference fasta file

usage: 

python ecc_finder.py map-sr <reference.fa> <query.fq1> <query.fq2> -r reference.fa (option)

or to speed up:

python ecc_finder.py map-sr <reference.sr.idx> <query.fq1> <query.fq2> -r reference.fa --aligner minimap2 (option)


A tool to detect eccDNA loci using Illumina read sequencing

positional arguments:
  <reference.idx>       index file of reference genome
  <query.fq1>           query fastq forward file (uncompressed or bgzipped)
  <query.fq2>           query fastq reverse file (uncompressed or bgzipped)

optional arguments:
  -h, --help            show this help message and exit

map options:
  -t INT                number of CPU threads for mapping mode
  --aligner PATH        short read aligner executable ('bwa', 'bowtie2','segemehl.x','minimap2') [bwa]
  --bwa-params STR      space delimted bwa parameters ['mem']
  --bowtie2-params STR  space delimted bowtie2 parameters ['--end-to-end -k 1 --sensitive']
  --segemehl-params STR
                        space delimted segemehl parameters ['-S -A 95 -W 95 -U 24 -Z 25']
  --minimap2-params STR
                        space delimted minimap2 parameters ['-ax sr']
  -g STR                reference genome size larger than 4Gb [yes]

peak-calling options:
  -l INT                minimum length of a peak [200]
  -d INT                maximum distance between signif. sites [1000]
  -p FLT                maximum p-value [0.05]

validation options:
  -r <reference.fa>     reference genome fasta (uncompressed or bgzipped)
  --min-read INT        filter locus by unique mapped read number [3]
  --min-cov FLT         filter locus at regions by raw read coverage (# aligned bases / total bases)

output options:
  -o PATH               output directory [./eccFinder_output]
  -w                    overwrite intermediate files
  -x X                  add prefix to output [ecc.ill]

Note that, you can choose your favorite short-read aligner (bwa, bowtie2, segemehl or minimap2), and the default is bwa.

False positive loci

For a given family of Long Terminal Repeats (LTR) retrotransposons producing eccDNA for instance, all copies belonging to the same family and sharing the same LTR sequences will produce alignments of split and discordant reads at their boundaries. Only the copies producing bona fide eccDNA will thus display an even distribution of split and discordant reads throughout their internal region.

Examples of false positive loci from ONSEN/ATCOPIA78 members in the heat-stressed Arabidopsis (HS1, HS2). RR&LL:  Illumina sequenced read pairs align in the same orientation with respect to reference. RL: Illumina sequenced read pairs align in an outward-facing order with respect to reference that indicate discordant reads.

The only one split read pair (coloured by blue) did not share the same loci with discordant reads in sample HS1, and there is no discordant reads in sample HS2, indicating false positive loci.

Output

All output is in eccFinder_output, or whichever directory -o specifies.

ecc.sr.fasta

The eccDNA locus in FASTA format.

ecc.sr.csv

The eccDNA locus in csv format.

Col Type Description
1 string Reference sequence name
2 int Reference start on original strand
3 int Reference end on original strand
4 int Split read number at the locus
5 int Discordant read number at the locus
6 int EccDNA sequence length

ecc_finder.png

The Size distribution of detected eccDNA in png format.

Size_distribution

Video

Run Example2: watch the video Short-read-mapping_Video_example using the Arabidopsis eccDNA sequencing subsample in the folder test_samples.

Clone this wiki locally