Short read mapping
usage: ecc_finder.py map-sr <reference.idx> <query.fq1> <query.fq2>
A tool to detect eccDNA loci using Illumina read sequencing
positional arguments:
<reference.idx> index file of reference genome
<query.fq1> query fastq forward file (uncompressed or bgzipped)
<query.fq2> query fastq reverse file (uncompressed or bgzipped)
optional arguments:
-h, --help show this help message and exit
map options:
-t INT number of CPU threads for mapping mode
--aligner PATH short read aligner executable ('bwa', 'bowtie2','segemehl.x','minimap2') [bwa]
--bwa-params STR space delimited bwa parameters ['mem']
--bowtie2-params STR space delimited bowtie2 parameters ['--end-to-end -k 1 --sensitive']
--segemehl-params STR
space delimited segemehl parameters ['-S -A 95 -W 95 -U 24 -Z 25']
--minimap2-params STR
space delimited minimap2 parameters ['-ax sr']
-g STR reference genome size larger than 4Gb [yes]
peak-calling options:
-l INT minimum length of a peak [200]
-d INT maximum distance between signif. sites [1000]
-p FLT maximum p-value [0.05]
validation options:
-r <reference.fa> reference genome fasta (uncompressed or bgzipped)
--min-read INT filter locus by unique mapped read number [3]
--min-cov FLT filter locus at regions by raw read coverage (# aligned bases / total bases)
output options:
-o PATH output directory [./eccFinder_output]
-w overwrite intermediate files
-x X add prefix to output [ecc.ill]
Any of the four supported short-read aligners (bwa, bowtie2, segemehl, or minimap2) can be selected with --aligner; the default is bwa.
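As an illustration of how the options above fit together, the following minimal sketch assembles a `map-sr` command line in Python without running it. Only flags listed in the help text are used; the function name `build_map_sr_command`, the `python` launcher, and the input file names (`ref.idx`, `reads_1.fq`, `reads_2.fq`, `ref.fa`) are hypothetical placeholders, not part of ecc_finder.

```python
import shlex


def build_map_sr_command(reference_idx, fq1, fq2, reference_fa=None,
                         threads=None, aligner="bwa",
                         out_dir="./eccFinder_output", prefix="ecc.ill"):
    """Assemble an `ecc_finder.py map-sr` argument list (hypothetical helper)."""
    # Positional arguments follow the documented usage line.
    cmd = ["python", "ecc_finder.py", "map-sr", reference_idx, fq1, fq2]
    if threads is not None:
        cmd += ["-t", str(threads)]        # number of CPU threads
    # Aligner, output directory, and output prefix (defaults taken from the help text).
    cmd += ["--aligner", aligner, "-o", out_dir, "-x", prefix]
    if reference_fa is not None:
        cmd += ["-r", reference_fa]        # reference FASTA for validation
    return cmd


# Placeholder file names; substitute your own index, reads, and reference.
cmd = build_map_sr_command("ref.idx", "reads_1.fq", "reads_2.fq",
                           reference_fa="ref.fa", threads=8)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.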
Figure X. Detection of false-positive loci from ONSEN/ATCOPIA78 members in heat-stressed Arabidopsis (HS1, HS2). RR and LL: Illumina read pairs that align in the same orientation with respect to the reference. RL: Illumina read pairs that align in an outward-facing orientation with respect to the reference, indicating discordant reads.
In sample HS1, the single split read pair (coloured blue) does not share a locus with the discordant reads, and sample HS2 has no discordant reads, indicating that these loci are false positives.
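The orientation labels in the caption can be illustrated with a small sketch. It is not ecc_finder's own code. It assumes a labelling convention in which the mates are ordered by reference position and a forward-strand mate is written as L and a reverse-strand mate as R, so that an inward-facing pair is LR and an outward-facing pair is RL. The function name `pair_orientation` is hypothetical.

```python
def pair_orientation(pos1, reverse1, pos2, reverse2):
    """Label a read pair as LR, RL, LL, or RR (assumed convention, see above).

    pos1/pos2 are the leftmost reference coordinates of the two mates;
    reverse1/reverse2 are True when the mate aligns to the reverse strand.
    """
    # Order the mates by reference position so the label reads left to right.
    mates = sorted([(pos1, reverse1), (pos2, reverse2)], key=lambda m: m[0])
    return "".join("R" if is_reverse else "L" for _, is_reverse in mates)


def is_outward_facing(pos1, reverse1, pos2, reverse2):
    """True for RL pairs, the discordant signal described in the caption."""
    return pair_orientation(pos1, reverse1, pos2, reverse2) == "RL"


print(pair_orientation(100, False, 400, True))   # inward-facing pair
print(pair_orientation(100, True, 400, False))   # outward-facing pair
print(pair_orientation(100, False, 400, False))  # both mates on the forward strand
```

Under this convention, a locus with RL pairs but no split read at the same position, or a split read with no RL pairs, is the kind of inconsistency the figure flags as a false positive.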
All output is in eccFinder_output, or whichever directory -o specifies.
ecc.sr.fasta
The detected eccDNA loci in FASTA format.
ecc.sr.csv
The detected eccDNA loci in CSV format.
Col | Type | Description |
---|---|---|
1 | string | Reference sequence name |
2 | int | Reference start on original strand |
3 | int | Reference end on original strand |
4 | int | Split read number at the locus |
5 | int | Discordant read number at the locus |
6 | int | Read coverage at the locus |
7 | int | EccDNA sequence length |
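The column layout above can be read into named records with the standard library. The sketch below makes several assumptions: that the file is comma-delimited, that any header row (the source does not say whether one exists) can be detected by a non-numeric second column, and that coverage may be written as either an integer or a decimal. The names `EccLocus` and `read_ecc_csv` are hypothetical, and the sample rows are made-up values used only to exercise the parser.

```python
import csv
import io
from typing import NamedTuple


class EccLocus(NamedTuple):
    chrom: str             # col 1: reference sequence name
    start: int             # col 2: reference start
    end: int               # col 3: reference end
    split_reads: int       # col 4: split read number
    discordant_reads: int  # col 5: discordant read number
    coverage: float        # col 6: read coverage (float() accepts ints too)
    length: int            # col 7: eccDNA sequence length


def read_ecc_csv(handle):
    """Parse rows laid out as in the table above into EccLocus records."""
    loci = []
    for row in csv.reader(handle):
        if len(row) < 7:
            continue  # skip blank or truncated lines
        try:
            start = int(row[1])
        except ValueError:
            continue  # assumed to be a header row
        loci.append(EccLocus(row[0], start, int(row[2]), int(row[3]),
                             int(row[4]), float(row[5]), int(row[6])))
    return loci


# Made-up example rows, not real ecc_finder output.
sample = io.StringIO("Chr1,1000,1500,4,6,35,500\nChr2,200,900,1,0,3,700\n")
loci = read_ecc_csv(sample)

# Keep loci supported by both split and discordant reads (thresholds are illustrative).
supported = [l for l in loci if l.split_reads >= 2 and l.discordant_reads >= 1]
print(supported)
```

For a real run, replace the `io.StringIO` sample with `open("eccFinder_output/ecc.sr.csv", newline="")`, adjusting the path to match the `-o` setting.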
ecc_finder.png
The size distribution of detected eccDNA in PNG format.