
Short read mapping

Ivy edited this page Sep 8, 2021 · 24 revisions


Algorithm overview

[Figure: Long read pipeline algorithm overview]

Usage

usage: ecc_finder.py map-sr <reference.idx> <query.fq1> <query.fq2>

A tool to detect eccDNA loci from Illumina short-read sequencing data

positional arguments:
  <reference.idx>       index file of reference genome
  <query.fq1>           query fastq forward file (uncompressed or bgzipped)
  <query.fq2>           query fastq reverse file (uncompressed or bgzipped)

optional arguments:
  -h, --help            show this help message and exit

map options:
  -t INT                number of CPU threads for mapping mode
  --aligner PATH        short read aligner executable ('bwa', 'bowtie2','segemehl','minimap2') [bwa]
  --bwa-params STR      space delimited bwa parameters [' mem ']
  --bowtie2-params STR  space delimited bowtie2 parameters ['--end-to-end -k 1 --sensitive']
  --segemehl-params STR
                        space delimited segemehl parameters ['-S -A 95 -W 95 -U 24 -Z 25 -t 8']
  --minimap2-params STR
                        space delimited minimap2 parameters [' -ax sr ']
  -g STR                whether the reference genome is larger than 4 Gb [yes]

peak-calling options:
  -l INT                minimum length of a peak [200]
  -d INT                maximum distance between signif. sites [1000]
  -p FLT                maximum p-value [0.05]

validation options:
  -r <reference.fa>     reference genome fasta (uncompressed or bgzipped)
  --min-read INT        filter loci by number of uniquely mapped reads [3]
  --min-cov FLT         filter loci by raw read coverage (# aligned bases / total bases)

output options:
  -o PATH               output directory [./eccFinder_output]
  -w                    overwrite intermediate files
  -x X                  add prefix to output [ecc.ill]
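To illustrate how the three peak-calling options (-l, -d, -p) listed above can interact, here is a minimal Python sketch. It is not ecc_finder's implementation: the input format (a list of position and p-value pairs for one reference sequence), the function name `call_peaks`, the inclusive `p <= max_p` cutoff and the merge rule (chain significant sites whose gap is at most -d, then drop peaks shorter than -l) are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def call_peaks(
    sites: Iterable[Tuple[int, float]],
    min_len: int = 200,    # analogous to -l (minimum length of a peak)
    max_dist: int = 1000,  # analogous to -d (max distance between significant sites)
    max_p: float = 0.05,   # analogous to -p (maximum p-value)
) -> List[Tuple[int, int]]:
    """Group significant sites into peaks; return inclusive (start, end) pairs.

    Hypothetical illustration only: `sites` holds (position, p_value) pairs
    for a single reference sequence.
    """
    # Keep only sites passing the p-value cutoff, sorted by position.
    significant = sorted(pos for pos, p in sites if p <= max_p)
    if not significant:
        return []

    peaks: List[Tuple[int, int]] = []
    start = end = significant[0]
    for pos in significant[1:]:
        if pos - end <= max_dist:
            end = pos  # close enough to the previous site: extend the peak
        else:
            peaks.append((start, end))  # gap too large: close the current peak
            start = end = pos
    peaks.append((start, end))

    # Discard peaks shorter than the minimum length.
    return [(s, e) for s, e in peaks if e - s + 1 >= min_len]


sites = [(100, 0.01), (400, 0.03), (5000, 0.2), (9000, 0.04), (9050, 0.01)]
print(call_peaks(sites))  # [(100, 400)]
```

In this example the site at 5000 fails the p-value cutoff, the sites at 100 and 400 merge into one 301 bp peak, and the 51 bp peak at 9000 to 9050 is dropped by the minimum-length filter.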

Output

All output files are written to ./eccFinder_output, or to the directory specified with -o.

ecc.sr.fasta

Sequences of the detected eccDNA loci in FASTA format.

ecc.sr.csv

The detected eccDNA loci in CSV format, with the following columns:

Col  Type    Description
1    string  Reference sequence name
2    int     Reference start on original strand
3    int     Reference end on original strand
4    int     Number of circular reads at the locus
5    int     Repeat units of all circular reads
6    int     Read coverage at the locus
7    int     eccDNA sequence length
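The seven columns of ecc.sr.csv described above can be loaded into typed records for downstream filtering. The Python sketch below is a hypothetical helper, not part of ecc_finder: the names `EccLocus`, `parse_ecc_rows` and `read_ecc_csv` are illustrative, and it assumes a comma delimiter, no header row, and integer values in columns 2 to 7 as the column table states. If the real file has a header or uses tabs, skip the first line or pass `delimiter="\t"`.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EccLocus:
    """One row of ecc.sr.csv (hypothetical record type)."""
    chrom: str           # col 1: reference sequence name
    start: int           # col 2: reference start on original strand
    end: int             # col 3: reference end on original strand
    circular_reads: int  # col 4: number of circular reads at the locus
    repeat_units: int    # col 5: repeat units of all circular reads
    coverage: int        # col 6: read coverage at the locus
    length: int          # col 7: eccDNA sequence length


def parse_ecc_rows(lines: Iterable[str], delimiter: str = ",") -> List[EccLocus]:
    """Parse CSV lines into EccLocus records.

    Assumes no header row and plain integers in the numeric columns.
    """
    loci: List[EccLocus] = []
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        loci.append(EccLocus(row[0], *(int(x) for x in row[1:7])))
    return loci


def read_ecc_csv(path: str, delimiter: str = ",") -> List[EccLocus]:
    """Read and parse a CSV file on disk."""
    with open(path, newline="") as fh:
        return parse_ecc_rows(fh, delimiter=delimiter)
```

For example, `[l for l in read_ecc_csv("eccFinder_output/ecc.sr.csv") if l.circular_reads >= 5]` would keep only loci supported by at least five circular reads; the path assumes the default output directory and the file name shown above.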

ecc_finder.png

Size distribution of the detected eccDNAs, as a PNG image.

[Figure: Size distribution]
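ecc_finder draws ecc_finder.png itself. If a quick text-only view is useful, for instance on a headless server, a similar size distribution can be approximated from column 7 (eccDNA sequence length) of ecc.sr.csv. The sketch below is hypothetical and is not how ecc_finder builds its plot: the function names and the default 500 bp bin width are arbitrary choices.

```python
from collections import Counter
from typing import Dict, Iterable


def size_histogram(lengths: Iterable[int], bin_width: int = 500) -> Dict[int, int]:
    """Count eccDNA lengths per fixed-width bin.

    Keys are bin lower bounds in bp (0, 500, 1000, ...); empty bins are omitted.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


def print_histogram(hist: Dict[int, int], bin_width: int = 500) -> None:
    """Print one line per bin: a bp range label, a tab, then one '#' per eccDNA."""
    for lower, count in hist.items():
        print(f"{lower}-{lower + bin_width - 1} bp\t{'#' * count}")


hist = size_histogram([120, 480, 510, 1999, 2000])
print_histogram(hist)
```

To use real data, pass `int(row[6])` for each non-empty row returned by `csv.reader` on ecc.sr.csv.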