
Long read assembly

Ivy edited this page Oct 6, 2021 · 12 revisions

ecc_finder Version: v1.0.0

ecc_finder identifies eccDNA loci using short and long reads.

Long read pipeline algorithm overview

Usage

usage: python ecc_finder.py asm-ont <query.fq> (option)

A tool to detect eccDNA loci using ONT sequencing

positional arguments:
  <query.fq>         query fastq file (uncompressed or bgzipped)

optional arguments:
  -h, --help         show this help message and exit

asm options:
  -t INT             number of CPU threads for assembly mode
  --five-prime STR   5' adapter sequence (sense strand) [NULL]
  --three-prime STR  3' adapter sequence (anti-sense strand) [NULL]

consensus options:
  -n INT             minimum copy number of tandem repeat in a long read [2]
  -e FLT             maximum allowed divergence rate between two consecutive repeats [0.25]
  -s INT             minimum period size of tandem repeat (>=2) [30]
  -c FLT             minimum sequence identity for clustering [0.8]
  -l INT             minimum length of throw_away_sequences [200]
  -m INT             memory limit (in MB) for CD-hit clustering program [800]

output options:
  -o PATH            output directory [./eccFinder_asm_output]
  -w                 overwrite intermediate files
  -x X               add prefix to output [ecc.asm.ont]

** The query files are required **
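
For scripted runs, the command line above can be assembled programmatically. The following is a minimal sketch, assuming ecc_finder.py sits in the current working directory. The helper names `build_asm_ont_command` and `run_asm_ont` and the input file name `reads.fq` are hypothetical; only flags listed in the help text above are used.

```python
import subprocess
import sys


def build_asm_ont_command(query_fq, threads=None, out_dir=None, prefix=None,
                          min_copies=None, overwrite=False,
                          script="ecc_finder.py"):
    """Assemble an `asm-ont` argument list from flags in the help text.

    Hypothetical helper: options left as None are omitted, so ecc_finder
    falls back to its own defaults.
    """
    cmd = [sys.executable, script, "asm-ont", query_fq]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if min_copies is not None:
        cmd += ["-n", str(min_copies)]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if prefix is not None:
        cmd += ["-x", prefix]
    if overwrite:
        cmd.append("-w")
    return cmd


def run_asm_ont(cmd):
    """Launch the assembled command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "reads.fq" is a placeholder for your own ONT FASTQ file.
    cmd = build_asm_ont_command("reads.fq", threads=4, overwrite=True)
    # Print the command for inspection; call run_asm_ont(cmd) to execute it.
    print(" ".join(cmd))
```

Returning the arguments as a list rather than a single string avoids shell quoting problems when file paths contain spaces.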

Output

All output is in eccFinder_asm_output, or whichever directory -o specifies.

Overview

| Col | Type | Description |
|-----|------|-------------|
| 1 | file | Assembly FASTA file of the eccDNA sequence: ecc.asm.ont.fasta |
| 2 | file | Consensus FASTA file of the reads with tandem repeat pattern: ecc.asm.ont.cons.fa |
| 3 | files | Cluster files of the tandem repeat clustering: ecc.asm.ont.cluster & ecc.asm.ont.clstr |
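
After a run, the output files listed above can be located and inspected from a script. The sketch below rests on two assumptions: that the -x prefix replaces ecc.asm.ont in every file name, and that the FASTA outputs use the usual layout of a `>` header line followed by sequence lines. `expected_outputs` and `read_fasta` are hypothetical helper names, not part of ecc_finder.

```python
import os


def expected_outputs(out_dir="eccFinder_asm_output", prefix="ecc.asm.ont"):
    """Map a short label to the path of each documented output file.

    Assumption: the -x prefix is prepended to every suffix below.
    """
    suffixes = {
        "assembly": ".fasta",
        "consensus": ".cons.fa",
        "cluster": ".cluster",
        "clstr": ".clstr",
    }
    return {label: os.path.join(out_dir, prefix + suffix)
            for label, suffix in suffixes.items()}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    paths = expected_outputs()
    for label, path in paths.items():
        status = "found" if os.path.exists(path) else "missing"
        print(f"{label}: {path} ({status})")
    # Report the length of each assembled sequence when the file exists.
    if os.path.exists(paths["assembly"]):
        with open(paths["assembly"]) as handle:
            for name, seq in read_fasta(handle):
                print(name, len(seq))
```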

Video

Run Example3: The video Long-read-assembly_Video_example demonstrates a run on the Arabidopsis eccDNA sequencing subsample in the test_samples folder.
