
a tool for predicting mitochondrial DNA deletions using soft-clipping

dooguypapua/eKLIPse

eKLIPse is no longer maintained.
An unpublished version (v2.1) with duplication integration
is available in this repository.
Duplications are defined using the MitoSAlt approach (Basu et al. 2020).
Outputs were also improved.

eklipse logo

eKLIPse is a sensitive and specific tool for detecting and quantifying large mtDNA rearrangements.
Based on soft-clipping, it provides precise breakpoint positions and the cumulative percentage of mtDNA rearrangements at a given gene location, with high detection sensitivity.
Both single-end and paired-end data (mtDNA, WES, WGS) are accepted.
eKLIPse requires two types of input: the alignment files in BAM or SAM format (with header) and the corresponding mitochondrial genome (GenBank format).
Alignments must contain soft-clipping information (see your aligner options).
eKLIPse is available either as a script to be integrated into a pipeline or as a user-friendly graphical interface.

- Like other CNV tools, eKLIPse performance depends on your sequencing and mapping steps.

Graphical User Interface (Qt)

Windows Deployment (portable)

  • download the latest version (080620) here.
  • unzip the ZIP file.
  • launch 'eKLIPse.exe'
- Spaces are not allowed in the executable path or in the input/output paths.

Linux Installation

  • install the required tools (see the Requirements section)
  • download the latest version here.
  • unzip Qt_eKLIPse_unix_v1-0.zip
  • cd Qt_eKLIPse_unix_v1-0
  • chmod a+x eKLIPse
  • ./eKLIPse

Running

Start

To start the analysis, simply click "START".
(You can change the colors by clicking on the colors at the bottom right.)

Launch Analysis

1 - To select your alignment files, click "ADD". If needed, you can change an alignment title by editing the corresponding cell.
2 - Select your reference genome. If you choose "Other", browse to your own GenBank file by clicking the folder icon.
3 - To change the results directory, click the folder icon.
4 - To modify "Advanced parameters", click the expand icon. Please refer to the "Parameters" section for further information.
5 - Launch the analysis by clicking "START".

Analysis in progress

Detailed analysis progress can be followed in this window.

Results

Once the analysis is complete, the program automatically opens the result folder.

Testing

Two reduced alignment files are provided in the archive.
Click "TEST" in the "Launch Analysis" window before clicking "START".


Command Line Interface

Docker

A Docker image is also available. Follow the build instructions here.

Linux

Requirements

Please install the following modules & tools:

Testing

python eKLIPse.py --test

(Add the "-samtools", "-blastn", "-makeblastdb" and "-circos" options if these tools are not in your $PATH.)

Running

python eKLIPse.py -in <INPUT file path> -ref <GBK file path> [OPTIONS]

[OPTIONS]
-out          <str>  : Output directory path                  [current]
-tmp          <str>  : Temporary directory path               [/tmp]
-scsize       <int>  : Soft-clipping minimal length           [25]
-mapsize      <int>  : Upstream mapping length                [20]
-downcov      <int>  : Downsampling read number               [500000] (0=disable)
-minq         <int>  : Read quality threshold                 [20]
-minlen       <int>  : Read length threshold                  [100]
-shift        <int>  : Breakpoint sliding-window size         [5]
-minblast     <int>  : Minimal number of BLAST per breakpoint [1]
-bilateral    <bool> : Filter unidirectional BLAST            [True]
-mitosize     <int>  : Remove deleted mtDNA less than         [1000]
-id           <int>  : BLAST %identity threshold              [80]
-cov          <int>  : BLAST %coverage threshold              [70]
-gapopen      <int>  : BLAST cost to open a gap               [0:proton, 5:illumina]
-gapext       <int>  : BLAST cost to extend a gap             [2]
-thread       <int>  : Thread number                          [2]
-samtools     <str>  : samtools bin path                      [$PATH]
-blastn       <str>  : BLASTN bin path                        [$PATH]
-makeblastdb  <str>  : makeblastdb bin path                   [$PATH]
-circos       <str>  : circos bin path                        [$PATH]
--test               : eKLIPse test
--nocolor            : Disable output colors
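
When eKLIPse is called from a pipeline, the command line above can be assembled programmatically. The following is a minimal sketch, not part of eKLIPse itself: the helper name `build_eklipse_command`, the input list name `samples.tsv`, and the relative path to the reference are hypothetical, and only options listed above are used.

```python
import shlex


def build_eklipse_command(input_list, reference, script="eKLIPse.py", **options):
    """Return the argument list for one eKLIPse run.

    Keyword arguments are turned into single-dash options, for example
    downcov=0 becomes "-downcov 0". No validation is done here: the
    option names must match those in the [OPTIONS] list.
    """
    cmd = ["python", script, "-in", str(input_list), "-ref", str(reference)]
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_eklipse_command(
    "samples.tsv",
    "data/NC_012920.1.gb",
    out="results",
    downcov=0,
    scsize=15,
    mapsize=10,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with file paths.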

Parameters

Input file (-in)

eKLIPse accepts alignments in BAM or SAM format (header required) for both single-end and paired-end sequencing data.
The input file is a simple tab-delimited text file, as follows:

path_bam<tab>title1
path_bam2<tab>title2

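The input list can be generated rather than typed by hand. This is a minimal sketch that assumes one alignment per line with the path first and the title second, separated by a tab; the function name `write_input_list` and the example paths are hypothetical.

```python
import csv
import io


def write_input_list(handle, alignments):
    """Write one 'path<TAB>title' line per alignment to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for path, title in alignments:
        writer.writerow([path, title])


# Hypothetical BAM paths and titles, for illustration only.
alignments = [
    ("/data/run1/sample1.bam", "sample1"),
    ("/data/run1/sample2.bam", "sample2"),
]

buffer = io.StringIO()
write_input_list(buffer, alignments)
print(buffer.getvalue(), end="")
```

To write a real file, open it with `open("samples.tsv", "w", newline="")` and pass the handle instead of the `StringIO` buffer.
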
mtDNA reference (-ref)

eKLIPse accepts any mtDNA reference genome in GenBank format.
rCRS (NC_012920.1.gb), CRS (J01415.2.gb) and Mus musculus (NC_005089.1.gb) are provided in "/data"

Downsampling (-downcov)

To reduce execution time, a downsampling option is available.
For single deletions with a low mutant load, or for multiple deletions, we advise disabling downsampling ("-downcov 0").
The number of reads retained should provide sufficient coverage of the mitochondrial genome.
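
To judge whether a given read number gives sufficient coverage, a rough estimate is enough. The sketch below assumes that every retained read maps to the mitochondrial genome over its full length, which overestimates real coverage; the function name `mean_coverage` is hypothetical, and 16569 bp is the length of the rCRS reference (NC_012920.1).

```python
RCRS_LENGTH = 16569  # length in bp of the rCRS reference (NC_012920.1)


def mean_coverage(read_count, read_length, genome_length=RCRS_LENGTH):
    """Approximate mean depth: total sequenced bases divided by genome length.

    Assumes all reads map to the mitochondrial genome over their full length.
    """
    return read_count * read_length / genome_length


# Default downsampling value (-downcov 500000) with 100 bp reads.
depth = mean_coverage(500_000, 100)
print(f"approximate mean depth: {depth:.0f}x")
```

For WES or WGS data, only a fraction of the reads originates from mtDNA, so the read count to use here is the number of mitochondrial reads rather than the total.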

Sequencing & Alignment (-minq / -minlen)

Depending on your sequencing technology and library, you can adjust the minimum read length (-minlen).
You can also adjust the minimum read quality (-minq), for example to keep reads with multiple hits, whose quality is lower.

Soft-clipping (-scsize / -mapsize / -shift)

For short-read data, we advise reducing the minimal soft-clipping length (-scsize) and the upstream mapping length (-mapsize).
For example, with 100bp reads, you could use "-scsize 15" and "-mapsize 10".
The breakpoint sliding-window size (-shift) can be modified if you expect a high number of homopolymers.
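
To see how a minimal soft-clipping length relates to an alignment, the soft-clipped lengths can be read from the CIGAR string of a SAM record. This is an illustrative sketch and not eKLIPse's own implementation: the function names are hypothetical, and the threshold check only mirrors the idea behind "-scsize".

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths of a CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only appear at the very ends; drop them first.
    while ops and ops[0][1] == "H":
        ops.pop(0)
    while ops and ops[-1][1] == "H":
        ops.pop()
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def has_long_soft_clip(cigar, scsize=25):
    """True if either end is soft-clipped over at least scsize bases."""
    return max(soft_clip_lengths(cigar)) >= scsize


print(soft_clip_lengths("85M15S"))
print(has_long_soft_clip("85M15S"), has_long_soft_clip("85M15S", scsize=15))
```

With the default threshold of 25, a 100bp read clipped over 15 bases would not count as soft-clipped in this sketch, which illustrates why a lower value is suggested for short reads.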

BLASTn (-id / -cov / -gapopen / -gapext)

BLASTn thresholds mostly depend on the sequencing technology.
Depending on your sequencing quality, you can increase or decrease the identity and coverage thresholds (-id / -cov).
Illumina data are known to contain fewer errors, so the gap costs can be more stringent (-gapopen / -gapext).
For example, with Illumina reads, you could use "-gapopen 5" and "-gapext 2".

Filtering (-minblast / -bilateral / -mitosize)

Depending on your sequencing depth, quality and required stringency, you can modify the filters.
Increasing the minimum number of BLAST hits per breakpoint (-minblast) increases specificity but decreases sensitivity.
By default, eKLIPse filters out deleted mtDNA with a length under 1000bp (-mitosize).
If you are looking for sublimons, for example, you could reduce this length to 100bp.
eKLIPse relies on finding bidirectional BLAST hits linking the 5' and 3' breakpoints.
It is therefore not recommended to disable this filter ("-bilateral False").


Outputs

eKLIPse_deletions.csv

File containing all predicted deletions (bkp = breakpoint).

Title  5'bkp  3'bkp  Freq  Freq for  Freq rev  5' Blast, 3' Blast, 5' Depth, 3' Depth (unseparated)  Repetition
file1  7753   14601  3,46  0,38      6,55      2231393412                                            7754-GA-7755 | 14601-GA-14602
file2  7981   14955  7,40  4,28      10,51     2408250670802544                                      7982-CT-7983 | 14955-CT-14956
file3  460    5243   7,24  13,72     0,76      7172197                                               458-CT-459 | 5242-CT-5243
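
The frequency columns use a decimal comma, so they need converting before any numeric work. The sketch below converts the three example rows and compares Freq with the mean of Freq for and Freq rev. The values agree to two decimals in these rows, but whether eKLIPse defines Freq this way is an assumption drawn only from the example, and the helper name `to_float` is hypothetical.

```python
def to_float(text):
    """Convert a decimal-comma string such as '3,46' to a float."""
    return float(text.replace(",", "."))


# (Title, Freq, Freq for, Freq rev) copied from the example rows.
rows = [
    ("file1", "3,46", "0,38", "6,55"),
    ("file2", "7,40", "4,28", "10,51"),
    ("file3", "7,24", "13,72", "0,76"),
]

for title, freq, forward, reverse in rows:
    strand_mean = (to_float(forward) + to_float(reverse)) / 2
    # Compare the reported value with the mean of the two strand values.
    print(title, to_float(freq), round(strand_mean, 3))
```

A large gap between the forward and reverse values, as in file3, is visible only when the two strand columns are inspected separately.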

eKLIPse_genes.csv

File summarizing cumulated deletions per mtDNA gene.

Gene     Start  End   Type     file3  file4  file5
MT-TF    577    647   trna     0,38   0,82   14,03
MT-RNR1  648    1601  rrna     2,27   14,42  14,03
MT-TV    1602   1670  trna     2,27   14,42  14,03
MT-RNR2  1671   3229  rrna     2,27   14,78  14,03
MT-TL1   3230   3304  trna     2,27   14,78  14,03
MT-ND1   3307   4262  protein  2,27   15,05  14,03
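
The per-gene table can be loaded for downstream filtering. This is a minimal sketch under two assumptions that the description above does not confirm: that fields are separated by a semicolon, and that every column after Type is one sample. The function name `load_gene_table` is hypothetical, and the data are the example rows shown above.

```python
import csv
import io

# Example rows from the table above, with an ASSUMED ';' field separator.
EXAMPLE = """Gene;Start;End;Type;file3;file4;file5
MT-TF;577;647;trna;0,38;0,82;14,03
MT-RNR1;648;1601;rrna;2,27;14,42;14,03
MT-TV;1602;1670;trna;2,27;14,42;14,03
MT-RNR2;1671;3229;rrna;2,27;14,78;14,03
MT-TL1;3230;3304;trna;2,27;14,78;14,03
MT-ND1;3307;4262;protein;2,27;15,05;14,03
"""

FIXED_COLUMNS = ("Gene", "Start", "End", "Type")


def load_gene_table(handle, delimiter=";"):
    """Return {gene: {sample: cumulated percentage}} from an open text handle."""
    table = {}
    for record in csv.DictReader(handle, delimiter=delimiter):
        table[record["Gene"]] = {
            sample: float(value.replace(",", "."))  # decimal comma to float
            for sample, value in record.items()
            if sample not in FIXED_COLUMNS
        }
    return table


table = load_gene_table(io.StringIO(EXAMPLE))
# Genes whose cumulated percentage exceeds 10 in sample file4.
affected = [gene for gene, samples in table.items() if samples["file4"] > 10]
print(affected)
```

If the real file uses another separator, only the `delimiter` argument needs to change.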

circos plot

One plot is created per input alignment. An example is shown below.

eklipse circos legend

Contact

dooguy@tuta.io

License

eKLIPse is available under the GNU Affero General Public License v3.0.

Reference

Please cite (submitted article)

eKLIPse: A sensitive tool for the detection and quantification of mitochondrial DNA deletions from next generation sequencing data.
