ConsensusSV-ONT-pipeline

What is ConsensusSV-ONT?

The tool designed for getting consensus out of multiple SV callers' results. The method uses six independent, state-of-the-art structural variant callers for long-read sequencing along with a convolutional neural network for filtering high-quality variants. We provide a runtime environment in the form of a docker image, wrapping a nextflow pipeline for efficient processing using parallel computing.

Docker image: https://hub.docker.com/repository/docker/antonipietryga/consensussv-ont-pipeline/

docker pull antonipietryga/consensussv-ont-pipeline

Citation

If you use ConsensuSV-ONT, please cite: ConsensuSV-ONT - a modern method for accurate structural variant calling Antoni Pietryga, Mateusz Chiliński, Sachin Gadakh, Dariusz Plewczynski bioRxiv 2024.07.26.605267; doi: https://doi.org/10.1101/2024.07.26.605267

Preparation of your samples

Since the algorithm is easy-to-run, the preparation of the sample is minimal. The algorithm works with fastq.gz files, and a csv file containing all the sample information is needed. There is only one column, that is the path to the fastq.gz file.

Path
/HG00733.fastq.gz
/HG00514.fastq.gz

Bear in mind that the column headers are provided only for the ease of the example, and should not be present in the csv file.

Running the pipeline

To get into the docker image for working with your data, it's best to mount local directories to the container:

docker run -v /mnt:/mnt -p 8082:8082 -it antonipietryga/consensussv-ont-pipeline

Once in the container, we can run the pipeline using the following command:

nextflow pipeline.nf --input files.csv

Remember to put your samples in the file samples.csv according to the praparation of your samples.

All the parameters that can be used with the script are shown in the following table:

Parameter	Description
--input	File location of the csv file that described all the samples according to the (#preparation-of-your-samples).
--threads	Max number of threads per task.
--mem	Max memory per thread.
--outdir	Output dir of ConsensuSV-ONT.
--ref	Reference genome

Output location

The location of the output depends on your working directory, provided as the parameter. In that directory, two folders will be created:

alignments - a folder where alignment files for each sample are stored
vcfs - where you will find folders for each of the sample, containing all VCF files from the individual SV calling tools and consensuSV-ONT

CNN model architecture

_________________________________________________________________
Layer (type)                Output Shape              Param #   
=================================================================
 conv3d (Conv3D)             (None, 50, 50, 3, 4)      112       
                                                                 
 batch_normalization (BatchN  (None, 50, 50, 3, 4)     16        
 ormalization)                                                   
                                                                 
 max_pooling3d (MaxPooling3D  (None, 25, 25, 3, 4)     0         
 )                                                               
                                                                 
 conv3d_1 (Conv3D)           (None, 25, 25, 3, 8)      872       
                                                                 
 batch_normalization_1 (Batc  (None, 25, 25, 3, 8)     32        
 hNormalization)                                                 
                                                                 
 max_pooling3d_1 (MaxPooling  (None, 12, 12, 3, 8)     0         
 3D)                                                             
                                                                 
 conv3d_2 (Conv3D)           (None, 12, 12, 3, 16)     3472      
                                                                 
 batch_normalization_2 (Batc  (None, 12, 12, 3, 16)    64        
 hNormalization)                                                 
                                                                 
 max_pooling3d_2 (MaxPooling  (None, 6, 6, 3, 16)      0         
 3D)                                                             
                                                                 
 flatten (Flatten)           (None, 1728)              0         
                                                                 
 dense (Dense)               (None, 32)                55328     
                                                                 
 dense_1 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 59,929
Trainable params: 59,873
Non-trainable params: 56```

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
weights		weights
Dockerfile		Dockerfile
README.md		README.md
config.ini		config.ini
consensusv_ont.py		consensusv_ont.py
header.txt		header.txt
nextflow.config		nextflow.config
pipeline.nf		pipeline.nf
random_positions_chrxy.bed		random_positions_chrxy.bed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ConsensusSV-ONT-pipeline

What is ConsensusSV-ONT?

Citation

Preparation of your samples

Running the pipeline

Output location

CNN model architecture

About

Releases

Packages

Contributors 2

Languages

SFGLab/ConsensuSV-ONT-pipeline

Folders and files

Latest commit

History

Repository files navigation

ConsensusSV-ONT-pipeline

What is ConsensusSV-ONT?

Citation

Preparation of your samples

Running the pipeline

Output location

CNN model architecture

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages