SuRoQ (Small RNA Quality) - a pipeline for quick and dirty QC of your small RNA (piRNA-oriented) sequencing data

SuRoQ requires only three inputs: your demultiplexed, adapter-trimmed reads in FASTQ or FASTA format (gzip and bz2 compression are supported), a genome assembly in FASTA format, and TE consensus sequences in FASTA format. It produces three kinds of plots:

Size distribution of reads that mapped to the genome and to TEs (NOT mutually exclusive!). For TEs, blue bars indicate sense-mapped reads and red bars indicate antisense-mapped reads. Only unique reads are used for the size distributions, so each distinct small RNA sequence is counted only once. This approach may not be perfect, but it keeps a few highly abundant small RNAs from skewing the distribution.
WebLogo (seqLogo) plots for sense and antisense TE-mapped reads, useful for validating the U1 and A10 biases.
Ping-pong signature plot, with the Z score for the 10-nt overlap shown in the title.
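The "unique reads only" counting behind the size distribution can be sketched in a few lines of shell. This is an illustration, not SuRoQ's actual code: the function name `unique_length_hist` is hypothetical, it assumes uncompressed 4-line FASTQ records on standard input, and it skips the mapping step, so it counts all reads rather than only mapped ones.

```shell
# Hypothetical helper: length histogram of unique read sequences.
# Assumes uncompressed 4-line FASTQ records on stdin.
unique_length_hist() {
  awk 'NR % 4 == 2' |   # keep only the sequence line of each record
    sort -u |           # collapse identical sequences into one copy
    awk '{ n[length($0)]++ } END { for (l in n) print l "\t" n[l] }' |
    sort -n             # order the rows by read length
}
```

For example, `zcat sample.fastq.gz | unique_length_hist` (with a placeholder file name) prints one tab-separated row per read length, followed by the number of distinct sequences of that length.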

NB: As a first step, SuRoQ removes reads containing homopolymer stretches of at least 10 nt, e.g. AAAAAAAAAA.
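A filter of this kind could look roughly like the sketch below. It is not taken from SuRoQ's source: the function name `drop_homopolymers` is hypothetical, the input is assumed to be uncompressed 4-line FASTQ with upper-case bases, and only runs of A, C, G or T are checked (whether SuRoQ also treats runs of N this way is not stated here).

```shell
# Hypothetical helper: drop FASTQ records whose sequence contains a run of
# at least min_run identical bases (default 10). Reads FASTQ from stdin.
drop_homopolymers() {
  awk -v min_run="${1:-10}" '
    BEGIN {
      # Build a pattern such as AAAAAAAAAA|CCCCCCCCCC|... for the chosen length
      n = split("A C G T", bases, " ")
      for (i = 1; i <= n; i++) {
        run = ""
        for (j = 0; j < min_run; j++) run = run bases[i]
        pattern = pattern (i > 1 ? "|" : "") run
      }
    }
    NR % 4 == 1 { header = $0; next }
    NR % 4 == 2 { seq = $0; next }
    NR % 4 == 3 { plus = $0; next }
    # Fourth line (quality): the record is complete, print it if it passes
    seq !~ pattern { print header; print seq; print plus; print $0 }
  '
}
```

A read with nine identical bases in a row passes this filter, while a read with ten or more is dropped together with its header and quality lines.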

SuRoQ borrows heavily from piPipes, namely the concepts of .insert and .BED2 files (refer to piPipes for details) and a couple of C++ functions that handle those formats and compute the ping-pong signature.
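To give a feel for the Z score reported with the ping-pong signature, here is a minimal sketch of one common way to compute it. Everything in it is an assumption rather than SuRoQ's or piPipes' confirmed method: the function name `pingpong_z10` is hypothetical, the input is taken to be a two-column, tab-separated table of overlap length and pair count, the background is every overlap length in the input other than 10, and the sample standard deviation is used.

```shell
# Hypothetical helper: Z score of the 10-nt overlap.
# Input on stdin: two tab-separated columns, overlap length and pair count.
# Assumption: background = all other overlap lengths present in the input,
# spread measured with the sample standard deviation (n - 1 denominator).
pingpong_z10() {
  awk -F '\t' '
    $1 == 10 { signal = $2; next }          # the 10-nt overlap is the signal
    { n++; sum += $2; sumsq += $2 * $2 }    # everything else is background
    END {
      if (n < 2) { print "NA"; exit }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)
      if (var <= 0) { print "NA"; exit }    # flat background, Z is undefined
      printf "%.2f\n", (signal - mean) / sqrt(var)
    }'
}
```

A large positive value means the 10-nt overlap stands out clearly from the other overlap lengths, which is the pattern expected for ping-pong pairs.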

Installation

SuRoQ works on Linux x64 systems. It has not been tested on macOS, but it should work there in theory. To install it, clone this repo:

git clone https://github.com/foriin/SuRoQ.git

Then use the suroq.yml file to create a conda environment (I use mamba because it is much faster):

mamba env create -f suroq.yml

If you don't want to set up a conda environment, here's the software list:

Running

Run SuRoQ with:

./SuRoQ.sh <your_reads> <genome.fasta> <TEs.fasta> [number_of_cores] [output_directory]

The last two parameters are optional, but because they are positional you must specify both if you want to set only the output directory name. I will work on improving argument handling soon. After completion, you will find the plot in the plots directory and all the files used to generate it in the tables directory.

Tips

Run SuRoQ for all your samples using the same output directory. That way, it won't rebuild the bowtie indices each time and will reuse the ones made in the first run.
Rename your files to reflect their contents (better yet, copy them first), e.g. OvariesZucKD_rep1.fastq.gz rather than CX99889_GATTC_R0.fastq.gz.
The more cores you use, the faster the program runs ¯\_(ツ)_/¯
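The first tip can be scripted with a small wrapper like the sketch below. The function name `suroq_batch` and the `SUROQ` override variable are hypothetical conveniences, not part of SuRoQ; the argument order passed to the script follows the usage line in the Running section.

```shell
# Hypothetical wrapper: run SuRoQ on several samples with one shared output
# directory, so that later runs can reuse the indices from the first run.
# Set SUROQ to override the script path (e.g. SUROQ=echo for a dry run).
suroq_batch() {
  local genome=$1 tes=$2 cores=$3 outdir=$4
  shift 4
  local reads
  for reads in "$@"; do
    # Stop at the first failing sample instead of continuing silently
    "${SUROQ:-./SuRoQ.sh}" "$reads" "$genome" "$tes" "$cores" "$outdir" || return 1
  done
}
```

A call might look like `suroq_batch genome.fasta TEs.fasta 8 suroq_out OvariesZucKD_rep1.fastq.gz OvariesZucKD_rep2.fastq.gz`, where the file and directory names are placeholders. Both the core count and the output directory are always passed, so the positional arguments stay unambiguous.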

Disclaimer

This software tool is currently under development. Users assume all risks related to its use. If you have any problems, open an issue here or email me.
