FastQC is a program for quality control of raw sequence data from high-throughput sequencers. It evaluates read quality in several analysis modules and produces summary statistics and plots for each of them. This project reimplements the original FastQC program, written in Java and released in 2010, in Python.
Team:
- Anna Toidze:
- GitHub: AnnaToi01
- Tasks:
- Basic Statistics (part of FastQC)
- Per tile sequence quality (part of FastQC, together with Ivan Semenov)
- Adapter content (part of FastQC)
- Sequence length distribution (part of FastQC)
- README.md (together with Anton Muromtsev)
- Anton Muromtsev:
- GitHub: AntonMuromtsev
- Tasks:
- Overrepresented sequences (Part of FastQC)
- Sequence Duplication Levels (Part of FastQC)
- Binding images into .pdf (service function)
- README.md (together with Anna Toidze)
- Ivan Semenov
- GitHub: ipsemenov
- Tasks:
- Per tile sequence quality (part of FastQC, together with Anna Toidze)
- Quality scores across all bases (Part of FastQC)
- Quality score distribution over all sequences (Part of FastQC)
- Mikhail Fofanov
- GitHub: MVFofanov
- Tasks:
- Sequence content across all bases (Part of FastQC)
- N content across all bases (Part of FastQC)
- GC distribution over all sequences (Part of FastQC)
The pipeline is split into several files located in the scripts folder:
- caclulations.py contains functions required for calculations
- plotting.py contains functions for plotting graphs
- main.py contains the main code for running all computations
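To give a feel for what a calculation-style function might look like, here is a minimal sketch of a Basic Statistics computation over a FASTQ stream. It is not the project's actual code: the names `read_fastq` and `basic_statistics` are hypothetical, and the sketch assumes the simple four-lines-per-record FASTQ layout.

```python
import io
from typing import Dict, Iterator, Optional, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops the iteration
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def basic_statistics(handle: TextIO) -> Dict[str, float]:
    """Compute read count, length range, and overall GC percentage."""
    total = 0
    min_len: Optional[int] = None
    max_len = 0
    gc_bases = 0
    all_bases = 0
    for _, seq, _ in read_fastq(handle):
        total += 1
        length = len(seq)
        min_len = length if min_len is None else min(min_len, length)
        max_len = max(max_len, length)
        upper = seq.upper()
        gc_bases += upper.count("G") + upper.count("C")
        all_bases += length
    return {
        "total_sequences": total,
        "min_length": min_len or 0,
        "max_length": max_len,
        "gc_percent": 100.0 * gc_bases / all_bases if all_bases else 0.0,
    }


if __name__ == "__main__":
    # Two made-up reads, used only to exercise the function.
    demo = "@r1\nACGT\n+\nIIII\n@r2\nGGCCAA\n+\nIIIIII\n"
    print(basic_statistics(io.StringIO(demo)))
```

Keeping a function like this free of any plotting code is what makes the split between a calculations file and a plotting file workable: the plotting side only needs the returned dictionary.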
Clone repository
$ git clone git@github.com:ipsemenov/FastQC_simulator.git
Move to project directory
$ cd FastQC_simulator
Set up virtual environment in working directory:
- Create virtual environment
- Via virtualenv
- Install virtualenv if it is not installed.
$ pip install virtualenv
- Create virtual environment
$ virtualenv venv --python=3.8
- Activate it
$ source ./venv/bin/activate
- Via conda
- Install Anaconda
- Create virtual environment
$ conda create --name <env_name> python=3.8
- Activate it
$ conda activate <env_name>
- Install necessary libraries
$ pip install -r requirements.txt
- Install package wkhtmltopdf
$ sudo apt-get install wkhtmltopdf
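Because wkhtmltopdf is a system package rather than a Python dependency, pip will not report it as missing. A quick way to confirm it is reachable is to look it up on PATH. The snippet below is a small sketch using only the standard library; the helper name `find_executable` is hypothetical and not part of this project.

```python
import shutil
from typing import Optional


def find_executable(name: str = "wkhtmltopdf") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("wkhtmltopdf not found on PATH; install it before running the pipeline")
    else:
        print(f"wkhtmltopdf found at {path}")
```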
This tool is a command-line utility that accepts the following parameters:
-i, --input       path to fastq file
-o, --output      path to output folder for storing results
-a, --adapters    path to file with adapters. Default: ./adapters.txt
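For readers curious how such an interface can be declared, the following is a minimal argparse sketch that mirrors the three parameters above. It is an illustration, not the contents of main.py: the function name `build_parser`, the description text, and the choice to mark `-i` and `-o` as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the -i/-o/-a options described in this README."""
    parser = argparse.ArgumentParser(
        description="FastQC-style quality control for FASTQ files"
    )
    # Marking -i and -o as required is an assumption of this sketch.
    parser.add_argument("-i", "--input", required=True,
                        help="path to fastq file")
    parser.add_argument("-o", "--output", required=True,
                        help="path to output folder for storing results")
    parser.add_argument("-a", "--adapters", default="./adapters.txt",
                        help="path to file with adapters")
    return parser


# Parse an explicit argument list (hypothetical paths) instead of sys.argv.
args = build_parser().parse_args(["-i", "reads.fastq", "-o", "results"])
print(args.input, args.output, args.adapters)
```

argparse also generates the `-h` / `--help` output automatically from the `help` strings, so no extra code is needed for it.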
Example workflow:
$ python main.py -i <path_to_fastq> -o <path_to_output_dir> -a <path_to_adapters>
To show brief information about the parameters, run the following command:
$ python main.py -h
Test data can be found in the test_data/ folder. The file amp_res_1.fastq.gz has to be unzipped:
$ gunzip amp_res_1.fastq.gz
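On systems without gunzip, the same decompression can be done with Python's standard library. The sketch below is one possible approach; the function name `gunzip_file` is hypothetical. Unlike the gunzip command, it leaves the original .gz file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path: str) -> Path:
    """Decompress gz_path next to itself, dropping the .gz suffix.

    Returns the path of the decompressed file. The .gz file is kept.
    """
    source = Path(gz_path)
    target = source.with_suffix("")  # amp_res_1.fastq.gz -> amp_res_1.fastq
    with gzip.open(source, "rb") as fin, open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks, no full read into memory
    return target
```

For the test data this would be called as `gunzip_file("amp_res_1.fastq.gz")` from inside the test_data folder.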
Run the program on test data (from the scripts folder):
$ python main.py -i ../test_data/amp_res_1.fastq -o ../test_data/results -a ./adapters.txt
The test result, results.pdf, can be found in test_data/results, along with the individual images of the statistics modules.