FastQC Simulator

FastQC is a program for quality control of raw sequence data from high-throughput sequencers. It evaluates read quality in separate analysis modules and produces statistics and plots for each. This project replicates the original FastQC program, written in Java and released in 2010, and makes it available in Python.
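To give a feel for what an analysis module computes, here is a minimal sketch of a "Basic Statistics"-style calculation. It is not the project's own code: the function names `parse_fastq`, `phred_scores`, and `basic_statistics` are hypothetical, and the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines."""
    lines = [line.rstrip("\n") for line in lines]
    # Assumption: each record spans exactly four lines
    # (header, sequence, '+' separator, quality string).
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def phred_scores(quality, offset=33):
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def basic_statistics(records):
    """Summarise read count, length range, GC percentage, and mean quality."""
    lengths, qualities = [], []
    gc_bases = 0
    for _, seq, qual in records:
        lengths.append(len(seq))
        gc_bases += sum(seq.upper().count(base) for base in "GC")
        qualities.extend(phred_scores(qual))
    return {
        "total_sequences": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "gc_percent": 100 * gc_bases / sum(lengths),
        "mean_quality": mean(qualities),
    }


# Two toy reads, written inline instead of being read from a file.
example = ["@r1", "GATTACA", "+", "IIIIIII", "@r2", "GGCC", "+", "!!!!"]
print(basic_statistics(parse_fastq(example)))
```

The sketch expects at least one read; an empty input would need separate handling.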

Team: Fasta and Curious

Team members:

Anna Toidze:
- GitHub: AnnaToi01
- Tasks:
  - Basic Statistics (part of FastQC)
  - Per tile sequence quality (part of FastQC, together with Ivan Semenov)
  - Adapter content (part of FastQC)
  - Sequence length distribution (part of FastQC)
  - README.md (together with Anton Muromtsev)
Anton Muromtsev:
- GitHub: AntonMuromtsev
- Tasks:
  - Overrepresented sequences (part of FastQC)
  - Sequence Duplication Levels (part of FastQC)
  - Binding images into .pdf (service function)
  - README.md (together with Anna Toidze)
Ivan Semenov:
- GitHub: ipsemenov
- Tasks:
  - Per tile sequence quality (part of FastQC, together with Anna Toidze)
  - Quality scores across all bases (part of FastQC)
  - Quality score distribution over all sequences (part of FastQC)
Mikhail Fofanov:
- GitHub: MVFofanov
- Tasks:
  - Sequence content across all bases (part of FastQC)
  - N content across all bases (part of FastQC)
  - GC distribution over all sequences (part of FastQC)

Installation and Usage

Pipeline structure

The pipeline is split into several files located in the scripts folder:

caclulations.py contains the functions required for calculations
plotting.py contains the functions for plotting graphs
main.py contains the main code that runs all computations

Preliminary settings

Clone repository

$ git clone git@github.com:ipsemenov/FastQC_simulator.git

Move to project directory

$ cd FastQC_simulator

Set up a virtual environment in the working directory:

Create the virtual environment using one of the following options:
- Via virtualenv
  - Install virtualenv if it is not installed.
```
$ pip install virtualenv
```
  - Create virtual environment
```
$ virtualenv venv --python=3.8
```
  - Activate it
```
$ source ./venv/bin/activate
```
- Via conda
  - Install Anaconda
  - Create virtual environment
```
$ conda create --name <env_name> python=3.8
```
  - Activate it
```
$ conda activate <env_name>
```
Install necessary libraries

$ pip install -r requirements.txt

Install package wkhtmltopdf

$ sudo apt-get install wkhtmltopdf
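
To confirm that the installation succeeded, one option is to check whether the executable is visible on PATH. The snippet below is a small sketch using only the Python standard library; the helper name `find_executable` is hypothetical and not part of the project.

```python
import shutil


def find_executable(name="wkhtmltopdf"):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


path = find_executable()
print(path if path else "wkhtmltopdf not found on PATH")
```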

Console interface

This tool is a console utility that accepts the following parameters:

  -i, --input       path to fastq file
  -o, --output      path to output folder for storing results
  -a, --adapters    path to file with adapters. Default: ./adapters.txt
Running utility

Example workflow:

$ python main.py -i <path_to_fastq> -o <path_to_output_dir> -a <path_to_adapters>

To show brief information about the parameters, run the following command:

$ python main.py -h

Test data

Test data can be found in the test_data/ folder. The file amp_res_1.fastq.gz must be decompressed first (from the test_data folder):

$ gunzip amp_res_1.fastq.gz
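
Where `gunzip` is not available, the same decompression can be sketched with Python's standard `gzip` module. The helper name `gunzip_file` is hypothetical. Unlike `gunzip`, this version leaves the original .gz file in place.

```python
import gzip
import shutil


def gunzip_file(src, dst):
    """Decompress the gzip file at src into a new file at dst."""
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        # Stream the data in chunks so large FASTQ files are not loaded into memory.
        shutil.copyfileobj(compressed, plain)


# Example call (run from the test_data folder):
# gunzip_file("amp_res_1.fastq.gz", "amp_res_1.fastq")
```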

Run the program on test data (from the scripts folder):

$ python main.py -i ../test_data/amp_res_1.fastq -o ../test_data/results -a ./adapters.txt

The test result, results.pdf, can be found in 'test_data/results', along with individual images for each statistics module.

Software Requirements

Python 3.8
Ubuntu 20.04 and 21.04
Git 2.30.2
Markdown
HTML
GitHub
Bash
The remaining requirements are listed in requirements.txt.
