
Genome-Aligners

The cost of sequencing an organism's genome has been greatly reduced by the advent of Next Generation Sequencing (NGS) techniques. This has led to the emergence of many new aligners that find the position of a given NGS sequence in a reference genome. However, choosing the aligner best suited to a problem is difficult because there are few fair comparisons between aligners in terms of alignment effectiveness and computational cost. Another problem commonly faced by bioinformaticians is the correct tuning of aligners' metaparameters. These difficulties are compounded by the fact that aligners are commonly used as black boxes, owing to their complexity and to the lack of detailed descriptions and analyses of them.

The objective of this Master's Thesis is to provide a theoretical analysis and an implementation of three aligners as a basis for comparing them, showing which algorithm is best suited to each type of NGS sequencing problem. Additionally, this work analyses how aligners' metaparameters influence their performance. To cover both long- and short-sequence aligners, as well as aligners that consider either mutations (mismatches) only or both mutations and gaps, three aligners were selected for the analysis: Bowtie, BWA and BWT-SW. These aligners share a common structure, the FM-Index, which uses the Burrows-Wheeler transform and its properties to search the reference genome efficiently (an exact-match lookup takes time proportional to the length of the query rather than the length of the genome) with low memory consumption. This Master's Thesis describes the hidden details of each algorithm, which was made possible by a thorough study of the scientific papers in which these algorithms were proposed. Both the FM-Index and the aligners were implemented in C++, and their correct behaviour was verified with black-box and white-box unit testing.
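To make the FM-Index idea concrete, the following is a minimal C++ sketch of exact-match backward search over a DNA string. It is a hypothetical illustration, not the implementation from this thesis: the class name `FMIndex`, the method `locate`, and the toy construction (naive suffix sorting and a full occurrence table) are assumptions chosen for readability.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy FM-index over the DNA alphabet {$, A, C, G, T}, where '$'
// is the end-of-text sentinel. Illustrative only, not the thesis code.
class FMIndex {
public:
    explicit FMIndex(const std::string& genome) : text_(genome + "$") {
        const std::size_t n = text_.size();
        for (std::size_t i = 0; i + 1 < n; ++i)
            if (code(text_[i]) <= 0)
                throw std::invalid_argument("genome must contain only A, C, G, T");

        // Suffix array by naive sorting of all suffixes (toy inputs only).
        sa_.resize(n);
        std::iota(sa_.begin(), sa_.end(), std::size_t{0});
        std::sort(sa_.begin(), sa_.end(), [this](std::size_t a, std::size_t b) {
            return text_.compare(a, std::string::npos, text_, b, std::string::npos) < 0;
        });

        // BWT[i] is the character that precedes the i-th smallest suffix.
        std::string bwt(n, ' ');
        for (std::size_t i = 0; i < n; ++i)
            bwt[i] = text_[(sa_[i] + n - 1) % n];

        // C[c] = number of characters in the text that are smaller than c.
        std::array<std::size_t, kSigma> freq{};
        for (char ch : text_) ++freq[code(ch)];
        for (int c = 1; c < kSigma; ++c) c_[c] = c_[c - 1] + freq[c - 1];

        // occ_[i][c] = number of occurrences of c in bwt[0, i).
        occ_.assign(n + 1, std::array<std::size_t, kSigma>{});
        for (std::size_t i = 0; i < n; ++i) {
            occ_[i + 1] = occ_[i];
            ++occ_[i + 1][code(bwt[i])];
        }
    }

    // Backward search: returns the sorted text positions of exact matches.
    std::vector<std::size_t> locate(const std::string& pattern) const {
        if (pattern.empty()) return {};
        std::size_t lo = 0, hi = text_.size();  // suffix-array interval [lo, hi)
        for (auto it = pattern.rbegin(); it != pattern.rend() && lo < hi; ++it) {
            const int c = code(*it);
            if (c <= 0) return {};  // '$' or a non-ACGT character never matches
            lo = c_[c] + occ_[lo][c];  // LF-mapping of both interval bounds
            hi = c_[c] + occ_[hi][c];
            assert(lo <= hi);
        }
        std::vector<std::size_t> hits(sa_.begin() + static_cast<std::ptrdiff_t>(lo),
                                      sa_.begin() + static_cast<std::ptrdiff_t>(hi));
        std::sort(hits.begin(), hits.end());
        return hits;
    }

private:
    static constexpr int kSigma = 5;

    static int code(char ch) {
        switch (ch) {
            case '$': return 0;
            case 'A': return 1;
            case 'C': return 2;
            case 'G': return 3;
            case 'T': return 4;
            default:  return -1;
        }
    }

    std::string text_;
    std::vector<std::size_t> sa_;
    std::array<std::size_t, kSigma> c_{};
    std::vector<std::array<std::size_t, kSigma>> occ_;
};
```

For example, `FMIndex("ACGTACGA").locate("ACG")` yields positions 0 and 4. Each pattern character narrows the suffix-array interval with two occurrence lookups, which is why the search cost depends on the query length rather than on the genome size. Practical aligners typically replace the full suffix array and occurrence table with sampled versions to save memory, and extend this exact-match primitive with backtracking to tolerate mismatches and gaps.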

Once the aligners were implemented, the ART software was used to simulate NGS sequences. ART takes as parameters the NGS technology to simulate, along with values that control sequence length, number of mismatches and gap probability. The behaviour of the three aligners for different values of these parameters was compared in terms of execution time and hit rate, while also varying each aligner's metaparameters. The outcomes of this Master's Thesis are: a detailed study of some of the most widely used FM-Index-based alignment algorithms, intended to complement the existing literature; a simple implementation of these aligners, which aids their understanding and comparison; and a quantitative comparative analysis, from which we conclude when each aligner is more suitable than the others for a specific sequencing problem.