
finder

A Python script that locates read alignments in nucleotide sequences using a naive matching approach. I know, it is slow, but the project started as a way to learn more about the read alignment problem. A long-term goal is to implement Boyer-Moore. The underlying idea was to identify binding sites that comprise only a few nucleotides in target sequences, such as miRNA or primer binding sites. Of course, it can also be used as a general tool to identify partial regions in nucleotide sequences. Because of its limited performance, it is not intended for large datasets, but rather as a downstream tool for detailed analysis.
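To make the naive approach concrete, here is a minimal sketch of sliding-window matching with a mismatch limit. It is not taken from finder.py; the function names `count_mismatches` and `naive_find` are hypothetical, positions are assumed to be 0-based, and a mismatch is assumed to mean a substitution only (no insertions or deletions).

```python
from __future__ import annotations


def count_mismatches(window: str, query: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(window, query))


def naive_find(target: str, query: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every window within the mismatch limit.

    Hypothetical helper: compares the query against every window of the
    target, so the cost grows with len(target) * len(query).
    """
    hits = []
    if not query or len(query) > len(target):
        return hits
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        mismatches = count_mismatches(window, query)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


print(naive_find("ACGTACGTTACG", "ACG"))                    # [(0, 0), (4, 0), (9, 0)]
print(naive_find("ACGTACGTTACG", "ACC", max_mismatches=1))  # [(0, 1), (4, 1), (9, 1)]
```

Every window is compared in full, which is why this approach is slow on long targets. Boyer-Moore avoids much of that work by skipping windows that cannot match.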

Usage

python3 finder.py -t template.fa -q query.fa -o /your/output/path 

Arguments:

| Parameter | Description | Default |
| --- | --- | --- |
| -t (--target) | path to template file | |
| -q (--query) | path to query file | |
| -o (--output) | path to output folder | |
| -m (--mismatch) | number of mismatches allowed | 0 |
| -s (--save) | save output to file | False |
| -r (--rev) | also search in the reverse complement of the target sequence | False |

If no output path (-o) is specified, the current working directory is used.
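As an illustration of what the -r option implies, the sketch below searches a query on the forward strand and on the reverse complement of the target. This is an assumption about the general technique, not the code in finder.py: `reverse_complement` and `find_both_strands` are hypothetical names, only exact matches are considered to keep it short, and hits on the reverse strand are reported as 0-based start positions on the forward strand.

```python
from __future__ import annotations

# Translation table for DNA bases; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_both_strands(target: str, query: str) -> list[tuple[str, int]]:
    """Return (strand, start) for exact hits on both strands.

    Hypothetical helper: '+' marks the forward strand, '-' the reverse
    complement. Start positions always refer to the forward strand.
    """
    hits = []
    n, m = len(target), len(query)
    if m == 0 or m > n:
        return hits
    rc = reverse_complement(target)
    for start in range(n - m + 1):
        if target[start:start + m] == query:
            hits.append(("+", start))
        if rc[start:start + m] == query:
            # A window starting at `start` in rc covers forward
            # positions n - m - start .. n - start - 1.
            hits.append(("-", n - m - start))
    return hits


print(reverse_complement("ATGCCC"))        # GGGCAT
print(find_both_strands("ATGCCC", "GGG"))  # [('-', 3)]
```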

Test

The 'data' folder contains files with arbitrarily generated nucleotide sequences for testing purposes. Try them out using:

python3 finder.py -t ./data/template.fa -q ./data/query.fa --mismatch 2

Output

The output is written to a mapping file (in progress):

| File | Description |
| --- | --- |
| mapping.txt | contains a simple text-based visualization of template sequences with at least one hit |
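The exact layout of mapping.txt is not documented here. Purely as an assumption of what a text-based visualization could look like, the sketch below draws a query beneath the template at its hit position, marking matching bases with `|` and mismatches with `.`. The function `render_hit` and the three-line layout are hypothetical and may differ from what finder.py writes.

```python
def render_hit(template: str, query: str, start: int) -> str:
    """Draw the query under the template at a 0-based start position.

    Hypothetical layout: template, marker line, aligned query.
    """
    window = template[start:start + len(query)]
    markers = "".join("|" if a == b else "." for a, b in zip(window, query))
    padding = " " * start
    return "\n".join([template, padding + markers, padding + query])


print(render_hit("ACGTACGT", "TAG", 3))
# ACGTACGT
#    ||.
#    TAG
```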

Feedback

If you have any feedback or comments, please send me an email or open an issue on GitHub.