SCUBAT2

Overview

SCUBAT2 (Scaffolding Contigs Using BLAST And Transcripts v2) uses transcriptome or proteome information to scaffold the genome. It was inspired by the original SCUBAT algorithm by Ben Elsworth.

Requirements

Python Libraries

Biopython - to parse the BLAST XML file

NumPy - to calculate summary statistics

Details

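SCUBAT2 requires a BLAST XML file (-outfmt 5) of the transcripts aligned against the contigs. The -db argument expects a BLAST database, so build one from the contigs with makeblastdb beforehand. For example: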

blastn -query transcripts.fa -db contigs.fa -evalue 1e-25 -outfmt 5 -out blast.xml

When the transcripts and the assembly come from the same species, the default identity cutoff should be adequate.
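
To make the role of the BLAST XML file more concrete, here is a minimal sketch of how transcripts that hit more than one contig (the candidates for scaffolding) could be pulled out of it. This is not SCUBAT2's own code: it uses the standard library's xml.etree.ElementTree instead of Biopython so that it runs without extra dependencies, the function names are hypothetical, the 0.95 identity cutoff is an assumed value rather than the tool's default, and it assumes the contig name is stored in Hit_def.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny hand-written stand-in for a real "blastn -outfmt 5" file (illustrative only).
EXAMPLE_XML = """
<BlastOutput><BlastOutput_iterations>
  <Iteration>
    <Iteration_query-def>transcript_1</Iteration_query-def>
    <Iteration_hits>
      <Hit><Hit_def>contig_A</Hit_def><Hit_hsps>
        <Hsp><Hsp_identity>99</Hsp_identity><Hsp_align-len>100</Hsp_align-len></Hsp>
      </Hit_hsps></Hit>
      <Hit><Hit_def>contig_B</Hit_def><Hit_hsps>
        <Hsp><Hsp_identity>98</Hsp_identity><Hsp_align-len>100</Hsp_align-len></Hsp>
      </Hit_hsps></Hit>
      <Hit><Hit_def>contig_C</Hit_def><Hit_hsps>
        <Hsp><Hsp_identity>80</Hsp_identity><Hsp_align-len>100</Hsp_align-len></Hsp>
      </Hit_hsps></Hit>
    </Iteration_hits>
  </Iteration>
  <Iteration>
    <Iteration_query-def>transcript_2</Iteration_query-def>
    <Iteration_hits>
      <Hit><Hit_def>contig_A</Hit_def><Hit_hsps>
        <Hsp><Hsp_identity>100</Hsp_identity><Hsp_align-len>100</Hsp_align-len></Hsp>
      </Hit_hsps></Hit>
    </Iteration_hits>
  </Iteration>
</BlastOutput_iterations></BlastOutput>
"""


def contigs_per_transcript(xml_text, min_identity=0.95):
    """Map each query transcript to the set of contigs it hits.

    min_identity is an assumed cutoff, not SCUBAT2's documented default.
    """
    root = ET.fromstring(xml_text.strip())
    hits = defaultdict(set)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            contig = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                # Fraction of identical positions over the alignment length.
                identity = int(hsp.findtext("Hsp_identity")) / int(
                    hsp.findtext("Hsp_align-len")
                )
                if identity >= min_identity:
                    hits[query].add(contig)
    return dict(hits)


def scaffolding_candidates(hits):
    """Keep only transcripts whose HSPs span two or more contigs."""
    return {q: sorted(c) for q, c in hits.items() if len(c) >= 2}


if __name__ == "__main__":
    print(scaffolding_candidates(contigs_per_transcript(EXAMPLE_XML)))
```

In this toy input, transcript_1 links contig_A and contig_B, while its low-identity hit to contig_C is dropped by the assumed cutoff.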

The user must specify the maximum allowed intron size (e.g. ~20000 bp for nematode species). Alternatively, run the program with --intron_size_run, which creates the file intron_size containing the intron sizes calculated from the mapped transcripts.
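
As an illustration of what an intron size derived from mapped transcripts can mean, the sketch below takes the HSPs of one transcript on one contig and measures the contig gap between consecutive HSPs. It is an assumption about the general idea, not a description of what --intron_size_run actually computes. The function names and the coordinate tuples are hypothetical, and only forward-strand hits are handled.

```python
import statistics


def implied_intron_sizes(hsps):
    """Return contig gaps between consecutive HSPs of one transcript.

    hsps: list of (query_start, query_end, contig_start, contig_end)
    tuples, 1-based inclusive, all on the forward strand of one contig.
    """
    ordered = sorted(hsps)  # order by position along the transcript
    gaps = []
    for (_, _, _, prev_end), (_, _, next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1  # bases skipped on the contig
        if gap > 0:
            gaps.append(gap)
    return gaps


def within_max_intron(hsps, max_intron=20000):
    """True if no implied intron exceeds the user-supplied maximum."""
    return all(gap <= max_intron for gap in implied_intron_sizes(hsps))


if __name__ == "__main__":
    # Hypothetical transcript split into three exons on one contig.
    example = [(1, 100, 1000, 1099), (101, 250, 1600, 1749), (251, 400, 5000, 5149)]
    sizes = implied_intron_sizes(example)
    print(sizes, statistics.median(sizes), within_max_intron(example))
```

Looking at the distribution of such gaps (for instance the median and the largest values) is one way to pick a sensible value for -max before a full run.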

Example command

SCUBAT_v2.py -b [blast.xml] -f [assembly.file] -max 20000