-
OS: ubwt currently can only be complied and run on Linux/Unix.
-
Dependency: zlib
-
Toy example of test data are in
test_data
git clone https://github.com/yangao07/ubwt.git
cd ubwt; make
$ ./ubwt unipath ./test_data/chr20.ubwt -o ./test_data/chr20.unipath.fa
$ ./ubwt unipath ./test_data/chr20.ubwt -e ./test_data/chr20.edges -k 55 -a G -o ./chr20.unipath.gfa
$ ./ubwt index ./test_data/chr20.ubwt
$ ./ubwt query ./test_data/chr20.ubwt ./test_data/in.query
Program: ubwt
Usage: ubwt <command> [options]
Commands:
unipath generate uni-path sequence from BWT string
index build BWT index for uni-path sequence
query find exact match of query sequence with BWT index
Usage: ubwt unipath [option] <BWT-STR>
Options:
-t [INT] Number of threads. [1]
-f [STR] Format of input bwt-str. [B]
"B": binary file, 4-bit per bp, 0/1/2/3/4:A/C/G/T/#(first 64-bit: length).
"P": plain text.
-e [STR] Edge sequence file in binary format. Required when output GFA format. [NULL]
-k [INT] Length of k-mer. Required when output GFA format.
-a [STR] Format of output file. [F]
"F": FASTA format.
"G": GFA format.
-o [STR] Output file. [stdout]
# FASTA output
$ ./ubwt unipath ./test_data/chr20.ubwt -o ./test_data/chr20.unipath.fa
$ head -n4 ./test_data/chr20.unipath.fa
>1_246
GGCCAACATGGTGAAACCCCATCTCTACTAAAAATACAAAAATTAGCTGGGCGTCATGGCGGGCGCCTGTAGTCCCAGCTACCTGGGAGGTTGAGGTGGGAGAATCGCTTGAACCCGGGAGGCGGAGGCTGCAGCAGCTGAGATTGTGCCACTGCACTCCAGCCTTGGCCACAGAGCAAGACCTTGTCTCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATATATAT
>2_55
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATATATATATATATATA
# GFA output
$ ./ubwt unipath ./test_data/chr20.ubwt -e ./test_data/chr20.edges -k 55 -o ./test_data/chr20.unipath.gfa -a G
$ head -n4 ./test_data/chr20.unipath.gfa
S 1 GGCCAACATGGTGAAACCCCATCTCTACTAAAAATACAAAAATTAGCTGGGCGTCATGGCGGGCGCCTGTAGTCCCAGCTACCTGGGAGGTTGAGGTGGGAGAATCGCTTGAACCCGGGAGGCGGAGGCTGCAGCAGCTGAGATTGTGCCACTGCACTCCAGCCTTGGCCACAGAGCAAGACCTTGTCTCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATATATAT LN:i:246
L 1 + 6 + 54M
S 2 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATATATATATATATATA LN:i:55
L 2 + 3 + 54M
>uniID_uniLen
uniSeq
- uniID: ID of unipath
- uniLen: Length of unipath
- uniSeq: Sequence of unipath
S uniID uniSeq LN:i:uniLen
L uniID1 + uniID2 + xxxM
- uniID: ID of unipath
- uniSeq: Sequence of unipath
- uniLen: Length of unipath
- xxx: k-1
Usage: ubwt index [option] <BWT-STR>
Options:
-t [INT] Number of threads. [1].
-f [STR] Format of input bwt-str. [B]
"B": binary file, 4-bit per bp, 0/1/2/3/4:A/C/G/T/#(first 64-bit: length).
"P": plain text.
-o [STR] Prefix of bwt-index file. [<BWT-STR>]
$ ./ubwt index -t2 ./test_data/chr20.ubwt
- Output index file will have
.ubwti
suffix, For example,./test_data/chr20.ubwt.ubwti
Usage: ubwt query <BWT-index.prefix> <in-query> > match.out
Find exact match of input query on BWT-index.
Return <unipath name> and <offset(1-based)> on unipath.
$ ./ubwt query ./test_data/chr20.ubwt ./test_data/in.query
$ 9_661 11
- Output consists two columns: <unipath_name> <offset(1-based)>