fermi hangs on a very small dataset #4

Open
ctSkennerton opened this issue Mar 1, 2013 · 5 comments

Comments

@ctSkennerton

I've run fermi on a very small dataset containing 22 FASTA records using the following command:

run-fermi.pl -k 200 -p cdhitout_0.85 <reads.fa>  | make -f -

However, fermi hangs indefinitely. When I look at top, I can see that fermi ropebwt is constantly in the sleep state:

45288 uqcskenn  20   0 24188  740  584 S    3  0.0   1:08.84 fermi ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp -
45447 uqcskenn  20   0 24188  740  584 S    2  0.0   1:08.00 fermi ropebwt -a bcr -v3 -btf cdhitout_0.90.ec.tmp -

I've tried both the git HEAD and release 1.1.

<reads.fa> contains:

>M00920:10:000000000-A292A:1:1101:2305:13136:1
CTTCTGGTGAAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGACCCGGGAACGTATTCACCGCGACATGCTGATCCGCGATTACTAGCGATTCCGACTTCACGCAGTCGAGTTGCAGACTGCGATCCGGACTACGATCGGCTTTGTGAGATTCGCTCCGCCTCGCGGCTTGGCAACCCTCTGTACCGACCATTGTATGACGTGTGAAGCCCTACCCATAAGGGCCATGAGGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCGTTAAAGTGCCCAACCAAATGATGGCAATTAACGACAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACAT
>M00920:10:000000000-A292A:1:1101:24216:16298:1
CCCTTATCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAGGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCATCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCA
>M00920:10:000000000-A292A:1:1110:4340:7240:1
CAGATTGAACGCTGGCGGCATGCTTTACACATGCAAGTCGAACGGCAGCGGGGGCTTCGGCCCGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTACCCATGTTGTGGGGGATAACGTAGCGAAAGCTACGCTAATACCGCATAAGCCCTGAGGGGGAAAGCGGGGGATTCTTCGGAACCTCGCGCAATTGGAGCGGCCGATGTCAGATTAGCTAGTTGGTAGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCGGACTCCTCCGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGCAAGGGTGATC
>M00920:10:000000000-A292A:1:1110:21042:16009:1
ACCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCACATCTCTACGCATTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCACACTCGAGCCTTGCAGTCACAAACGCATTTCCCAGGTTAAGCCCGGGGATTTCACATCTGTCTTACAAAGCCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGGTGCTTGTTCTTCAGTTCCCGTCATTGACAGTCTATGTTAGACCCCGCCGTTTCGTTCCTGCCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGAATGGCTGGATCAGGGT
>M00920:10:000000000-A292A:1:1101:19922:4365:1
ATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGACAGACCAGGTCCAGGGGGCTGCCTTCGCCTTCGATGTTCCTCCTGATATCTACGTATTTCACTGCTACACCCGGATTTCCACCCCCCTCTACCGCACTCTAGGCACACAGTCACAAACGCATTTCCCAGGTTAAGCCCGGGGGTTTCAAATCTGAATTATTTAACCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTCGGTATGACCGCGACTGCCAGCGGGTAGGAAGGCGGTACTTTTTATTCCGGTGCCGACATCCTCCCCGGATATTCACCGCGGCTATTTCTTTCCGTCCGACAGAGGTGTAAAACCCGAAGGCGAGCTTG
>M00920:10:000000000-A292A:1:1101:18095:13295:1
GGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGGAAGCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCGGTGGGGAAGAAATTGCACGGGTTAATACCCTGTGTAGATGACGGTACCCGACTAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTGGTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGAGACTGCCAAGCTGGAGTGTGGCAGAGGGGGGTGGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATCAGGAG
>M00920:10:000000000-A292A:1:2102:3086:14182:1
GTAGTGACCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCACATCTCTACGCATTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCACACTCCAGCCTGGCAGTCTCAAATGCAGTTCCCAGGTTGAGCCCGGGGCTTTCACATCTGACTTACCAAACCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTAACGCGGCTGCTGGCACGTAGTTCGCCGGTGCTTCTTAGTCGGGTACCGTCATCTACACAGGATATTAGCCCGTGCAATTTCTTCCCCACCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGCATGGCTGGATCAGGCTTCCGCCC
>M00920:10:000000000-A292A:1:2108:13711:22806:1
GATTAAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGTGGACGGGTGAGTAAAGCATCGGAACGTATCCTGAAGTGGAGTATAACGTAGCGAAAGTTACGCTAATACCGCATAGTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGCTCTGGGAGCGGCCGATGTCGGATTAGCTAGTTGGGGGGGTAAAGGCCTACCAAGGCGCGGCTCCGTAGCGGGGATTGGAGTATGAAACGCCACACTGTGACTGAGAAACGGCCCGGACTCCTACGTGAGGAAGCAGCGGTGAATTTTTTCCAATGGGTTCAAGCC
>M00920:10:000000000-A292A:1:2110:11377:9313:1
GCATCGGAACGTGCCCTGGAATGGGGGATAACGTAGCGAAAGTTACGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGTTCTGGGATCGGCCGATGTCGTATGAGCTAGTTGGTGGGGAAAAGGCCTACCACGGCGACGATCCGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCCGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCGGTGGGGAAGAAATTGCATGGGTTAATTCCC
>M00920:10:000000000-A292A:1:1105:17264:25408:1
GAATTACTGGGCGTAAAGCGTGCGCAGGCGGCGCCATAAGACAGACGTGAAATCCCCGGGCTTAACCTGGGAACTGCGTTTGTGACTGTGGTGCTCGAGTGTGGCAGAGGGGGGTGGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGCCCCCTGGGTCAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCGAACTAGGTGTTGGGGAAGGAGACGTTCTTAGTACCGCAGCTAACGCGTGAAGTTCGCCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATGGACA
>M00920:10:000000000-A292A:1:2105:19316:26848:1
ATCCGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATTCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCAGCAGGAACGAAACGGCTCTCTCTAACATAGGGAGTTAATGACGGTACCTGAAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCACAGGCGGCGCCATAAGACAGATGTGAAATCCCCGGGCTTAACCTGGGAAC
>M00920:10:000000000-A292A:1:1111:13173:15398:1
TGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTGCCAGAGATGGCTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCACCGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTTCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGCTGAAGTCAAGTCATCATGGCCCTTATGGGTAGGGCGTCACACGTCATACAATGGTCGGAACAGAGGGTTGCCAAGCCGCGAGGTGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGC
>M00920:10:000000000-A292A:1:1102:8010:26367:1
GCCTTACACATGCAAGTCGAACGGCAGCGGAACTTCGGGTGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTGCCATTGAGTGGGGGATAACGTAGCGAAAGTTGCGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGACCGCAAGGCCTTGCGCTCTTTGAGCGGCCGATGTCAGATTAGCTAGTTGGTGAGGTAAAGGCTTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGT
>M00920:10:000000000-A292A:1:1106:8344:21464:1
GTTCCTACCATTGTAGCACGTGTGTAGCCCTGGGCATAAAGGCCATGATGACTTGACATCATCCCCTCCTTCCTCGCGTCTTACGACGGCAGTTTCTTTAGAGTTCCCAGCTTAACCTGTTGGCAACTAAAGATAGGGGTTGCGCTCGTTGCGGGACTTAACCCAACACCTCACGGCACGAGCTGACGACAGCCATGCAGCACCTGTGTGACGGCTCCCTTTCGGGCACCCTCAACTCTCATCGAGGTTCCGTCCATGTCAAGGGTAGGTAAGGTTTTTCGCGTTGCATCGAATTAATCCACATCATCCACCGCTTGTGCGGGTCCCCGTCAATTCCTTTGAGTTTTAATC
>M00920:10:000000000-A292A:1:1109:11262:3539:1
TTTACCCACCCAACACCTAGTTGACATAGTTTAGGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTACCCACGCTTTCGTGCATGAGCGTCAGTATCGGCCCAGGGGGCTGCCTTCGCCATAGGTGTTCCTCCCCATCTCTACGCTTTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCGTACTCTAGTGAGGCAGTCACAAACGCAGTTCCCAGGTTACGCCCGGGGATTTCACGCCTGTCTTACCAATCCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGGTGCTTCTTATGCCGGTACCG
>M00920:10:000000000-A292A:1:1113:21063:11515:1
ACACAGGGTATTAACCCATGCGATTTCTTCCCGGCCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGCATGGCTGGATCAGGGTTGCCCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGGAGGAGTCTGGCCCGTGTCTCAGTTCCAGTGTGGCGGATCATCCTCTCAGACCCGCTCCAGATCGTCGCCTTGGTAAGCCGTTACCTCACCAACTAGCTAATCTGACATAGGCCGCTCAAAGAGCGCAAGGCCTTGCGGTCCCCTGCTTTCCTGCTCACAGAATATGCGGTATTAGCGCAACTTTCGCTACGTTATCCCCCACTCAATGGCACGTTCCGATGCATTACTCACC
>M00920:10:000000000-A292A:1:2109:18065:11577:1
CCTTTGTATTGTCCATTGTAGCACGTGTGTAGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCAACTTAGAGTGCCCAACTTAATGATGGCAACTAAGCTTAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCAGTGTGACGGCTCCCTTTCGGGCACCCTCAACTCTCATCGAGGTTCCGTCCATGTCAAGGGTAGGTAAGGTTTTTCGCGTTGCATCGAATTAATCCACATCATCCACCGCTTGTGCGGGTCCCCGTCAATTCCTTTGAGTTTTAATC
>M00920:10:000000000-A292A:1:2113:10809:18271:1
GTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTCACCTACCCTTGACATGGACGGAACCTCGATGAGAGTTGAGGGTGCCCGAAAGGGAGCCGTCACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTAAGCTTAGTTGCCATCATTAAGTTGGGCACTCTAAGTTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACACACGTGCTACAA
>M00920:10:000000000-A292A:1:2101:18998:6292:1
GTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTAGCAGAGATGCTTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAAGGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTCGGAACAGAGGGTTGCCAAGCCGCGAGGTGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGAC
>M00920:10:000000000-A292A:1:2108:17778:22051:1
ATCCACAGAACTTAGCAGAGATGCTTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGGGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATC
>M00920:10:000000000-A292A:1:1104:5131:15907:1
GTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCGACTAGTCGTTCGGAGCAGCAATGCACTGAGTGACGCAGCTAACGCGTGAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTCTGGAGCCTTGGTGAGAGCCGAGGGTGCCTTCGGGAGCCAGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGT
>M00920:10:000000000-A292A:1:1113:7839:16644:1
CGTTTAGGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGTCAGTACAGGCCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCTGATCTCTACGCATTTCACTGCTACACCAGGAATTCCACACACTTCTGCCGTACTCTAGCCTTGCAGTCACAAACGCAGTTCCCAGGTTAAGCCCGGGGATTTCACATCTGTCTTACAAAAACGCCTCCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTTTTACCGCGGCTGCTGGCACGTTTTTAGCCGGTGCTTCTTAGTCCGGTACCGTCATCCATGGCCTATGTTAGAGAC
@lh3
Owner

lh3 commented Mar 1, 2013

With your command line, fermi should not use ropebwt. Can you find the string ropebwt in your makefile?

@ctSkennerton
Author

Yes, I can. The full makefile is shown below:

FERMI=fermi
UNITIG_K=200
OVERLAP_K=240

all:cdhitout_0.85.p2.mag.gz

# Construct the FM-index for raw sequences
cdhitout_0.85.raw.fmd:../cdhitout_0.85.fa
    (cat ../cdhitout_0.85.fa) | $(FERMI) ropebwt -a bcr -v3 -btNf cdhitout_0.85.raw.tmp - > $@ 2> $@.log

# Error correction
cdhitout_0.85.ec.fq.gz:cdhitout_0.85.raw.fmd
    (cat ../cdhitout_0.85.fa) | $(FERMI) correct -t 2  $< - 2> $@.log | gzip -1 > $@

# Construct the FM-index for corrected sequences
cdhitout_0.85.ec.fmd:cdhitout_0.85.ec.fq.gz
    $(FERMI) fltuniq $< 2> cdhitout_0.85.fltuniq.log | $(FERMI) ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp - > $@ 2> $@.log

# Generate unitigs
cdhitout_0.85.p0.mag.gz:cdhitout_0.85.ec.fmd
    $(FERMI) unitig -t 2 -l $(UNITIG_K) $< 2> $@.log | gzip -1 > $@

cdhitout_0.85.p1.mag.gz:cdhitout_0.85.p0.mag.gz
    $(FERMI) clean $< 2> $@.log | gzip -1 > $@
cdhitout_0.85.p2.mag.gz:cdhitout_0.85.p1.mag.gz
    $(FERMI) clean -CAOFo $(OVERLAP_K) $< 2> $@.log | gzip -1 > $@

@lh3
Owner

lh3 commented Mar 1, 2013

I see. I was using an old version of run-fermi.pl. More recent versions use ropebwt by default. Anyway, I can see the problem now: fltuniq has filtered out all the reads, while ropebwt is expecting some input and thus hangs for some reason. For the time being, you can edit the makefile and change the line containing fltuniq to:

cat $< | $(FERMI) ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp - > $@ 2> $@.log

This skips fltuniq. I will look into the ropebwt issue later. In any case, you probably won't get a good assembly from these reads.
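
One way to check this diagnosis is to count how many lines actually reach ropebwt once fltuniq has run. The following is only a sketch: count_lines is a hypothetical helper, not part of fermi, and the commented usage assumes fermi is on the PATH and reuses the file names from the makefile above.

```shell
#!/bin/sh
# Hypothetical helper (not part of fermi): print the number of lines on stdin.
# awk prints 0 for an empty stream, which is the case of interest here.
count_lines() {
    awk 'END { print NR }'
}

# Assumed usage, with the file names from the makefile above:
#   gzip -dc cdhitout_0.85.ec.fq.gz | count_lines            # lines before filtering
#   fermi fltuniq cdhitout_0.85.ec.fq.gz 2>/dev/null | count_lines   # lines after filtering
# If the second count is 0, ropebwt is being handed an empty stream.
```

Comparing the two counts shows whether the corrected reads exist at all or whether they are lost in the filtering step.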

@lh3
Owner

lh3 commented Mar 1, 2013

For small files, it is actually better not to use fltuniq anyway. I should consider adding an option to skip fltuniq altogether.
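
As a rough illustration of what such a skip could look like outside fermi itself, the sketch below picks a filtering step from the size of the gzip-compressed input. It is not how run-fermi.pl behaves: choose_filter and MIN_LINES are hypothetical names, and the threshold of 4000 lines is an arbitrary placeholder.

```shell
#!/bin/sh
# Hypothetical sketch, not fermi's actual behaviour.
# MIN_LINES is an arbitrary illustrative threshold (override via environment).
MIN_LINES=${MIN_LINES:-4000}

# choose_filter FILE
# Prints "cat" for small inputs (pass reads through unfiltered) and
# "fltuniq" otherwise. FILE is expected to be gzip-compressed.
choose_filter() {
    n=$(gzip -dc < "$1" | awk 'END { print NR }')
    if [ "$n" -lt "$MIN_LINES" ]; then
        echo "cat"
    else
        echo "fltuniq"
    fi
}
```

A wrapper script could call choose_filter on the corrected reads and build the indexing pipeline accordingly, so that a tiny dataset never reaches the filtering step.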

@ctSkennerton
Author

Thanks, specifying -B in run-fermi.pl prevents the hang as well.
