Tool for quick offline batch conversion of Genbank IDs or accessions to taxonomy strings

richardmleggett/acc2tax

acc2tax
Richard.Leggett@tgac.ac.uk

Given a file of accessions or Genbank IDs (one per line), this program will return a taxonomy string for each.

Lookup is quicker for Genbank IDs than for accessions, because the Genbank ID lookup table is held in RAM (though this does mean it takes a couple of minutes to load). Accession lookups are read from disc.

Database files can be downloaded from:
ftp://ftp.ncbi.nih.gov/pub/taxonomy

The files required are:
nodes.dmp
names.dmp
gi_taxid_nucl.dmp
gi_taxid_prot.dmp
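
Before running the program, it can help to confirm that the database directory contains all four files. A minimal sketch follows; the function name `missing_db_files` and its directory argument are hypothetical and not part of acc2tax, while the file names are taken from the list above.

```shell
#!/bin/sh
# Hypothetical helper (not part of acc2tax).
# Prints the name of each required database file that is absent from
# the directory given as $1 (default: current directory).
# Returns non-zero if any file is missing.
missing_db_files() {
    dir="${1:-.}"
    status=0
    for f in nodes.dmp names.dmp gi_taxid_nucl.dmp gi_taxid_prot.dmp; do
        if [ ! -f "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}

# Example use (the path is a placeholder):
#   missing_db_files /path/to/database || echo "download the files listed above"
```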

For accessions, you will need a merged, sorted copy of some of the files from the accession2taxid directory. Create them with:
cat nucl_est.accession2taxid nucl_gb.accession2taxid nucl_gss.accession2taxid nucl_wgs.accession2taxid dead_nucl.accession2taxid dead_wgs.accession2taxid | sort > nucl_all.txt
cat prot.accession2taxid dead_prot.accession2taxid | sort > prot_all.txt
Then place nucl_all.txt and prot_all.txt in the same directory as the above database files.
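
The merge step can be wrapped in a small helper that also checks the result, as sketched below. `merge_sorted` is a hypothetical name and not part of acc2tax. It uses the same plain `sort` as the commands above, then runs `sort -c` to confirm that the output really is in order. It assumes the input files have already been decompressed.

```shell
#!/bin/sh
# Hypothetical helper (not part of acc2tax).
# Usage: merge_sorted OUTPUT INPUT...
# Concatenates the input files, sorts the lines into OUTPUT, then
# verifies the order with `sort -c` (non-zero exit if out of order).
merge_sorted() {
    out="$1"
    shift
    cat "$@" | sort > "$out" && sort -c "$out"
}

# Example use, mirroring the protein command above:
#   merge_sorted prot_all.txt prot.accession2taxid dead_prot.accession2taxid
```

Running the sort and the check in the same shell keeps both under the same locale settings, so the check is consistent with the order that was just produced.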

To compile, type:
cc -o acc2tax acc2tax.c

For details of running, type:
acc2tax -h

