Heme auxotrophy

A brief description of several commands used for the preparation of our manuscript "Heme auxotrophy in abundant aquatic microbial lineages".

Calculation of the completeness of the heme biosynthetic pathway

  1. Run KofamScan against the KofamKOALA database using a custom-built hal file (see https://github.com/takaram/kofam_scan).

$ ./exec_annotation -o *.KOfam.txt *.faa -p /profiles/HemeBiosynthesis.hal -k /ko_list -f mapper

  2. Combine the KofamScan results into a single txt file.

$ cat *.KOfam.txt > All.KOfam.txt

  3. Calculate the completeness of the modules and variants of the heme biosynthetic pathway.

$ python3 ./KEGG_decoder_Heme.py -i All.KOfam.txt -o All.KOfam.cal.txt -v static
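
The KofamScan and concatenation commands above can also be scripted. The following is a minimal sketch, not part of the original workflow, which assumes that the wildcards in the KofamScan command stand for one genome at a time (one output file per .faa file). The function names `kofamscan_command` and `run_all` are hypothetical; the paths and options are copied unchanged from the commands above.

```python
import shutil
import subprocess
from pathlib import Path

# Paths copied as-is from the commands above; adjust to your installation.
PROFILE = "/profiles/HemeBiosynthesis.hal"
KO_LIST = "/ko_list"


def kofamscan_command(faa):
    """Build the exec_annotation call for one protein FASTA file (hypothetical helper)."""
    faa = Path(faa)
    # genomeA.faa -> genomeA.KOfam.txt, written next to the input file
    out = faa.with_name(faa.stem + ".KOfam.txt")
    return ["./exec_annotation", "-o", str(out), str(faa),
            "-p", PROFILE, "-k", KO_LIST, "-f", "mapper"]


def run_all(directory="."):
    """Run KofamScan on every .faa file, then concatenate the results (hypothetical helper)."""
    outputs = []
    for faa in sorted(Path(directory).glob("*.faa")):
        cmd = kofamscan_command(faa)
        subprocess.run(cmd, check=True)  # stop at the first failing genome
        outputs.append(Path(cmd[2]))     # cmd[2] is the path given to -o
    # Equivalent of: cat *.KOfam.txt > All.KOfam.txt
    with open(Path(directory) / "All.KOfam.txt", "wb") as combined:
        for out in outputs:
            with open(out, "rb") as handle:
                shutil.copyfileobj(handle, combined)


# Show the command that would be run for a hypothetical input file.
print(" ".join(kofamscan_command("genomeA.faa")))
```

Calling `run_all()` from the directory that contains `exec_annotation` and the .faa files would then produce `All.KOfam.txt` for the completeness calculation.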

The Python script "KEGG_decoder_Heme.py" is a modified version of "KEGG_decoder.py", available at https://github.com/bjtully/BioData/blob/master/KEGGDecoder/.

In short, we removed the definitions of all pathways from the original script and inserted definitions for the modules and variants of the heme biosynthetic pathway. In addition, we modified lines 293, 294, and 296 of our script to resolve a problem caused by underscores in genome names (accession numbers of the GTDB genomes), as follows (see bjtully/BioData#45):
info[0].split("_")[0] --> info[0].rsplit("_",1)[0]
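
To illustrate why this change matters, the sketch below groups KO assignments by genome using both expressions. It is not the code of KEGG_decoder_Heme.py: the helper `group_kos`, the input lines, the accession numbers, and the KO identifiers are all hypothetical, and it assumes gene names of the form `<genome name>_<gene number>` in tab-separated mapper-style lines.

```python
from collections import defaultdict

# Hypothetical mapper-style lines: "<gene name>\t<KO>".
# Gene names are assumed to be "<genome name>_<gene number>".
MAPPER_LINES = [
    "GCA_000000001.1_00001\tK01698",
    "GCA_000000001.1_00002\tK01749",
    "GCA_000000002.1_00001\tK01698",
    "GCA_000000002.1_00002",  # gene without a KO assignment
]


def group_kos(lines, genome_of):
    """Collect the set of KOs per genome; genome_of maps a gene name to its genome."""
    genomes = defaultdict(set)
    for line in lines:
        info = line.rstrip("\n").split("\t")
        if len(info) < 2:
            continue  # skip genes with no KO
        genomes[genome_of(info[0])].add(info[1])
    return dict(genomes)


# Original expression: keeps only the text before the FIRST underscore,
# so every GTDB-style accession collapses into "GCA".
original = group_kos(MAPPER_LINES, lambda name: name.split("_")[0])

# Modified expression: removes only the text after the LAST underscore,
# so the full accession is kept as the genome name.
modified = group_kos(MAPPER_LINES, lambda name: name.rsplit("_", 1)[0])

print(sorted(original))  # ['GCA']
print(sorted(modified))  # ['GCA_000000001.1', 'GCA_000000002.1']
```

With the original expression the two genomes are merged into a single entry, so their KOs would be pooled; with `rsplit("_", 1)` each genome keeps its own set.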

For detailed instructions on how to use this modified script, refer to the GitHub repository above.