faster sequence file (GenBank) parsing #1

seanrjohnson · 2024-04-21T17:02:43Z

GenBank file parsing is a major bottleneck for domain_search.py on large databases. The current GenBank parser is a fork of the BioPython GenBank parser, which is pure Python, relies on regexes, and is slow. It would be great to integrate something like the Rust-based parser gb-io.py: https://github.com/althonos/gb-io.py

A complication is that Domainator internals rely heavily on BioPython SeqRecord objects, which might be hard to interface with or replicate using a faster GenBank parser.
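One possible direction for the "replicate" option, as a rough sketch only: a lightweight stand-in that exposes the handful of SeqRecord attributes most code touches (`id`, `name`, `description`, `seq`, `features`, `annotations`). Everything here is hypothetical: `LightRecord`, `LightFeature`, `from_fast_parser`, and the dict layout of `raw` are made up for illustration and are not the gb-io.py or Domainator API. A real adapter would have to map whatever the fast parser actually returns.

```python
from dataclasses import dataclass, field


@dataclass
class LightFeature:
    """Hypothetical stand-in for the parts of a BioPython SeqFeature we need."""
    type: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    strand: int = 1
    # BioPython's GenBank parser stores qualifier values as lists of strings.
    qualifiers: dict = field(default_factory=dict)


@dataclass
class LightRecord:
    """Hypothetical duck-typed subset of BioPython's SeqRecord."""
    seq: str
    id: str = "<unknown id>"
    name: str = "<unknown name>"
    description: str = "<unknown description>"
    features: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)

    def __len__(self):
        return len(self.seq)


def from_fast_parser(raw):
    """Build a LightRecord from an assumed plain-dict parser output."""
    features = [
        LightFeature(
            type=f["type"],
            start=f["start"],
            end=f["end"],
            strand=f.get("strand", 1),
            qualifiers={k: list(v) for k, v in f.get("qualifiers", {}).items()},
        )
        for f in raw.get("features", [])
    ]
    return LightRecord(
        seq=raw["seq"],
        id=raw.get("id", "<unknown id>"),
        name=raw.get("name", "<unknown name>"),
        description=raw.get("description", "<unknown description>"),
        features=features,
        annotations=dict(raw.get("annotations", {})),
    )


# Usage with a made-up record:
raw = {
    "id": "contig_1",
    "seq": "ATGAAATAA",
    "features": [
        {"type": "CDS", "start": 0, "end": 9, "qualifiers": {"gene": ["abc"]}}
    ],
}
rec = from_fast_parser(raw)
```

Whether duck typing like this is enough depends on how much of the SeqRecord interface (slicing, `reverse_complement`, `letter_annotations`, and so on) the internals actually use, so an audit of those call sites would probably need to come first.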

seanrjohnson added the enhancement New feature or request label Apr 21, 2024
