Home
Welcome to the `cazy_webscraper` wiki!
`cazy_webscraper` is an application and Python3 package for the automated retrieval of protein data from the CAZy database. The code is distributed under the MIT license.
`cazy_webscraper` retrieves protein data from the CAZy database into a local SQLite3 database. This enables users to integrate the dataset into analytical pipelines, and to interrogate the data in ways that are not possible through the CAZy website.
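Because the output is an ordinary SQLite3 file, it can be inspected with Python's standard library. The following is a minimal sketch, not part of `cazy_webscraper` itself: the function name `list_tables` and the example file name are hypothetical, and no particular table layout is assumed. It only asks SQLite which tables exist, which is a reasonable first step before writing queries against the database.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite3 database file, sorted by name.

    Note: sqlite3.connect() creates an empty file if db_path does not exist,
    so check the path first if that matters for your workflow.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]


# Example usage (hypothetical file name; use the path of your own local database):
# for table in list_tables("cazy_scrape.db"):
#     print(table)
```

Once the table names are known, the same connection pattern can be used to run ordinary SQL queries from an analytical pipeline.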
Please do not perform a complete scrape of the CAZy database unless you specifically need to reproduce the entire CAZy dataset. A complete scrape will take several hours and may unintentionally deny the service to others.
Using the `expand` subcommand, a user can retrieve CAZyme protein sequence data from GenBank, and protein structure files from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).
`cazy_webscraper` can retrieve specified CAZy classes and/or CAZy families, and these queries can be filtered by taxonomy at the kingdom, genus, species or strain level. Successive CAZy queries can be collated into a single local database. A log of each query is recorded in the database for transparency, reproducibility and shareability.
If you use `cazy_webscraper`, please cite the following publication:
Hobbs, Emma E. M.; Pritchard, Leighton; Chapman, Sean; Gloster, Tracey M. (2021): cazy_webscraper Microbiology Society Annual Conference 2021 poster. FigShare. Poster. [https://doi.org/10.6084/m9.figshare.14370860.v7](https://doi.org/10.6084/m9.figshare.14370860.v7)
Please see the full documentation at ReadTheDocs.