This repository contains a tool that uses the RCSB PDB Search API to download Protein Data Bank (PDB) files and their biological assemblies in bulk. Search criteria are defined in JSON query files, so you can adjust a query and retrieve a large number of PDB files efficiently.
Because the queries are plain JSON and the downloads are automated, the tool is well suited to working with large sets of protein structures.
- Customizable Queries: Use JSON to define search parameters for precise results.
- Bulk Downloading: Automate the retrieval and downloading of numerous PDB files in a few clicks.
- Efficiency: Bypass the website's limits on batch size when downloading.
- Stability: Avoid website crashes and timeouts with robust API calls.
- Speed: Quickly adjust search parameters and execute downloads faster than manual processes.
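As a rough illustration of the batch-size point in the list above, a Search API query can be split into pages by setting its `request_options.paginate` block. This is a minimal sketch, not the notebook's own code: the helper name `paginated_queries`, the page size of 1000, and the example query are all assumptions made for illustration.

```python
import copy

def paginated_queries(query, total_count, rows=1000):
    """Yield one copy of `query` per page, each with its own paginate options.

    `paginated_queries` is a hypothetical helper; `rows` is an assumed page size.
    """
    for start in range(0, total_count, rows):
        page = copy.deepcopy(query)  # leave the caller's query untouched
        options = page.setdefault("request_options", {})
        options["paginate"] = {"start": start, "rows": rows}
        yield page

# Placeholder query for illustration only; a real one comes from the RCSB site.
example_query = {
    "query": {
        "type": "terminal",
        "service": "full_text",
        "parameters": {"value": "hemoglobin"},
    },
    "return_type": "entry",
}

pages = list(paginated_queries(example_query, total_count=2500, rows=1000))
print([p["request_options"]["paginate"]["start"] for p in pages])  # [0, 1000, 2000]
```

Each page is an independent query, so a failed request can be retried without restarting the whole download.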
To get started with this notebook, you should have Jupyter installed on your system. If you don't have Jupyter, you can install it via Anaconda or with pip:
pip install notebook
To set up your search criteria:
- Go to the RCSB PDB website and use the Advanced Search Query Builder options to define your search criteria.
- Once you have your results, click the "Search API" button.
- Copy the JSON query from the RCSB PDB site.
- Paste it into a new text file and save it as a new .json file in the pdb_json directory.
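Before opening the notebook, it can help to confirm that the saved file is valid JSON and looks like a Search API query. The sketch below is an optional check and not part of the notebook. The function name `check_query_file` is hypothetical, and it assumes the copied query has the usual top-level `query` and `return_type` keys.

```python
import json
from pathlib import Path

def check_query_file(path):
    """Return a list of problems found in a saved query file (empty if none).

    Hypothetical helper: only checks that the file parses and has the
    top-level keys a Search API query is assumed to contain.
    """
    try:
        query = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"could not read {path}: {exc}"]
    if not isinstance(query, dict):
        return ["top level of the file should be a JSON object"]
    problems = []
    for key in ("query", "return_type"):
        if key not in query:
            problems.append(f"missing top-level key: {key!r}")
    return problems
```

For example, `check_query_file("pdb_json/your_query_file.json")` returns an empty list when the file parses and contains both keys.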
- Open the rcsb_mass.ipynb notebook in Jupyter.
- Modify the cell with the path to your JSON file:
# Define the path to your JSON file
json_file_path = 'pdb_json/your_query_file.json' # Adjust the filename as needed
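For orientation, here is a minimal sketch of what can happen once the path is set: load the query, send it to the Search API, and turn the returned identifiers into download URLs. It is not the notebook's actual implementation. The function names are hypothetical, and the `files.rcsb.org` URL patterns (in particular the `.pdb1.gz` form for assemblies) are assumptions that should be checked against the current RCSB file download documentation.

```python
import json
import urllib.request

# RCSB Search API v2 endpoint.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"

def load_query(json_file_path):
    """Read a saved Search API query from disk."""
    with open(json_file_path, encoding="utf-8") as handle:
        return json.load(handle)

def run_search(query, timeout=60):
    """POST the query and return the parsed response (requires network access)."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    # An empty body is treated as "no hits".
    return json.loads(body) if body else {"result_set": []}

def extract_ids(response):
    """Collect the identifiers from a Search API response."""
    return [hit["identifier"] for hit in response.get("result_set", [])]

def download_url(identifier):
    """Build a download URL; the URL patterns here are assumptions.

    '4HHB' is treated as an entry, '4HHB-1' as biological assembly 1.
    """
    entry_id, _, assembly = identifier.partition("-")
    if assembly:
        return f"https://files.rcsb.org/download/{entry_id.lower()}.pdb{assembly}.gz"
    return f"https://files.rcsb.org/download/{entry_id.upper()}.pdb"
```

A typical call chain would be `extract_ids(run_search(load_query(json_file_path)))`, followed by `download_url` for each identifier. Keeping the network call in its own function makes the rest easy to test offline.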