Protein Structure Information Retrieval

Description

Like Foldseek, this project implements a method for searching protein structure databases. Here, however, the structures are represented with GVP-GNN (a graph neural network built on geometric vector perceptrons), which is used for protein structure representation learning.
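To illustrate the idea of searching a database with learned representations, here is a minimal sketch in plain Python. It assumes every structure has already been encoded into a fixed-length embedding vector (for example by a GVP-GNN encoder, which is not shown) and ranks targets by cosine similarity to the query. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not this project's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(query_embedding, target_embeddings, top_k=3):
    """Return the top_k (target_id, score) pairs, best score first."""
    scored = [
        (target_id, cosine_similarity(query_embedding, embedding))
        for target_id, embedding in target_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for encoder outputs (hypothetical values).
targets = {
    "target_a": [1.0, 0.0, 0.0],
    "target_b": [0.9, 0.1, 0.0],
    "target_c": [0.0, 1.0, 0.0],
}
hits = search([1.0, 0.0, 0.0], targets, top_k=2)
for target_id, score in hits:
    print(target_id, round(score, 3))
```

For a database the size of Swiss-Prot, a brute-force loop like this would likely be replaced by a batched matrix product or an approximate nearest-neighbor index.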

Training Dataset

We use Foldseek to generate the ground-truth datasets.
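As a rough sketch of how Foldseek hits could be turned into training labels, the snippet below parses tab-separated search results and keeps the pairs that pass an E-value cutoff. It assumes Foldseek's default 12-column tabular output (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). The cutoff of 1e-3, the function name, and the example rows are made up for illustration and are not taken from this project.

```python
import csv
import io

# Column names of Foldseek's default tabular output (assumed here).
FOLDSEEK_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def positive_pairs(tsv_text, max_evalue=1e-3):
    """Map each query ID to the targets whose hit passes the E-value cutoff."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), fieldnames=FOLDSEEK_COLUMNS, delimiter="\t"
    )
    pairs = {}
    for row in reader:
        if float(row["evalue"]) <= max_evalue:
            pairs.setdefault(row["query"], []).append(row["target"])
    return pairs


# Synthetic rows for demonstration only, not real search results.
example_tsv = (
    "q1\tt1\t0.850\t120\t18\t0\t1\t120\t1\t120\t1.2e-10\t450\n"
    "q1\tt2\t0.210\t80\t60\t3\t5\t84\t10\t92\t5.0e-01\t35\n"
    "q2\tt1\t0.400\t100\t55\t5\t1\t100\t3\t105\t3.0e-05\t150\n"
)
labels = positive_pairs(example_tsv)
print(labels)
```

With real data, the same table could also be loaded with Pandas by passing the column names above to `pandas.read_csv` with `sep="\t"` and `header=None`.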

Query Database

We use the CATH/Gene3D dataset; see this page to download the structures in .pdb format.
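For readers unfamiliar with the .pdb format, the sketch below pulls the alpha-carbon (CA) coordinates out of the fixed-width ATOM records using only the standard library. This project lists Biopython and Biotite as requirements, so its own loading code presumably relies on one of them; the function name and the two demo records here are hypothetical, and the sketch ignores multiple models and alternate locations.

```python
def read_ca_coords(pdb_text):
    """Return (x, y, z) tuples for every CA atom in the ATOM records."""
    coords = []
    for line in pdb_text.splitlines():
        # Atom name sits in columns 13-16; x, y, z in columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


# Build demo records with a format string so the fixed columns line up.
ATOM_FMT = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f"
demo = "\n".join([
    ATOM_FMT % (1, " N", "MET", "A", 1, 27.340, 24.430, 2.614),
    ATOM_FMT % (2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842),
])
print(read_ca_coords(demo))
```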

Target Database

We use the AlphaFold Protein Structure Database; see this page to download the Swiss-Prot dataset. Note that it is large: about 26 GB compressed.

App

The app will be built later.

Package Requirements

PyTorch Geometric

Biopython

Biotite

FAIR-ESM

Foldseek

Pandas

WandB
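A small, optional check along the following lines can report which of the requirements above are available in the current environment. The import names used (`torch_geometric`, `Bio`, `biotite`, `esm`, `pandas`, `wandb`) are the ones these packages are commonly imported under, and Foldseek is assumed to be installed as a `foldseek` executable on the PATH; the function name is hypothetical.

```python
import importlib.util
import shutil

# Display name -> Python import name (assumed to be the usual ones).
PYTHON_MODULES = {
    "PyTorch Geometric": "torch_geometric",
    "Biopython": "Bio",
    "Biotite": "biotite",
    "FAIR-ESM": "esm",
    "Pandas": "pandas",
    "WandB": "wandb",
}


def check_requirements():
    """Return a dict mapping each requirement to True if it was found."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in PYTHON_MODULES.items()
    }
    # Foldseek is a command-line tool, so look for it on the PATH instead.
    status["Foldseek"] = shutil.which("foldseek") is not None
    return status


for name, found in check_requirements().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```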