HelixFold-Single Inference

AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These pipelines rely mainly on Multiple Sequence Alignments (MSAs) and templates as inputs to learn co-evolution information from homologous sequences. However, searching protein databases for MSAs and templates is time-consuming, usually taking dozens of minutes. We therefore explore the limits of fast protein structure prediction using only the primary sequences of proteins. HelixFold-Single combines a large-scale protein language model (PLM) with the superior geometric learning capability of AlphaFold2. It first pre-trains the PLM on thousands of millions of primary sequences using self-supervised learning; the PLM then serves as an alternative to MSAs and templates for learning co-evolution information. Combining the pre-trained PLM with the essential components of AlphaFold2 yields an end-to-end differentiable model that predicts the 3D coordinates of atoms from the primary sequence alone.

Online Service

For those who want to try our model without installing anything, we also provide an online interface, PaddleHelix HelixFold-Single Forecast, as a web service.

Environment

To reproduce the results reported in our paper, use the following environment:

python: 3.7
cuda: 11.2
cudnn: 8.10.1
nccl: 2.12.12
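
As a quick sanity check before installing, the interpreter version can be compared with the Python version listed above. This is a minimal sketch: the helper name `python_matches` is hypothetical and not part of this repository, and it checks only Python, not CUDA, cuDNN, or NCCL.

```python
import sys

# Python version copied from the environment list above.
REQUIRED_PYTHON = (3, 7)


def python_matches(version_info=None, required=REQUIRED_PYTHON):
    """Return True if the major.minor version equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_matches():
        print(f"Python {found}: matches the documented 3.7 environment")
    else:
        print(f"Python {found}: differs from the documented 3.7 environment")
```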

Installation

In addition to the packages listed in requirements.txt, the PaddlePaddle dev package is required to run HelixFold. Visit here to install PaddlePaddle dev. We also provide a prebuilt package here for machines with an NVIDIA A100 GPU and CUDA 11.2.

python -m pip install -r requirements.txt
wget https://baidu-nlp.bj.bcebos.com/PaddleHelix/HelixFold/paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
python -m pip install paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl

Download the Trained Model

Here we provide the trained model that can be used to reproduce the results of our paper.

wget https://baidu-nlp.bj.bcebos.com/PaddleHelix/HelixFold-Single/helixfold-single.pdparams

Usage

To run inference, you need a FASTA file and the downloaded trained model:

python helixfold_single_inference.py \
        --init_model=./helixfold-single.pdparams \
        --fasta_file=data/7O9F_B.fasta \
        --output_dir="./output"

init_model: the path to the trained model.
fasta_file: the FASTA file containing the protein sequence to be predicted.
output_dir: the directory where the output is written.
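
If the sequence is not yet in a file, the input for `--fasta_file` can be prepared with a few lines of Python. The sketch below is illustrative only: `write_fasta`, the record name, and the example peptide are hypothetical, and it assumes the inference script reads a standard single-record FASTA file limited to the 20 standard amino-acid codes (not confirmed here).

```python
from pathlib import Path

# The 20 standard amino-acid one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def write_fasta(path, name, sequence):
    """Write one protein sequence to `path` as a single-record FASTA file.

    Whitespace is removed and letters are upper-cased. Returns the length
    of the cleaned sequence.
    """
    sequence = "".join(sequence.split()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    # Header line starts with '>'; the sequence goes on a single line.
    Path(path).write_text(f">{name}\n{sequence}\n")
    return len(sequence)


if __name__ == "__main__":
    # Hypothetical example peptide, not taken from the repository.
    n = write_fasta("example.fasta", "example_peptide", "MKTAYIAKQRQISFVKSHFSRQ")
    print(f"wrote {n} residues to example.fasta")
```

The resulting file can then be passed to the command above in place of data/7O9F_B.fasta.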

The output is organized as:

./output
    unrelaxed.pdb

where unrelaxed.pdb is the predicted structure in PDB format.
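
To inspect a prediction programmatically, the C-alpha atoms can be read from the fixed-width ATOM records of a PDB file. The sketch below assumes the output follows the standard PDB column layout. `read_ca_atoms` and `make_atom_line` are hypothetical helpers, and the demo coordinates are made up rather than real model output.

```python
def read_ca_atoms(pdb_text):
    """Return (residue_name, residue_number, (x, y, z)) per C-alpha atom."""
    atoms = []
    for line in pdb_text.splitlines():
        # Standard PDB columns (1-based): atom name 13-16, residue name
        # 18-20, residue number 23-26, x/y/z coordinates 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[17:20].strip(), int(line[22:26]), xyz))
    return atoms


def make_atom_line(serial, atom, res_name, res_seq, x, y, z):
    """Build a fixed-width ATOM record for chain A (demo data only)."""
    return (
        "ATOM".ljust(6) + str(serial).rjust(5) + " " + atom.ljust(4) + " "
        + res_name.rjust(3) + " A" + str(res_seq).rjust(4) + " " * 4
        + "%8.3f%8.3f%8.3f" % (x, y, z)
    )


if __name__ == "__main__":
    # Made-up coordinates for illustration; not real model output.
    demo = "\n".join([
        make_atom_line(1, " N", "MET", 1, 27.340, 24.430, 2.614),
        make_atom_line(2, " CA", "MET", 1, 26.266, 25.413, 2.842),
        make_atom_line(3, " CA", "LYS", 2, 24.930, 27.520, 5.120),
    ])
    for res_name, res_seq, xyz in read_ca_atoms(demo):
        print(res_name, res_seq, xyz)
```

For a real run, the text of ./output/unrelaxed.pdb would be passed to `read_ca_atoms` instead of the demo string.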

Citing this work

If you use the code or data in this repository, please cite:

@article{fang2022helixfold_single,
  title={HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative},
  author={Fang, Xiaomin and Wang, Fan and Liu, Lihang and He, Jingzhou and Lin, Dayong and Xiang, Yingfei and Zhang, Xiaonan and Wu, Hua and Li, Hui and Song, Le},
  journal={arXiv preprint arXiv:2207.13921},
  year={2022}
}
