HelixFold: An Efficient and Improved Implementation of AlphaFold 2 using PaddlePaddle

AlphaFold2 is an accurate protein structure prediction pipeline. HelixFold provides an efficient and improved implementation of the complete AlphaFold2 training and inference pipelines on GPU and DCU. Compared with the computational performance of AlphaFold2 as reported in its paper, and of OpenFold and Uni-Fold, both implemented in PyTorch, HelixFold reduces the training time from about 11 days to 5.12 days, and to only 2.89 days when using hybrid parallelism. HelixFold trained from scratch can achieve accuracy competitive with AlphaFold2.
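
As a quick sanity check on the figures above, the implied speedups can be computed directly. This is only arithmetic on the quoted training times; the baseline is stated as "about 11 days", so the ratios are approximate, and the `speedup` helper is a name introduced here for illustration.

```python
# Training times quoted in the paragraph above (in days).
baseline_days = 11.0      # "about 11 days", so treat the results as approximate
helixfold_days = 5.12     # HelixFold
hybrid_days = 2.89        # HelixFold with hybrid parallelism


def speedup(baseline: float, new: float) -> float:
    """Return how many times faster `new` is than `baseline`."""
    return baseline / new


print(f"HelixFold: {speedup(baseline_days, helixfold_days):.2f}x")
print(f"Hybrid parallelism: {speedup(baseline_days, hybrid_days):.2f}x")
```

This prints roughly 2.15x and 3.81x respectively.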

Instruction

Detailed instructions for running HelixFold training and inference on GPU and DCU are provided in the repository's README_train.md, README_inference.md, and README_DCU.md files.

Technical Highlights for Efficient Implementation

Branch Parallelism and Hybrid Parallelism: HelixFold proposes Branch Parallelism (BP), which splits the computation branches across multiple devices to accelerate computation during the initial training phase. The training cost is further reduced by training with Hybrid Parallelism, which combines BP with Dynamic Axial Parallelism (DAP) and Data Parallelism (DP).
Operator Fusion and Tensor Fusion to Reduce the Cost of Scheduling: Scheduling a huge number of operators is one of the bottlenecks in training. To reduce the scheduling cost, Fused Gated Self-Attention combines multiple blocks into a single operator, and thousands of tensors are fused into only a few tensors.
Multi-dimensional Memory Optimization: Multiple techniques, including Recompute, BFloat16, In-place memory, and Subbatch (Chunking), are used to reduce the memory required for training and inference. Prediction of ultra-long monomer proteins (around 6600 AA) is now supported.
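
To make the Subbatch (Chunking) idea concrete, here is a minimal sketch in plain Python. It is not HelixFold's implementation: `row_softmax` stands in for any operation that treats each row independently, and `subbatch` is a hypothetical helper name. The point is that processing a few rows at a time produces the same result as processing all rows at once, while only one chunk of intermediate values is alive at any moment.

```python
import math


def row_softmax(rows):
    """Softmax over each row independently (stand-in for a per-row operator)."""
    out = []
    for row in rows:
        m = max(row)                              # subtract max for stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def subbatch(fn, rows, chunk_size):
    """Apply fn to chunk_size rows at a time and concatenate the results."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out = []
    for start in range(0, len(rows), chunk_size):
        out.extend(fn(rows[start:start + chunk_size]))
    return out


rows = [[float(i + j) for j in range(4)] for i in range(10)]
full = row_softmax(rows)                              # all 10 rows at once
chunked = subbatch(row_softmax, rows, chunk_size=3)   # chunks of 3, 3, 3, 1
print(chunked == full)
```

The trade-off is more, smaller calls in exchange for a lower peak memory footprint, which is why chunking matters most for very long sequences.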

Please refer to the paper for more technical details.
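
The tensor fusion highlight can be illustrated with a small sketch, again in plain Python and under stated assumptions: `fuse`, `unfuse`, and `sgd_step` are hypothetical helpers, not PaddlePaddle or HelixFold APIs, and lists of floats stand in for tensors. Many small parameter tensors are packed into one flat buffer so that an update is issued once instead of once per tensor.

```python
def fuse(tensors):
    """Pack many small 1-D tensors into one flat buffer, recording their sizes."""
    sizes = [len(t) for t in tensors]
    flat = [x for t in tensors for x in t]
    return flat, sizes


def unfuse(flat, sizes):
    """Split a flat buffer back into tensors of the recorded sizes."""
    out, offset = [], 0
    for n in sizes:
        out.append(flat[offset:offset + n])
        offset += n
    return out


def sgd_step(values, grads, lr):
    """One plain SGD update; each call models one scheduled operator."""
    return [v - lr * g for v, g in zip(values, grads)]


params = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
grads = [[0.5, 0.5], [1.0], [0.0, 2.0, 4.0]]

# Unfused: one update call per tensor (three calls here, thousands in a real model).
per_tensor = [sgd_step(p, g, lr=0.5) for p, g in zip(params, grads)]

# Fused: a single update call over the flat buffer, then split back.
flat_params, sizes = fuse(params)
flat_grads, _ = fuse(grads)
fused = unfuse(sgd_step(flat_params, flat_grads, lr=0.5), sizes)

print(fused == per_tensor)
```

Both paths compute identical values; the fused path simply replaces many small operator launches with one larger one.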

Online Service

For those who want to try out our model without any installation, we also provide an online interface, PaddleHelix HelixFold Forecast, as a web service.

Copyright

The HelixFold code is licensed under the Apache 2.0 License, the same license as AlphaFold. However, we use the AlphaFold parameters pretrained by DeepMind, which are made available for non-commercial use only under the terms of the CC BY-NC 4.0 license.

Reference

[1] Jumper J, Evans R, Pritzel A, et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596 (7873), 583–589. 10.1038/s41586-021-03819-2.

[2] Ahdritz G, et al. (2022). OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. bioRxiv. 10.1101/2022.11.20.517210.

[3] Li Z, Liu X, Chen W, Shen F, Bi H, Ke G, Zhang L (2022). Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold. bioRxiv. 10.1101/2022.08.04.502811.

Citation

If you use the code or data in this repository, please cite:

@article{wang2022helixfold,
  title={HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle},
  author={Wang, Guoxia and Fang, Xiaomin and Wu, Zhihua and Liu, Yiqun and Xue, Yang and Xiang, Yingfei and Yu, Dianhai and Wang, Fan and Ma, Yanjun},
  journal={arXiv preprint arXiv:2207.05477},
  year={2022}
}

@article{wang2022efficient_alphafold2,
  title={Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism},
  author={Wang, Guoxia and Wu, Zhihua and Fang, Xiaomin and Xiang, Yingfei and Liu, Yiqun and Yu, Dianhai and Ma, Yanjun},
  journal={arXiv preprint arXiv:2211.00235},
  year={2022}
}
