An experimental analysis of graph representation learning for Gene Ontology based protein function prediction
This is a collection of articles about graph representation learning for Gene Ontology based protein function prediction.
Protein function annotation is one of the most fundamental research topics in bioinformatics. Understanding protein function is not only crucial for understanding biological systems but also benefits other areas, such as drug discovery, disease therapy, agriculture, and manufacturing. However, manual annotation by experts is costly and time-consuming, and it cannot keep up with the huge number of new proteins generated by high-throughput sequencing techniques. Specifically, as of April 2024, the UniProtKB database contains more than 249 million unreviewed proteins, whereas only around 570 thousand sequences have been manually annotated. Thus, the development of accurate and effective computational predictors for protein function prediction (PFP) is imperative to bridge this gap.
We created a preliminary list of papers using Google Scholar and PubMed. The search queries were combinations of the following keywords: "protein function prediction", "protein function annotation", "Gene Ontology", "graph neural network", "graph representation learning", "graph embedding", and "deep learning". To focus on recent studies, we filtered out results published before 2019.
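The search procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how the keywords were combined, so pairing them two at a time and joining them with `AND` is a guess, and the helper names `build_queries` and `filter_recent` as well as the record layout (a dict with a `year` field) are hypothetical.

```python
from itertools import combinations

# Keywords taken verbatim from the search description.
KEYWORDS = [
    "protein function prediction",
    "protein function annotation",
    "Gene Ontology",
    "graph neural network",
    "graph representation learning",
    "graph embedding",
    "deep learning",
]


def build_queries(keywords, size=2):
    """Return one quoted query per combination of `size` keywords.

    Assumption: combinations of two keywords joined with AND; the
    actual combination scheme is not stated in the source.
    """
    return [
        " AND ".join(f'"{k}"' for k in combo)
        for combo in combinations(keywords, size)
    ]


def filter_recent(records, min_year=2019):
    """Keep only records published in `min_year` or later.

    Assumption: each record is a dict with an integer 'year' key.
    """
    return [r for r in records if r["year"] >= min_year]


queries = build_queries(KEYWORDS)
```

With seven keywords and pairs of two, `build_queries` yields 21 query strings, each of which could be pasted into a search engine by hand. `filter_recent` then applies the 2019 cut-off to whatever result records are collected.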
Initially, articles were selected based on their titles and abstracts. Subsequently, the full texts were assessed to determine whether each paper presented a significant method for Gene Ontology based protein function prediction using graph embedding. Additionally, relevant papers cited in the selected ones underwent the same process to ensure a comprehensive literature review.
There are various methods for protein function prediction based on graph representation learning. We categorized these methods into four groups: PPI network, protein structure, GO graph, and integrated graphs.
PPI network embedding only
deepNF: deep network fusion for protein function prediction
Gligorijević, V., Barot, M., & Bonneau, R.
[Bioinformatics, 2018]
[Paper]
MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN
Li, W., Zhang, H., Li, M., Han, M., & Yin, Y.
[Briefings in Bioinformatics, 2022]
[Paper]
DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction
You, R., Yao, S., Mamitsuka, H., & Zhu, S.
[Bioinformatics, 2021]
[Paper]
Integrating PPI embeddings with heterogeneous data sources
DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier
Kulmanov, M., Khan, M. A., & Hoehndorf, R.
[Bioinformatics, 2018]
[Paper]
DeepFunc: a deep learning framework for accurate prediction of protein functions from protein sequences and interactions
Zhang, F., Song, H., Zeng, M., Li, Y., Kurgan, L., & Li, M.
[Proteomics, 2019]
[Paper]
SDN2GO: an integrated deep learning model for protein function prediction
Cai, Y., Wang, J., & Deng, L.
[Frontiers in Bioengineering and Biotechnology, 2020]
[Paper]
A deep learning framework for gene ontology annotations with sequence-and network-based information
Zhang, F., Song, H., Zeng, M., Wu, F. X., Li, Y., Pan, Y., & Li, M.
[IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2020]
[Paper]
Prot2GO: Predicting GO annotations from protein sequences and interactions
Zhang, X., Wang, L., Liu, H., Zhang, X., Liu, B., Wang, Y., & Li, J.
[IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021]
[Paper]
MultiPredGO: deep multi-modal protein function prediction by amalgamating protein structure, sequence, and interaction information
Giri, S. J., Dutta, P., Halani, P., & Saha, S.
[IEEE Journal of Biomedical and Health Informatics, 2020]
[Paper]
GONET: A Deep Network to Annotate Proteins via Recurrent Convolution Networks
Li, J., Wang, L., Zhang, X., Liu, B., & Wang, Y.
[IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020]
[Paper]
DeepFusionGO: Protein function prediction by fusing heterogeneous features through deep learning
Huang, Z., Zheng, R., & Deng, L.
[IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022]
[Paper]
MSF-PFP: A Novel Multisource Feature Fusion Model for Protein Function Prediction
Li, X., Qian, Y., Hu, Y., Chen, J., Yue, H., & Deng, L.
[Journal of Chemical Information and Modeling, 2024]
[Paper]
Experimental protein structures
Structure-based protein function prediction using graph convolutional networks
Gligorijević, V., Renfrew, P. D., Kosciolek, T., Leman, J. K., Berenberg, D., Vatanen, T., ... & Bonneau, R.
[Nature Communications, 2021]
[Paper]
PersGNN: applying topological data analysis and geometric deep learning to structure-based protein function prediction
Swenson, N., Krishnapriyan, A. S., Buluc, A., Morozov, D., & Yelick, K.
[Paper]
Predicted protein structures
Accurate protein function prediction via graph attention networks with predicted structure information
Lai, B., & Xu, J.
[Briefings in Bioinformatics, 2022]
[Paper]
Struct2GO: protein function prediction based on graph pooling algorithm and AlphaFold2 structure information
Jiao, P., Wang, B., Wang, X., Liu, B., Wang, Y., & Li, J.
[Bioinformatics, 2023]
[Paper]
Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function
Boadu, F., Cao, H., & Cheng, J.
[Bioinformatics, 2023]
[Paper]
GPSFun: geometry-aware protein sequence function predictions with language models
Yuan, Q., Tian, C., Song, Y., Ou, P., Zhu, M., Zhao, H., & Yang, Y.
[Nucleic Acids Research, 2024]
[Paper]
Combined protein structures
Hierarchical graph transformer with contrastive learning for protein function prediction
Gu, Z., Luo, X., Chen, J., Deng, M., & Lai, L.
[Bioinformatics, 2023]
[Paper]
General GO term embedding
DeepGOA: Predicting Gene Ontology Annotations of Proteins via Graph Convolutional Network
Zhou, G., Wang, J., Zhang, X., & Yu, G.
[IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2019]
[Paper]
Predicting functions of maize proteins using graph convolutional network
Zhou, G., Wang, J., Zhang, X., Guo, M., & Yu, G.
[BMC Bioinformatics, 2020]
[Paper]
TALE: Transformer-based protein function Annotation with joint sequence–Label Embedding
Cao, Y., & Shen, Y.
[Bioinformatics, 2021]
[Paper]
An effective GCN-based hierarchical multi-label classification for protein function prediction
Choi, K., Lee, Y., Kim, C., & Yoon, M.
[Paper]
GCL-GO: A novel sequence-based hierarchy-aware method for protein function prediction
Choi, K., Lee, Y., & Kim, C.
[IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022]
[Paper]
PANDA: protein function prediction using domain architecture and affinity propagation
Wang, Z., Zhao, C., Wang, Y., Sun, Z., & Wang, N.
[Scientific Reports, 2018]
[Paper]
PANDA2: protein function prediction using graph neural networks
Zhao, C., Liu, T., & Wang, Z.
[NAR Genomics and Bioinformatics, 2022]
[Paper]
PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships
Pan, T., Li, C., Bi, Y., Wang, Z., Gasser, R. B., Purcell, A. W., ... & Song, J.
[Bioinformatics, 2023]
[Paper]
Protein function prediction with functional and topological knowledge of gene ontology
Zhao, Y., Yang, Z., Hong, Y., Yang, Y., Wang, L., Zhang, Y., ... & Wang, J.
[IEEE Transactions on NanoBioscience, 2023]
[Paper]
DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for Automatic Protein Function Prediction
Li, Z., Jiang, C., & Li, J.
[22nd International Workshop on Data Mining in Bioinformatics]
[Paper]
Partial order relation–based gene ontology embedding improves protein function prediction
Li, W., Wang, B., Dai, J., Kou, Y., Chen, X., Pan, Y., ... & Xu, Z. Z.
[Briefings in Bioinformatics, 2024]
[Paper]
PPI and other networks
Graph2GO: a multi-modal attributed network embedding method for inferring protein functions
Fan, K., Guan, Y., & Zhang, Y.
[GigaScience, 2020]
[Paper]
A deep learning framework for predicting protein functions with co-occurrence of GO terms
Li, M., Shi, W., Zhang, F., Zeng, M., & Li, Y.
[IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022]
[Paper]
Protein function prediction using graph neural network with multi-type biological knowledge
Shuai, Y., Wang, W., Li, Y., Zeng, M., & Li, M.
[IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023]
[Paper]
Heterogeneous networks
PSPGO: Cross-species heterogeneous network propagation for protein function prediction
Wu, K., Wang, L., Liu, B., Liu, Y., Wang, Y., & Li, J.
[IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022]
[Paper]
HNetGO: protein function prediction via heterogeneous network transformer
Zhang, X., Guo, H., Zhang, F., Wang, X., Wu, K., Qiu, S., ... & Li, J.
[Briefings in Bioinformatics, 2023]
[Paper]
OntoProtein: Protein Pretraining With Gene Ontology Embedding
Zhang, N., Bi, Z., Liang, X., Cheng, S., Hong, H., Deng, S., ... & Chen, H.
[ICLR, 2022]
[Paper]
Integrating Heterogeneous Biological Networks and Ontologies for Improved Protein Function Prediction with Graph Neural Networks
Tran, N. C., & Gao, J. X.
[IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023]
[Paper]
Predicting Protein Functions Based on Heterogeneous Graph Attention Technique
Zhao, Y., Yang, Z., Wang, L., Zhang, Y., Lin, H., & Wang, J.
[IEEE Journal of Biomedical and Health Informatics, 2024]
[Paper]
Protein structure and GO graph
TALE-cmap: Protein function prediction based on a TALE-based architecture and the structure information from contact map
Qiu, X. Y., Wu, H., & Shao, J.
[Computers in Biology and Medicine, 2022]
[Paper]
SLPFA: Protein Structure-Label Embedding Attention Network for Protein Function Annotation
Zhang, Q., Liu, J., Yang, F., Yang, Z., & Feng, J.
[IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023]
[Paper]
GNNGO3D: Protein function prediction based on 3d structure and functional hierarchy learning
Zhang, L., Jiang, Y., & Yang, Y.
[IEEE Transactions on Knowledge and Data Engineering, 2023]
[Paper]
POLAT: Protein function prediction based on soft mask graph network and residue-Label ATtention
Liu, Y., Zhang, Y., Chen, Z., & Peng, J.
[Computational Biology and Chemistry, 2024]
[Paper]