Structural Residue Variant Training on Single Protein in a Tetramer Complex #590
Replies: 3 comments
-
Hi @lloydtripp, thanks again for reaching out. To answer your questions:
I find it hard to answer this tbh, because it's your experiment that you do as you think is best :)
The code is set up such that it should work in theory, but (as mentioned above) it might be too slow to work in practice.
@gcroci2 is in a better place to answer this, so I will leave it to her (I'm also curious to hear whether you agree with what I said above).
Thanks, very glad to hear it 😊
-
I agree with @DaniBodor here. Usually, I would recommend against using the whole protein complex (you can still try, though), as in my experience the information gets lost in the noise. Am I correct in understanding that you have 5k data points for single amino acid substitutions in a single chain (~250 residues) of a tetramer? Since much of your graph would be identical except for the substituted site, it is possible that your model may overfit.
-
Nice to read our first DeepRank2 discussion 🎉🎉 Thanks @lloydtripp for your comments and questions! I completely agree with @DaniBodor and @rgayatri, and I don't have much to add to their comments.
-
I’m a graduate student at Washington University in St. Louis doing some machine learning for variant effect predictions. I found the DeepRank software to be the best at capturing protein structure as a graph representation for machine learning regression (predicting a continuous variable). This was something I was hoping to implement in my case.
I wanted to make sure my experimental setup made sense with your software. I used RoseTTAFold2 to predict structures for all possible single amino acid substitutions in a single chain that is part of a multimer (the 3 other chains are held constant). These structures would be fed into DR2 to predict functional data (a deep mutational scan of protein function for each protein variant). It seems like the software is set up to only capture parts of a protein (a small peptide region and nearby atoms). Ideally, I would take in the whole structure and have the model learn the structure-to-function relationship. The end goal is to learn what about the structure is important.
My main questions:
`influence_radius` and `max_edge_length` sizing in this case? I assume as large as possible but I could be wrong.
`influence_radius` and `max_edge_length` during the `QueryCollection` to capture the graph representation of the entire protein (and complex)?

Thanks again for making very useful software! The documentation and code were all put together very professionally!
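
To make the `influence_radius` and `max_edge_length` question concrete, here is a minimal, library-free sketch of how two such cutoffs shape a graph. It does not call DeepRank2. It assumes that `influence_radius` selects the nodes within a given distance of the variant site and that `max_edge_length` is the largest distance at which two nodes are connected. The function `build_graph` and the toy coordinates are hypothetical and only serve the illustration.

```python
import math

def build_graph(coords, center_idx, influence_radius, max_edge_length):
    """Hypothetical helper, not part of DeepRank2.

    Keeps the nodes within `influence_radius` of the center node, then
    connects every pair of kept nodes closer than `max_edge_length`.
    """
    center = coords[center_idx]
    nodes = [i for i, p in enumerate(coords)
             if math.dist(p, center) <= influence_radius]
    edges = [(i, j)
             for k, i in enumerate(nodes)
             for j in nodes[k + 1:]
             if math.dist(coords[i], coords[j]) <= max_edge_length]
    return nodes, edges

# Toy "chain": 10 residues spaced 3.8 Å apart along the x-axis (made-up data).
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]

# Small radius: only the neighbourhood of the variant site (index 4) is kept.
nodes_small, edges_small = build_graph(coords, 4, influence_radius=8.0, max_edge_length=4.5)

# Very large radius: every residue becomes a node.
nodes_all, edges_all = build_graph(coords, 4, influence_radius=1000.0, max_edge_length=4.5)

# Longer edges on the full structure: the edge count grows quickly.
nodes_dense, edges_dense = build_graph(coords, 4, influence_radius=1000.0, max_edge_length=8.0)

print(len(nodes_small), len(edges_small))
print(len(nodes_all), len(edges_all))
print(len(nodes_dense), len(edges_dense))
```

Under these assumptions, a very large radius pulls the whole structure into the graph, and a longer edge cutoff multiplies the number of edges. That is one way the whole-complex setup could become slow in practice, and one way most of each graph could end up identical across variants.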