How to fine tune locally with SaProtHub datasets? #72
Comments
Hi, you can indeed load models and datasets from Hugging Face. However, it may be more convenient to use the ColabSaprot interface while fine-tuning the model on your local GPU. It is simple: you only need to follow a few steps from https://github.com/westlake-repl/SaprotHub/tree/main/local_server.
Thanks for the quick response. I managed to connect to the local runtime. Still, I think something is not suited to my setup. In order to request the resources I need to train SaProt, I need to My ideal case would be to directly get the Thanks
Sure. We provide some functions to generate the LMDB file, either from a dictionary or from a file. You could check
Then you could call this function to generate an LMDB file. Note that you have to generate 3
Thanks a lot. It worked great, and I managed to fine-tune some models :) One last thing before closing: I noticed that all the examples relate to the mutation effect of a single substitution on a protein. I want to do 2 things
I followed #51, and used the expression Many thanks again for all the help.
Hi, yes. If you input combinatorial mutations to the model, the predicted score is the sum of the scores of all the single-point mutations. Here we follow the original ESM-1v paper, which assumes the score is additive when multiple mutations exist. Please note that SaProt tends to perform well for variants that have fewer mutation sites. Predicting the effect of multiple mutations is harder, as the fitness landscape becomes increasingly complicated and epistasis may be present.
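The additive assumption described above can be sketched in a few lines of Python. This is only an illustration, not SaProt's own code: the `single_scores` dictionary, the `combined_score` helper, and the colon-separated mutation string are assumptions made for the example, and the numbers are made up.

```python
# Hypothetical per-mutation scores, e.g. zero-shot scores for single substitutions.
# The values are illustrative only.
single_scores = {"V82A": -1.5, "D8H": 0.25, "K10R": -0.5}


def combined_score(mutations, scores):
    """Score a combinatorial variant as the sum of its single-point scores.

    `mutations` is assumed to be a colon-separated string such as "V82A:D8H".
    """
    return sum(scores[m] for m in mutations.split(":"))


double = combined_score("V82A:D8H", single_scores)
print(double)  # -1.25
```

Because the sum ignores interactions between sites, a sketch like this cannot capture epistasis, which is one reason predictions get less reliable as the number of mutated sites grows.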
Thank you. Makes sense, especially the part about epistasis. How about point 2: predicting fitness (or any other downstream task) for multiple protein sequences? Is this possible? Thank you
Yes, I think point 2 is possible, as shown by the thermostability downstream task. You just need to prepare some samples with the labels of interest and use them to fine-tune SaProt. Then you could use the model to run inference and rank the candidates.
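The predict-then-rank step could look roughly like the sketch below. It makes no claim about SaProt's API: `predict_fitness` is a hypothetical stand-in for whatever inference call a fine-tuned model exposes, and a dummy scorer replaces it here so the snippet runs on its own.

```python
from typing import Callable, Dict, List, Tuple


def rank_candidates(
    sequences: Dict[str, str],
    predict_fitness: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest predicted score."""
    scored = [(name, predict_fitness(seq)) for name, seq in sequences.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Dummy scorer standing in for a fine-tuned regression model (hypothetical):
# it returns the fraction of alanine residues in the sequence.
def dummy_predict(seq: str) -> float:
    return seq.count("A") / len(seq)


candidates = {"cand_1": "MKAA", "cand_2": "MKVL", "cand_3": "AAAA"}
ranking = rank_candidates(candidates, dummy_predict)
print(ranking)
```

In practice, `dummy_predict` would be swapped for a function that tokenizes one sequence, runs the fine-tuned model, and returns a single float.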
And that is where I get confused. I already have the fine-tuned model and I can do
to predict the mutation effect of one (or more) substitutions. But what I would like to do is something more like
and end up with one prediction per sequence that I would later use for ranking. Thanks again
Yes. If you already have the fine-tuned model, you can just follow the second way to predict the fitness. In fact, the first way, predicting mutational effect, doesn't require additional training. It works in a zero-shot manner, so you only need to load the original SaProt weights to make the prediction (fine-tuned models cannot do this). Now that you have the fine-tuned model, you can use it directly to make predictions. So your question is how to load your model and do something in the second way?
Yes, more like how to pass the whole sequence and get a prediction from the fine-tuned or zero-shot model. Calling
We don't provide a function call to directly get outputs from a fine-tuned model. However, you could easily load your model and get them manually. Here we use a regression model as an example:
The
Hi everyone,
I’m fairly new to Hugging Face, and I was wondering whether it is possible to fine-tune SaProt locally with the SaProtHub datasets, and how to call the models from there as well, rather than using Colab, since my runtime keeps getting disconnected and I constantly need to restart the process.
As an example, I would like to try fine-tuning a model locally, similar to the Thermostability example, but with this dataset.
Also, how can I use the models from SaProtHub locally, like this? I understand the weights are not there, so it is probably just documentation of the performance and config.
Many thanks!