This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

how to input cropped protein for ESM-2 ? #651

Open
GriffithLin opened this issue Jan 13, 2024 · 1 comment

Comments

@GriffithLin

GriffithLin commented Jan 13, 2024

Hi!
I have a problem when using ESM-2 to embed long protein sequences. A long sequence has to be cropped to a length of less than 1024, and the BOS and EOS tokens are used to signal the beginning and end of a real protein.
My question is: how do I input a cropped sequence that contains only a BOS token, only an EOS token, or neither?
Thanks in advance.

@amgcasueshavoc

You do not always need BOS and EOS tokens, even without a transformer decoder. However, if you are fine-tuning ESM-2 for a specific downstream task in which you intend to use BOS and EOS tokens, you should include them as special tokens.
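
One way to picture the cropping described in the question is the minimal sketch below. It is not taken from the ESM codebase: `crop_with_boundary_tokens` and its `max_residues` parameter are hypothetical names, and the default of 1022 is only an assumption that leaves room for two special tokens within 1024 positions. The sketch splits a sequence into consecutive windows and attaches `<cls>` (ESM's BOS token) only to the crop that starts at the true N-terminus and `<eos>` only to the crop that ends at the true C-terminus, so middle crops carry neither.

```python
from typing import List

BOS = "<cls>"  # token string ESM uses for beginning-of-sequence
EOS = "<eos>"  # token string ESM uses for end-of-sequence


def crop_with_boundary_tokens(sequence: str, max_residues: int = 1022) -> List[List[str]]:
    """Split `sequence` into consecutive crops of at most `max_residues` residues.

    Hypothetical helper: BOS is added only to the crop containing the real
    start of the protein, EOS only to the crop containing the real end.
    """
    if max_residues <= 0:
        raise ValueError("max_residues must be positive")

    crops: List[List[str]] = []
    for start in range(0, len(sequence), max_residues):
        end = min(start + max_residues, len(sequence))
        tokens = list(sequence[start:end])  # one token per residue
        if start == 0:
            tokens.insert(0, BOS)  # this crop begins at the real N-terminus
        if end == len(sequence):
            tokens.append(EOS)  # this crop ends at the real C-terminus
        crops.append(tokens)
    return crops


if __name__ == "__main__":
    # Toy sequence with a tiny window so all three cases are visible.
    for crop in crop_with_boundary_tokens("MKTAYIAKQR", max_residues=4):
        print(crop)
```

With the `fair-esm` package, each token string could then presumably be mapped to an index with the alphabet's `get_idx` method and stacked into a tensor by hand. As far as I recall, the default batch converter adds both special tokens to every input for ESM-2, so it would not produce the BOS-only or EOS-only crops on its own; please verify this against the version you have installed.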
