This repository has been archived by the owner on Aug 1, 2024. It is now read-only.
When running zero-shot variant prediction using msa1b with the code provided in examples/variant-prediction, I came across the following error:

  File "predict.py", line 180, in <lambda>
    lambda row: label_row(
  File "predict.py", line 114, in label_row
    score = token_probs[0, 1 + idx, mt_encoded] - token_probs[0, 1 + idx, wt_encoded]
IndexError: index 216 is out of bounds for dimension 1 with size 216

The command I used is as follows:

python predict.py \
    --model-location esm_msa1b_t12_100M_UR50S \
    --sequence MSIQHFRVALIPFFAAFCLPVFAHPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW \
    --dms-input ./data/BLAT_ECOLX_Ranganathan2015.csv \
    --mutation-col mutant \
    --dms-output ./data/BLAT_ECOLX_Ranganathan2015_labeled.csv \
    --offset-idx 1 \
    --scoring-strategy masked-marginals \
    --msa-path ./data/MSA/trial_BLAT.a2m
I used the entire BLAT_ECOLX sequence (286 aa) as the input sequence, and all the entries in my .a2m file have the same length. I also set --offset-idx to 1, but that does not help. Printing the dimensions of batch_tokens and token_probs in predict.py shows that the dimension I take to be the protein sequence length is 216, whereas it should be 286 in this case.
I also tested other proteins of different lengths, but the dimensions never match. Am I misunderstanding the dimensions of token_probs?
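For reference, the arithmetic behind the IndexError can be sketched as below. This is only an illustration under assumptions: the `1 + idx` in the traceback is taken to mean that one start token is prepended along dimension 1, and mutations are taken to be written like `A216C` (wild type, position, mutant). `parse_mutation` and `token_index` are hypothetical helpers, not functions from predict.py, and the mutation string is made up.

```python
from typing import Tuple


def parse_mutation(mutation: str, offset_idx: int) -> Tuple[str, int, str]:
    """Split e.g. 'A216C' into (wild type, 0-based index, mutant)."""
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    return wt, pos - offset_idx, mt


def token_index(idx: int, n_tokens: int) -> int:
    """Map a 0-based residue index to a token index.

    Assumes a single start token at position 0, so n_tokens covers
    n_tokens - 1 residues.
    """
    tok = 1 + idx
    if not 0 < tok < n_tokens:
        raise IndexError(
            f"token index {tok} out of range for {n_tokens} tokens "
            f"({n_tokens - 1} residues)"
        )
    return tok


if __name__ == "__main__":
    wt, idx, mt = parse_mutation("A216C", offset_idx=1)  # made-up mutation
    print(idx)                    # 215
    print(token_index(idx, 287))  # 216, fits a 286-residue sequence
    try:
        token_index(idx, 216)     # size reported in the traceback
    except IndexError as err:
        print(err)
```

Under these assumptions, a dimension of size 216 would cover only 215 residues, so any mutation at position 216 or later (with --offset-idx 1) would fall outside it regardless of the offset chosen.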
In addition, running the demonstration code under examples/variant-prediction with the data provided in that directory results in the following error:

RuntimeError: Received unaligned sequences for input to MSA, all sequence lengths must be equal.

Command:

python predict.py \
    --model-location esm_msa1b_t12_100M_UR50S \
    --sequence HPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW \
    --dms-input ./data/BLAT_ECOLX_Ranganathan2015.csv \
    --mutation-col mutant \
    --dms-output ./data/BLAT_ECOLX_Ranganathan2015_labeled.csv \
    --offset-idx 24 \
    --scoring-strategy masked-marginals \
    --msa-path ./data/BLAT_ECOLX_1_b0.5.a3m
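Both errors concern the length of the MSA rows, so one way to narrow them down is to look at the row lengths directly. The sketch below is a standalone diagnostic, not part of predict.py. It assumes the usual A2M/A3M convention that lowercase letters (and `.`) mark insertions relative to the query, and it reports row lengths both as written and after those characters are removed. `read_fasta_like` and `length_report` are hypothetical names, and the toy alignment is invented for illustration.

```python
import string
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: lowercase letters, '.' and '*' are insertion/padding markers.
_STRIP_INSERTIONS = str.maketrans("", "", string.ascii_lowercase + ".*")


def read_fasta_like(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-style text (also A2M/A3M) into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def length_report(text: str) -> Dict[str, Counter]:
    """Count row lengths as written and after stripping insertion markers."""
    records = read_fasta_like(text)
    raw = Counter(len(seq) for _, seq in records)
    stripped = Counter(len(seq.translate(_STRIP_INSERTIONS)) for _, seq in records)
    return {"raw": raw, "stripped": stripped}


if __name__ == "__main__":
    # Toy alignment: hit2 carries a two-residue lowercase insertion.
    toy = ">query\nMKTAYI\n>hit1\nMK-AYI\n>hit2\nMKtaTAYI\n"
    report = length_report(toy)
    print(dict(report["raw"]))       # {6: 2, 8: 1}
    print(dict(report["stripped"]))  # {6: 3}
```

On a real file, something like `length_report(open(path).read())` would show whether the rows differ in length as written (which would be consistent with the "unaligned sequences" error) and whether the stripped length is shorter than the sequence passed via --sequence (which would be consistent with 216 versus 286). Whether the MSA reader in predict.py applies the same stripping is something to confirm in the script itself.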