index out of bounds during zero-shot with msa1b #649

Maxwell-downtown · 2024-01-12T06:46:26Z

When running zero-shot variant prediction with msa1b using the code provided in examples/variant-prediction, I came across the following error:
  File "predict.py", line 180, in <lambda>
    lambda row: label_row(
  File "predict.py", line 114, in label_row
    score = token_probs[0, 1 + idx, mt_encoded] - token_probs[0, 1 + idx, wt_encoded]
IndexError: index 216 is out of bounds for dimension 1 with size 216
The command I used is as follows:
python predict.py --model-location esm_msa1b_t12_100M_UR50S --sequence MSIQHFRVALIPFFAAFCLPVFAHPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW --dms-input ./data/BLAT_ECOLX_Ranganathan2015.csv --mutation-col mutant --dms-output ./data/BLAT_ECOLX_Ranganathan2015_labeled.csv --offset-idx 1 --scoring-strategy masked-marginals --msa-path ./data/MSA/trial_BLAT.a2m
I used the full 286-aa BLAT_ECOLX sequence as the input sequence, and all entries in my .a2m file have the same length. I also set --offset-idx to 1, but that did not help. Printing the shapes of batch_tokens and token_probs in predict.py shows that the dimension I take to be the protein sequence length is 216, whereas I expected 286 in this case.
Other proteins of different lengths were also tested, but the dimensions never matched. Am I misunderstanding the dimensions of token_probs?
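
One way to narrow this down is to compare the length of the first MSA row before and after lowercase and `.` characters are dropped, and to check whether `1 + idx` fits inside dimension 1 of token_probs. The sketch below is standalone and does not call the esm package. `aligned_length` and `mutation_in_bounds` are hypothetical helper names, the example row is made up, and the idea that lowercase columns are stripped when the MSA is loaded is an assumption that should be checked against predict.py.

```python
import string

# Assumption: the MSA loader drops lowercase letters and '.' (insertion columns).
_DROP = str.maketrans("", "", string.ascii_lowercase + ".")


def aligned_length(row: str, strip_insertions: bool = True) -> int:
    """Length of one alignment row, optionally after removing insertion columns."""
    return len(row.translate(_DROP)) if strip_insertions else len(row)


def mutation_in_bounds(mutant: str, offset_idx: int, n_positions: int) -> bool:
    """Check a mutation string such as 'H24A' against dimension 1 of token_probs.

    n_positions is the size of that dimension; position 0 is assumed to be
    a start token, hence the 1 + idx lookup seen in the traceback.
    """
    idx = int(mutant[1:-1]) - offset_idx
    return 0 <= 1 + idx < n_positions


row = "MSIqhfRVAL.IP"  # made-up row with three lowercase columns and one '.'
print(aligned_length(row, strip_insertions=False))  # 13
print(aligned_length(row))                          # 9
print(mutation_in_bounds("A216G", offset_idx=1, n_positions=216))  # False
```

If the stripped length of the query row is shorter than the sequence passed via --sequence, mutations near the C-terminus would index past the end of token_probs, which would be consistent with the IndexError above.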
In addition, running the demonstration command under examples/variant-prediction with the data provided in that directory results in the following error:
RuntimeError: Received unaligned sequences for input to MSA, all sequence lengths must be equal.
Command:
python predict.py \
    --model-location esm_msa1b_t12_100M_UR50S \
    --sequence HPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW \
    --dms-input ./data/BLAT_ECOLX_Ranganathan2015.csv \
    --mutation-col mutant \
    --dms-output ./data/BLAT_ECOLX_Ranganathan2015_labeled.csv \
    --offset-idx 24 \
    --scoring-strategy masked-marginals \
    --msa-path ./data/BLAT_ECOLX_1_b0.5.a3m
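
To see whether an alignment file would trigger the "unaligned sequences" error, it may help to tally the row lengths in the file, both raw and with insertion columns removed. This is a minimal sketch that parses FASTA-style text by hand. `row_lengths` is a hypothetical helper, the three-row example is made up, and treating lowercase letters and `.` as insertions is an assumption about how the a3m file is meant to be read.

```python
import string
from collections import Counter

# Assumption: lowercase letters and '.' mark insertion columns in a3m/a2m rows.
_DROP = str.maketrans("", "", string.ascii_lowercase + ".")


def row_lengths(fasta_text: str, strip_insertions: bool) -> Counter:
    """Count how many alignment rows have each length."""
    lengths = Counter()
    seq_parts = []

    def flush():
        # Finish the current record, if any, and record its length.
        if seq_parts:
            seq = "".join(seq_parts)
            if strip_insertions:
                seq = seq.translate(_DROP)
            lengths[len(seq)] += 1
            seq_parts.clear()

    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
        elif line:
            seq_parts.append(line)
    flush()
    return lengths


example = ">query\nHPETLV\n>hit1\nHP-TLV\n>hit2\nHPEaTLV\n"
print(row_lengths(example, strip_insertions=False))  # Counter({6: 2, 7: 1})
print(row_lengths(example, strip_insertions=True))   # Counter({6: 3})
```

More than one key in the resulting counter means the rows differ in length under that reading of the file. To run it on a real file, pass the result of reading the file as text, for example `open(path).read()`.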
