The sequence encoding we are using is one-hot at each site. This is overparameterized because it includes indicator variables for the WT sequence. The encoding we use comes from `dms_variants.binarymap.BinaryMap` with `expand=True`.
As a result, the WT sequence is not represented by a sequence of zeros, so single-mutant variants are not represented by a sequence of zeros containing only one 1, as assumed in `numpy_single_mutant_predictions`.
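To make the problem concrete, here is a minimal NumPy sketch of a full per-site one-hot encoding. It does not call `dms_variants`; the toy alphabet, the WT sequence, and the helper `one_hot_full` are hypothetical and only illustrate the shape of the issue.

```python
import numpy as np

ALPHABET = "ACGT"  # toy alphabet (assumption, for illustration only)
WT = "ACG"         # toy wild-type sequence (assumption)


def one_hot_full(seq, alphabet=ALPHABET):
    """Full one-hot encoding: one indicator per (site, state), WT states included."""
    x = np.zeros(len(seq) * len(alphabet), dtype=int)
    for site, state in enumerate(seq):
        x[site * len(alphabet) + alphabet.index(state)] = 1
    return x


x_wt = one_hot_full(WT)      # wild type
x_mut = one_hot_full("ATG")  # single mutant C2T

n_ones_wt = int(x_wt.sum())          # one 1 per site, so L ones in total
n_ones_mut = int(x_mut.sum())        # also L ones
n_diff = int((x_wt != x_mut).sum())  # WT indicator switched off, mutant switched on

print(n_ones_wt, n_ones_mut, n_diff)
```

Under this encoding every valid sequence vector contains exactly L ones, so a vector with a single 1 does not correspond to any sequence, and a single mutant differs from WT in two entries rather than one.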
Currently the beta coefficients are not interpretable as single-mutant effects, since there are beta coefficients for the WT states too. My suggestion is to use an encoding that omits the redundant WT indicators and instead models the WT latent score with a single bias parameter (like previous methods), rather than with L weights (where L is the sequence length). This will make a few tasks more straightforward:
- plotting single-mutant effects, and associating each one with a single beta parameter
- modeling the WT intercept for data reported relative to WT
- modeling interactions between mutations (e.g. pairwise).
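The proposed encoding can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the toy alphabet, the WT sequence, the helper `one_hot_reduced`, and the additive latent model `bias + beta @ x` are all hypothetical stand-ins.

```python
import numpy as np

ALPHABET = "ACGT"  # toy alphabet (assumption, for illustration only)
WT = "ACG"         # toy wild-type sequence (assumption)


def one_hot_reduced(seq, wt=WT, alphabet=ALPHABET):
    """One indicator per (site, non-WT state); the WT sequence maps to all zeros."""
    n_alt = len(alphabet) - 1
    x = np.zeros(len(wt) * n_alt, dtype=int)
    for site, (state, wt_state) in enumerate(zip(seq, wt)):
        if state != wt_state:
            alt_states = [s for s in alphabet if s != wt_state]
            x[site * n_alt + alt_states.index(state)] = 1
    return x


n_features = len(WT) * (len(ALPHABET) - 1)
rng = np.random.default_rng(0)
beta = rng.normal(size=n_features)  # one coefficient per possible mutation
bias = 0.5                          # single parameter for the WT latent score


def latent(x):
    """Additive latent score: WT bias plus the summed mutation effects."""
    return bias + beta @ x


x_wt = one_hot_reduced(WT)
x_mut = one_hot_reduced("ATG")  # single mutant C2T
j = int(np.flatnonzero(x_mut)[0])

# The effect of a single mutation is exactly one beta coefficient.
effect = latent(x_mut) - latent(x_wt)

# Rows of the identity matrix now enumerate all single mutants.
single_mutant_scores = bias + np.eye(n_features) @ beta
```

With the WT indicators dropped, `latent(x_wt)` equals the bias, each single mutant is a row of the identity matrix, and `effect` equals `beta[j]`. That is the one-1-among-zeros representation that the issue says `numpy_single_mutant_predictions` assumes.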