Sequence encodings and beta heatmaps #77

wsdewitt · 2020-07-27T16:53:34Z

The sequence encoding we are using is one-hot at each site; it comes from dms_variants.binarymap.BinaryMap with expand=True. This encoding is overparameterized because it includes indicator variables for the WT state at every site.
As a result, the WT sequence is not represented by a vector of zeros, and a single-mutant variant is not represented by a vector of zeros containing exactly one 1, as numpy_single_mutant_predictions assumes.
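
To make the problem concrete, here is a minimal sketch of a full per-site one-hot encoding. It does not call dms_variants; the toy alphabet, the example sequences, and the helper full_one_hot are hypothetical and only illustrate the shape of the issue.

```python
import numpy as np

ALPHABET = "ACGT"  # toy alphabet, assumed for illustration only


def full_one_hot(seq, alphabet=ALPHABET):
    """Hypothetical helper: one indicator per (site, character), WT included."""
    x = np.zeros(len(seq) * len(alphabet), dtype=int)
    for site, char in enumerate(seq):
        x[site * len(alphabet) + alphabet.index(char)] = 1
    return x


wt = "ACG"      # assumed WT sequence
mutant = "ATG"  # single mutant at the second site

x_wt = full_one_hot(wt)
x_mut = full_one_hot(mutant)

# The WT vector carries one 1 per site, so it is not all zeros.
n_ones_wt = int(x_wt.sum())

# A single mutant differs from WT in two entries: the WT indicator
# switches off and the mutant indicator switches on.
n_changed = int((x_wt != x_mut).sum())
```

Under this encoding a single mutation touches two coefficients at once, so no single beta can be read off as its effect.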

Currently the beta coefficients are not interpretable as single-mutant effects, since there are beta coefficients for the WT states too. My suggestion is to use an encoding that omits the redundant WT indicators and instead models the WT latent score with a single bias parameter (as in previous methods), rather than with L weights (where L is the sequence length). This will make a few tasks more straightforward:

- plotting single-mutant effects and associating each one with a single beta parameter
- modeling the WT intercept for data reported relative to WT
- modeling interactions between mutations (e.g. pairwise)
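
A minimal sketch of the proposed encoding, assuming a toy alphabet and a linear latent score; the helpers mutation_columns and encode_without_wt, the random beta values, and the bias value are all hypothetical and are not part of the existing code:

```python
import numpy as np

ALPHABET = "ACGT"  # toy alphabet, assumed for illustration only


def mutation_columns(wt, alphabet=ALPHABET):
    """Hypothetical helper: one column per non-WT (site, character) pair."""
    return [(i, a) for i, w in enumerate(wt) for a in alphabet if a != w]


def encode_without_wt(seq, wt, alphabet=ALPHABET):
    """Hypothetical helper: indicator vector over mutations only."""
    return np.array([int(seq[i] == a) for i, a in mutation_columns(wt, alphabet)])


wt = "ACG"  # assumed WT sequence
cols = mutation_columns(wt)

rng = np.random.default_rng(0)
beta = rng.normal(size=len(cols))  # one coefficient per mutation (made-up values)
bias = 0.5                         # single parameter for the WT latent score


def latent(seq):
    """Linear latent score: bias plus the betas of the mutations present."""
    return bias + encode_without_wt(seq, wt) @ beta


# WT encodes as all zeros, so its latent score is just the bias.
x_wt = encode_without_wt(wt, wt)

# Every single mutant is a row of the identity matrix, so all single-mutant
# latent scores come from one matrix product.
single_mutant_latents = bias + np.eye(len(cols), dtype=int) @ beta
```

With this layout each beta corresponds to exactly one mutation, so a heatmap of beta indexed by the entries of cols is directly a heatmap of single-mutant effects relative to WT.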

wsdewitt added the "invalid" label (description: "This doesn't seem right") Jul 27, 2020

wsdewitt self-assigned this Jul 27, 2020
