Update global prior via EM, use mixture of gammas #307

Closed
wants to merge 4 commits into from

Conversation

nspope
Contributor

@nspope nspope commented Aug 8, 2023

  • The default prior used by the variational algorithm is a "global" (same prior for every node) gamma distribution, which is estimated a priori using conditional coalescent moments.
  • A better option is to have a global prior that is a mixture of gamma distributions. This should be better at capturing heavy tails / heterogeneity across the sequence.
  • Even better, this mixture prior can be updated via EM at the end of each EP iteration. This is an interesting case of EM, where the observations are distributions rather than scalars. This means the "prior" doesn't need to make any a priori assumptions regarding population size, demographic history, conditional coalescent model, etc., and can be initialized to more-or-less arbitrary values. So it's more like fitting a mixed effects model than using a prior in the Bayesian sense.
  • It turns out the EM algorithm has closed form updates (== has very little computational cost per EM update) and converges within a small number of iterations (<= 10).
  • This should be possible to extend in various interesting ways (e.g. have mixture weights indexed by coordinate along the sequence, or use a mixture structure that is informed by the number of descendants, like an "optimizable" conditional coalescent prior).
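
To make the "observations are distributions" idea concrete, here is a minimal, hypothetical sketch of EM for a gamma mixture where each observation is itself a gamma distribution (a node posterior given as shape and rate). It is not the code in this PR. The function names, the E-step that scores each component by the expected log density under the node posterior, and the M-step that uses a standard closed-form approximation to the gamma maximum-likelihood shape are all assumptions made for illustration.

```python
import math


def digamma(x):
    """Digamma via upward recurrence plus an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def gamma_moments(shape, rate):
    """E[x] and E[log x] under Gamma(shape, rate)."""
    return shape / rate, digamma(shape) - math.log(rate)


def fit_gamma_mixture(posteriors, components, weights, iterations=10):
    """Hypothetical EM sketch: fit a gamma mixture to gamma-distributed observations.

    posteriors: list of (shape, rate), one per node
    components: initial list of (shape, rate), one per mixture component
    weights:    initial mixture weights (positive, summing to one)
    """
    stats = [gamma_moments(a, b) for a, b in posteriors]
    for _ in range(iterations):
        # E-step: responsibilities from the expected log density of each
        # component under the node posterior (an assumption of this sketch).
        resp = []
        for mean_x, mean_logx in stats:
            logp = [
                math.log(w) + a * math.log(b) - math.lgamma(a)
                + (a - 1.0) * mean_logx - b * mean_x
                for w, (a, b) in zip(weights, components)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: match the weighted E[x] and E[log x]; the shape uses a
        # closed-form approximation to the gamma maximum-likelihood estimate.
        new_components, new_weights = [], []
        for k in range(len(components)):
            n_k = sum(r[k] for r in resp)
            mean_x = sum(r[k] * s[0] for r, s in zip(resp, stats)) / n_k
            mean_logx = sum(r[k] * s[1] for r, s in zip(resp, stats)) / n_k
            s = math.log(mean_x) - mean_logx  # positive by Jensen's inequality
            shape = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
            new_components.append((shape, shape / mean_x))
            new_weights.append(n_k / len(stats))
        components, weights = new_components, new_weights
    return components, weights


# Toy usage: node posteriors clustered around two very different ages.
young = [(20.0, 20.0 / m) for m in (0.8, 1.0, 1.2)] * 12
old = [(20.0, 20.0 / m) for m in (80.0, 100.0, 120.0)] * 4
fitted, mix = fit_gamma_mixture(young + old, [(2.0, 2.0), (2.0, 0.02)], [0.5, 0.5])
print(fitted, mix)
```

In this sketch each EM pass is a single sweep over the nodes with no inner numerical optimization, which is the kind of cheap update described above. The sketch does not guard against a component losing all of its responsibility, which a real implementation would need to handle.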

With a two component mixture, this performs noticeably better than the current default prior (simulated 10 Mb, 100 haploids):

@nspope
Contributor Author

nspope commented Aug 8, 2023

One thing I'm not quite sure about is how to interface this with custom, node-specific priors. In general, it seems like node-specific priors should be used to fit the mixture components via EM, but should not subsequently be updated as part of the EP algorithm (i.e. the priors should be "fixed" for these nodes). This would allow one to put custom constraints on the ages of certain nodes without interfering with the general prior optimization strategy.

@nspope
Contributor Author

nspope commented Aug 8, 2023

Edit: I used the wrong mutation rate; the updated figures look a lot better.

Optimizing the "prior" also seems to work reasonably well for data simulated from a non-equilibrium demographic history, without any a priori knowledge of that history. Here's an example with Bos taurus (extreme population decline) from stdpopsim. From top to bottom: global fixed prior assuming constant demography; global fixed prior assuming true population history; global EM-updated prior.

So the EM-updated prior does just about as well as if one knew the true demography a priori.

@hyanwong
Member

hyanwong commented Aug 9, 2023

How well does this work with tree sequences inferred from tsinfer, by the way?

@hyanwong
Member

hyanwong commented Jan 5, 2024

I assume this can be closed in favour of #349?

@nspope
Contributor Author

nspope commented Jan 5, 2024

Yup, this is superseded! Closing.

@nspope nspope closed this Jan 5, 2024