Return marginal likelihood from inside pass #249

nspope · 2023-01-27T00:32:08Z

It'd be useful to return the (approximate) marginal likelihood from the inside pass (that is, the likelihood of the mutations in the tree sequence, integrating over possible node ages). The reason is that with a constant mutation rate, this marginal likelihood should be quite sensitive to the choice of effective population size and time grid. For example, I simulated ten tree sequences from the standard coalescent with population size 26000, and calculated the marginal likelihood with tsdate's algorithm across choices of Ne for the prior:

In this figure, relative absolute error (RAE) is calculated as abs(estimated age - true age) / true age, excluding nodes that are less than 100 generations old, because at mu=1e-8 there's not much information with which to estimate these precisely (and they'll dominate the relative error otherwise). The vertical lines mark the maximum of logLik and the minimum of RAE for each sim. For this particular case it seems like choosing Ne on the basis of marginal likelihood also does a pretty good job at minimizing the relative absolute error.
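For concreteness, here is a minimal sketch of the error metric described above. The function name, the `min_age` parameter, and the example ages are made up for illustration; how the per-node values were aggregated for the figure isn't stated, so the mean at the end is only one plausible choice.

```python
def relative_absolute_error(true_ages, estimated_ages, min_age=100.0):
    """Per-node abs(estimated - true) / true, skipping nodes younger than min_age.

    Hypothetical helper for illustration; not part of tsdate.
    """
    if len(true_ages) != len(estimated_ages):
        raise ValueError("true_ages and estimated_ages must have the same length")
    return [
        abs(est - true) / true
        for true, est in zip(true_ages, estimated_ages)
        if true >= min_age  # drop nodes less than min_age generations old
    ]


# Made-up ages in generations; the first node is excluded (true age < 100).
true_ages = [50.0, 200.0, 1000.0]
estimated_ages = [80.0, 150.0, 1100.0]

rae = relative_absolute_error(true_ages, estimated_ages)
mean_rae = sum(rae) / len(rae)  # aggregation by mean is an assumption
```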

I ran a separate set of simulations where I varied population size between 5000 and 50000, to see how the "inferred" Ne (in terms of maximizing log marginal likelihood) performs as a surrogate for the "optimal" Ne (in terms of minimizing relative absolute error):

There's a downward bias in "optimal" Ne vs the truth regardless of criterion (I think because the current prior discretization scheme biases the expectation of the prior). However, it looks like choosing the prior Ne based on marginal likelihood is a good proxy for minimizing RAE.
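The selection rule being compared here amounts to a grid search over candidate Ne values. A rough sketch, assuming some callable that returns the log marginal likelihood for a given prior Ne: `choose_ne` and `toy_loglik` are hypothetical names, and `toy_loglik` is an arbitrary stand-in with a known peak, not anything tsdate computes.

```python
import math


def choose_ne(candidate_ne, log_marginal_likelihood):
    """Return the candidate Ne with the highest log marginal likelihood.

    `log_marginal_likelihood` is any callable Ne -> float; in practice it
    would wrap a dating run, which is not shown here.
    """
    scores = {ne: log_marginal_likelihood(ne) for ne in candidate_ne}
    best = max(scores, key=scores.get)
    return best, scores


def toy_loglik(ne, peak=20000.0):
    # Stand-in objective for illustration only: peaks at `peak` on a log scale.
    return -((math.log(ne) - math.log(peak)) ** 2)


grid = [5000, 10000, 20000, 40000]
best_ne, scores = choose_ne(grid, toy_loglik)
```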

So, this may provide a useful heuristic for "calibrating" the prior in an iterative fashion à la #237, especially when combined with #248, which gives more flexibility for modifying the prior timescale. In particular, if we change the underflow protection scheme so that:

1. Poisson likelihoods aren't standardized over timepoints at each iteration, and
2. "Standardization" at each step is done by dividing by the sum, not the max (e.g. use logsumexp when working in logspace)

then the (approximate) marginal likelihood can be calculated directly from the denominator member of the inside-outside algorithm class, because geometric scaling appropriately divvies up the contributions of shared descendants across roots.
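To illustrate why normalizing by the sum matters, here is a toy sketch on a simple chain (a scaled forward pass), not tsdate's inside algorithm: when each step is standardized with logsumexp, the accumulated log normalizers add up to the log marginal likelihood. All function names and numbers are invented for the example, and the brute-force sum over paths is there only as a check.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_marginal(log_init, log_trans, log_emit):
    """Scaled forward pass; returns the sum of per-step log normalizers."""
    n = len(log_init)
    log_alpha = [log_init[k] + log_emit[0][k] for k in range(n)]
    total = 0.0
    for t in range(len(log_emit)):
        if t > 0:
            log_alpha = [
                logsumexp([log_alpha[j] + log_trans[j][k] for j in range(n)])
                + log_emit[t][k]
                for k in range(n)
            ]
        norm = logsumexp(log_alpha)  # divide by the sum, not the max
        total += norm  # accumulate the denominator
        log_alpha = [a - norm for a in log_alpha]
    return total


def brute_force_log_marginal(log_init, log_trans, log_emit):
    """Check: sum the joint probability over every state path."""
    n = len(log_init)
    terms = []
    for path in itertools.product(range(n), repeat=len(log_emit)):
        lp = log_init[path[0]] + log_emit[0][path[0]]
        for t in range(1, len(path)):
            lp += log_trans[path[t - 1]][path[t]] + log_emit[t][path[t]]
        terms.append(lp)
    return logsumexp(terms)


# Arbitrary two-state, three-step example.
log_init = [math.log(0.6), math.log(0.4)]
log_trans = [[math.log(0.7), math.log(0.3)], [math.log(0.2), math.log(0.8)]]
log_emit = [
    [math.log(0.5), math.log(0.1)],
    [math.log(0.3), math.log(0.6)],
    [math.log(0.9), math.log(0.2)],
]

scaled = forward_log_marginal(log_init, log_trans, log_emit)
exact = brute_force_log_marginal(log_init, log_trans, log_emit)
```

In this toy case `scaled` and `exact` agree up to floating-point error. The tree-sequence setting additionally has to handle descendants shared across roots, which the chain does not capture.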

hyanwong · 2023-12-11T11:28:50Z

Can we close this now @nspope, or is there useful discussion here re choosing the prior Ne? I suspect this has been superseded by #349?

nspope · 2023-12-11T21:01:10Z

Yes! Closing.

nspope mentioned this issue Jan 30, 2023

Return marginal likelihood from inside algorithm #251

Merged

nspope closed this as completed Dec 11, 2023
