You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It'd be useful to return the (approximate) marginal likelihood from the inside pass (that is, the likelihood of the mutations in the tree sequence, integrating over possible node ages). The reason is that with a constant mutation rate, this marginal likelihood should be quite sensitive to the choice of effective population size and time grid. For example, I simulated ten tree sequences from the standard coalescent with population size 26000; and calculated marginal likelihood with tsdate's algorithm across choices of Ne for the prior:
In this figure, relative absolute error is calculated as abs(estimated age - true age) / true age excluding nodes that are less than 100 generations old; because at mu=1e-8 there's not much information with which to estimate these precisely (& they'll dominate the relative error otherwise). The vertical lines are maxima/minima for logLik/RAE for each sim. For this particular case it seems like choosing Ne on the basis of marginal likelihood also does a pretty good job at minimizing the relative absolute error.
I ran a separate set of simulations where I varied population size between 5000 and 50000, to see how the "inferred" Ne (in terms of maximizing log marginal likelihood) performs as a surrogate for the "optimal" Ne (in terms of minimizing relative absolute error):
There's a downward bias in "optimal" Ne vs the truth regardless of criterion (I think because the current prior discretization scheme biases the expectation of the prior). However, it looks like choosing the prior Ne based on marginal likelihood is a good proxy for minimizing RAE.
So, this may provide a useful heuristic for "calibrating" the prior in an iterative fashion ala #237, especially when combined with #248 which gives more flexibility for modifying the prior timescale. In particular, if we change the underflow protection scheme so that:
Poisson likelihoods aren't standardized over timepoints at each iteration, and
"Standardization" at each step is done by dividing by the sum, not the max (e.g. use logsumexp when working in logspace)
then the (approximate) marginal likelihood can be calculated directly from the denominator member of the inside-outside algorithm class, because geometric scaling appropriately divvies up the contributions of shared descendants across roots.
The text was updated successfully, but these errors were encountered:
It'd be useful to return the (approximate) marginal likelihood from the inside pass (that is, the likelihood of the mutations in the tree sequence, integrating over possible node ages). The reason is that with a constant mutation rate, this marginal likelihood should be quite sensitive to the choice of effective population size and time grid. For example, I simulated ten tree sequences from the standard coalescent with population size 26000; and calculated marginal likelihood with tsdate's algorithm across choices of Ne for the prior:
In this figure, relative absolute error is calculated as
abs(estimated age - true age) / true age
excluding nodes that are less than 100 generations old; because at mu=1e-8 there's not much information with which to estimate these precisely (& they'll dominate the relative error otherwise). The vertical lines are maxima/minima for logLik/RAE for each sim. For this particular case it seems like choosing Ne on the basis of marginal likelihood also does a pretty good job at minimizing the relative absolute error.I ran a separate set of simulations where I varied population size between 5000 and 50000, to see how the "inferred" Ne (in terms of maximizing log marginal likelihood) performs as a surrogate for the "optimal" Ne (in terms of minimizing relative absolute error):
There's a downward bias in "optimal" Ne vs the truth regardless of criterion (I think because the current prior discretization scheme biases the expectation of the prior). However, it looks like choosing the prior Ne based on marginal likelihood is a good proxy for minimizing RAE.
So, this may provide a useful heuristic for "calibrating" the prior in an iterative fashion ala #237, especially when combined with #248 which gives more flexibility for modifying the prior timescale. In particular, if we change the underflow protection scheme so that:
then the (approximate) marginal likelihood can be calculated directly from the
denominator
member of the inside-outside algorithm class, because geometric scaling appropriately divvies up the contributions of shared descendants across roots.The text was updated successfully, but these errors were encountered: