Confusion about ploidy and coalescent time scale #1761
Replies: 7 comments 6 replies
-
I'm not sure what's going on here @grahamgower, I would have thought this was equivalent also. This is weird, when I run the three examples with equal seeds I would have thought we'd get identical random trajectories but we don't: ts1 = sim_legacy(pc_c, de_c)
ts2 = sim_ploidy1(demog_c)
ts3 = sim_ploidy2(demog_c)
print(ts1.tables.nodes)
print(ts2.tables.nodes)
print(ts3.tables.nodes)
1 and 3 are identical, as expected, but not 2 (as in, it should be off by a factor of 2 everywhere, I'd have thought). |
Beta Was this translation helpful? Give feedback.
-
This is the only place that I guess we could compare against msprime 0.x to see what it produces in the strong bottleneck case? |
Beta Was this translation helpful? Give feedback.
-
The ploidy is affecting the time scale only indirectly, via the coalescent rate. At |
Beta Was this translation helpful? Give feedback.
-
Shouldn't you be scaling the time of the bottleneck with ploidy? |
Beta Was this translation helpful? Give feedback.
-
Yes, thanks @petrelharp. It's obvious in hindsight, but caused me quite a lot of head scratching! |
Beta Was this translation helpful? Give feedback.
-
But, if you doubled all the population sizes when you set ploidy=1, then you should get the same thing? |
Beta Was this translation helpful? Give feedback.
-
I see that the |
Beta Was this translation helpful? Give feedback.
-
The ploidy section in the docs implies that the difference between simulating with
ploidy=1
andploidy=2
is just the coalescent time scale (plus the sampling). Indeed, one might imagine that to obtain the behaviour of the legacymsprime.simulate()
function, one can just set ploidy=1 and then multiply the time scale. At least for some models, this isn't sufficient. And the effect might be quite subtle. Below, I've compared TMRCA distributions for the legacysimulate()
function, a time-adjustedsim_ancestry(ploidy=1)
and the appropriate incantationsim_ancestry(ploidy=2, samples=[msprime.SampleSet(..., ploidy=1), ...])
.I guess the differences for my "strong bottleneck" model below is driven by the rate of coalescence being different with the diffferent ploidy values? But the ploidy docs don't mention coalescent rates, so I certainly missed this (probably basic) point. Or maybe there's something else going on that I've missed? It might also be prudent to have an "equivalent to msprime.simulate()" example for sim_ancestry() somewhere?
Beta Was this translation helpful? Give feedback.
All reactions