Hi @mawassw, welcome! 👋 It's always exciting to see people building cool new stuff on top of tskit, so hopefully we can help.

I think there's probably just some confusion here about calibrating branch lengths, sequence lengths and rates. When I compute the branch-mode diversity with print(nts.diversity(mode="branch")) I get 38717732.54502242, which tells me that the average distance between samples along the trees is very large. What are your time units here? I'm also not clear what the units of sequence length mean. It looks like each tree is one unit of sequence long, so what's your model of "site density"? These tricky scaling problems need to be worked out so that you can provide the appropriate rate to sim_mutations.

I wouldn't bother with the InfiniteAlleles model here, by the way: because you are using a continuous genome, you aren't going to have multiple mutations at a site.

Is there a particular reason for using the text-based encoding rather than the file format written by ts.dump and read by tskit.load? For example:

import tskit

# ts is the tree sequence created from your files
ts.dump("example.ts")
# Same tree sequence loaded in a fraction of a second
ts = tskit.load("example.ts")
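To put the branch-mode number above in context, here is a rough sketch in plain Python. It assumes that branch lengths are measured in generations, that the rate of 10^-8 is per unit of sequence per generation, and that the standard diploid neutral expectation pi = 4 * Ne * mu applies. The variable names are made up for illustration. Under those assumptions, the expected site diversity is simply the mutation rate times the branch-mode diversity.

```python
# Value reported above from nts.diversity(mode="branch").
branch_diversity = 38717732.54502242

# Assumed mutation rate: per unit of sequence, per generation.
mu = 1e-8

# Expected pairwise site diversity per unit of sequence
# (neutral mutations placed along the branches at rate mu).
expected_site_diversity = mu * branch_diversity

# Diploid neutral expectation: pi = 4 * Ne * mu, so Ne = pi / (4 * mu),
# which reduces to branch_diversity / 4 when time is in generations.
implied_ne = expected_site_diversity / (4 * mu)

print(f"expected site diversity: {expected_site_diversity:.4f}")
print(f"implied Ne (if time is in generations): {implied_ne:,.0f}")
```

If the time units are not generations, or if one unit of sequence stands for many base pairs, both numbers change accordingly, which is why the units need to be pinned down first.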
Hi everyone,

Let me start with the main problem: I am trying to calculate diversity-Ne by simulating neutral mutations using msprime.sim_mutations along a tree sequence recorded by our own forward-time simulator of diploid genomes, which incorporates linkage blocks and recombination at given hotspots.

More details on the tree sequence (I've attached an example of the tree sequence tables I am using, so that any of you can test with them: tskit_tables_example.zip): we use the tskit API, integrated into our forward-time simulator, to build a tree sequence for the purpose of keeping track of mutations that occur within the linkage blocks. So we don't have sites, but linkage blocks in which multiple mutations can occur, and the mutations have values sampled from DFEs.
We are fairly sure that the output tables and the tree sequences we build from them are correct because we get very good concordance between the fitness flux estimated directly in our simulator and the fitness flux calculated based on the fixed mutations we extract from the tree sequence.
The issue we are having now is that the same tree sequence, when used to simulate neutral mutations to calculate diversity-Ne, gives wild results. We are not sure whether the parameterization we are using for msprime.sim_mutations is the issue. See the snippet of code below, which we use to perform this operation. We are using the InfiniteAlleles model since our setup is similar to that of SLiM, and discrete_genome=False since we assumed that neutral mutations can happen anywhere in our genomes even though our linkage blocks are integers (in the example I give we have 1150 linkage blocks = 23 chr * 50 blocks per chr); changing it to True doesn't change much anyway. The amount of diversity we are getting is very high for a rate of 10^-8. Any insights into what we are doing wrong here?
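One way to think about a rate of 10^-8 with 1150 integer-length blocks is sketched below in plain Python. It assumes 10^-8 is a per-base-pair, per-generation rate and that each linkage block stands for some number of base pairs. The helper name, bp_per_block and generations_per_time_unit are hypothetical, and the value of 1,000,000 bp per block is purely illustrative. It is not taken from the simulator described above.

```python
def rate_per_coordinate_unit(mu_per_bp_per_gen, bp_per_block,
                             generations_per_time_unit=1.0):
    """Convert a per-bp, per-generation mutation rate into a rate per
    tree-sequence coordinate unit (one block) per tree-sequence time unit."""
    return mu_per_bp_per_gen * bp_per_block * generations_per_time_unit


# From the post: 23 chromosomes with 50 linkage blocks each.
n_blocks = 23 * 50

# Rate quoted in the post, assumed here to be per bp per generation.
mu_per_bp_per_gen = 1e-8

# Hypothetical block size in base pairs (illustrative value only).
bp_per_block = 1_000_000

rate = rate_per_coordinate_unit(mu_per_bp_per_gen, bp_per_block)
print(f"{n_blocks} blocks, rate per block per time unit: {rate:g}")
```

Under these assumptions, the value passed as the rate would be the rescaled one rather than 10^-8 itself, and any diversity computed per unit of sequence would then be per block rather than per base pair.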