A question about the tutorial of simplify() #242

fatlonggg · 2023-06-30T12:07:33Z

Hi. I have some questions about the simplify() method of tree sequences. I'm a SLiM user, and I need to run a simulation with neutral and deleterious mutations. Since that takes a lot of time, I decided to use tree-sequence recording to leave the neutral mutations out of my simulation and overlay them onto the tree sequence after the simulation finishes. While doing that, I noticed this sentence in the SLiM manual: "Here, our chosen goal is to overlay mutations only back to the point of coalescence, and so we call simplify() to strip away all ancestral information above the point of coalescence. (If we wanted to overlay fixed mutations as well, past coalescence back to the start of forward simulation, then we would not call simplify().)"
Actually, I do need to overlay the fixed mutations, so I tried to understand the consequences of simplify(), but found it hard. I overlaid neutral mutations onto my tree sequence with msprime, calling msprime.sim_mutations(...) both with and without a preceding ts.simplify(). Given that sentence in the SLiM manual, I expected the run with simplify() to have no fixed mutations, but that does not seem to be the case: when I loaded the tree sequence back into SLiM, I found some fixed mutations. The counts of fixed mutations with and without simplify() do differ, though. I am confused. Did I misunderstand the sentence in the SLiM manual? Could you explain it and show a detailed example?
When I looked for a tutorial on simplify(), I found what may be a mistake in "Completing forwards simulations". The last part of that tutorial seems to show the consequences of simplify(), which is exactly what I want to know, but the last figure is identical to the figure before it. Did I misunderstand it, or is it indeed a mistake?
By the way, I find it hard to get the frequency or count of each mutation. Is there a straightforward way to do that, such as a single method or a property of the mutation object?

petrelharp (Maintainer) · 2023-06-30T16:30:59Z

Hm, let's see. Good questions. To see what's going on here, consider what we'd have to do to add fixed mutations to a tree sequence: how many should we add? There are an unbounded (infinite?) number of "fixed mutations" in the history of any sample, if you look back far enough in time, so we clearly need to say how far back we want to add those fixed mutations for. Since msprime adds mutations to trees, more specifically to branches of trees, that's how we tell msprime where we want the mutations: a fixed mutation is one that falls on the branch above the root of the tree; so if there is a branch above the root at a site we can get fixed mutations on it. Now, the branch above the root of a tree is redundant, in some sense - we know it's there, and it doesn't affect relationships between samples. The simplify() operation removes everything you don't need to reconstruct the trees, and so removes anything above the root (i.e., above the MRCA of all the samples). This explains that sentence in the SLiM manual.

So: if you first simplify a tree sequence and then use msprime to add mutations, then none of the mutations that msprime added will be fixed. However, you say:

> I expected the results of the model with the simplify() would not have any fixed mutations, but it seems not true.

I'm guessing this is because you started with a tree sequence produced by SLiM that already had fixed mutations in it? I think that simplify would not remove these (but I'm not checking right now).

So, I think what you need to do is simple: either (a) don't call simplify at all, or (b) if you do, use the keep_input_roots option.

And good point about the tutorial: what it says is correct, but in the example provided there's no change (so it's a bad example for that point).

> frequency or count of each mutation

It's on the list to add this method, but for now you can look here: tskit-dev/tskit#504

Hope that helps?

petrelharp (Maintainer) · 2023-06-30T16:31:50Z

Also: probably we should convert this to a discussion and raise the point about the example in a tutorial as a separate issue.

fatlonggg (Author) · 2023-06-30T18:25:56Z

Thanks for your reply.

> I'm guessing this is because you started with a tree sequence produced by SLiM that already had fixed mutations in it?

Well, actually, I started my simulation without any mutations, but I designed a bottleneck event into it. Maybe the bottleneck accounts for the presence of the fixed mutations?
From your explanation, I understand that simplify() removes the edges and nodes above the root nodes. You said that the text in the tutorial is correct, so I guess the code there is also correct, and that it's actually a good example, since the edges and nodes above the root node should have been removed. I guess the only mistake is that the wrong figure was placed there? Something is wrong with my Jupyter setup, so I can't test the code right now /(ㄒoㄒ)/~~.

fatlonggg (Author) · 2023-07-01T15:03:17Z

> Also: probably we should convert this to a discussion and raise the point about the example in a tutorial as a separate issue.

Well, how do I do this? Actually, this is my first time commenting on GitHub O(∩_∩)O

fatlonggg (Author) · 2023-07-03T07:58:53Z

Oh god, I know why there were still some fixed mutations in my results after running simplify(). I calculated the mutation frequencies in a sub-population that came from a 100/5000 sample rather than in the whole population. My bad.
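For anyone who hits the same thing, here is a tiny pure-Python sketch of the effect. The genotype list and the index choices are made-up toy data, and `derived_frequency` is a hypothetical helper: a mutation can look fixed in a subsample while being far from fixed in the whole population.

```python
def derived_frequency(genotypes, indices=None):
    """Frequency of the derived allele (coded 1) among the chosen genomes."""
    if indices is None:
        indices = range(len(genotypes))
    chosen = [genotypes[i] for i in indices]
    return sum(chosen) / len(chosen)


# Toy data (assumption): 10 genomes, the first 4 carry the mutation.
population = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
subsample = [0, 1, 2, 3]  # a subsample that happens to hit only carriers

whole = derived_frequency(population)
sub = derived_frequency(population, subsample)
print(whole, sub)  # 0.4 in the whole population, 1.0 in the subsample
```

With a real tree sequence, the same check could be done by restricting the count to the subsample's nodes, for example with the `tracked_samples` argument of `ts.trees()` together with `tree.num_tracked_samples()`.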

petrelharp (Maintainer) · 2023-07-03T15:51:15Z

I had to convert it. Done!
