Ancient samples that are directly ancestral to other samples #2260
-
Hi all, I'm wondering if it is possible and (if so) there is a preferred way to run a simulation in which we sample ancient individuals that are direct ancestors to other samples that lived more recently. [I am pretty sure this has been discussed before, perhaps on Slack, but I didn't find anything in Issues or Discussions here, so I thought this could be a good place to ask in a searchable location.] A simple example would be to sample the entire population at time zero as well as the entire population in the previous generation. For this to work, we would need to use a discrete simulation model like the DTWF, instead of the continuous Hudson model. Essentially, can I "census" a population at a given time, and have those censused individuals also be ancestral to later samples? Note that this is different from a import msprime
demog = msprime.Demography.isolated_model([2]) # single population of size 2 diploids
samples = [
msprime.SampleSet(2), # sample whole population at time zero
msprime.SampleSet(2, time=1), # whole population one generation ago
]
ts = msprime.sim_ancestry(
random_seed=1, samples=samples, demography=demog, model="dtwf"
)
print(ts.draw_text()) This should print out something like this:
What would be desired for the censused population at time 1 is for the branches from nodes 0 to 3 to go to nodes 4 to 7 (which are also sample nodes). As it is, the population size at time 1 is larger than the specified size in the model, and we end up with cousins between generations 0 and 1, but no parent-offspring relationships among our samples. Is this something that is possible to do (either within DTWF, or another option like fixed pedigree simulation)? [I think this question is somewhat related to Discussion #2188, but not quite the same.] |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 7 replies
-
Edit: Looks like I should check out the I've been playing around with this a bit. The goal below is to sample the entire population at time zero and at some time in the past. The solution I came up with is a bit hack-y, so I'd be interested if anyone has a better idea here. I know this is a long comment, but I tried to be complete with the code and my thought process here. We'll set up a single population of size 2 diploids, and set the ancient sample time at import msprime
import numpy as np
seed = 3
pop_size = 2
demog = msprime.Demography.isolated_model([pop_size])
sample_time = 3
L = 2 A naive DTWF simulation will not result in direct ancestry from any ancient samples to those at time zero: samples = [
msprime.SampleSet(pop_size), # sample whole population at time zero
msprime.SampleSet(pop_size, time=sample_time), # whole population at sample time
]
ts = msprime.sim_ancestry(
random_seed=seed,
samples=samples,
demography=demog,
model="dtwf",
sequence_length=L,
recombination_rate=0.1,
)
print(ts.draw_text()) Note the exceeded population size at the sample time:
Let's instead try to stop our simulation at the sample time, and then "fill in" the ancient sampled population before finishing the simulation: samples = [
msprime.SampleSet(pop_size), # sample whole population at time zero
]
ts = msprime.sim_ancestry(
random_seed=seed,
samples=samples,
demography=demog,
model="dtwf",
end_time=sample_time,
sequence_length=L,
recombination_rate=0.1,
)
print(ts.draw_text()) Here's our halted simulated tree sequence:
Now we can try to add the rest of the samples into the population at the sample time. These added nodes won't be ancestral to our samples at time 0, but we still want them in order to census the full population at the ancient sample time: # get the node table columns
tables = ts.tables
times = tables.nodes.time
flags = tables.nodes.flags
pops = tables.nodes.population
indivs = tables.nodes.individual
# set flags to 1 for nodes at time of sample time
flags[times == sample_time] = 1
nodes_to_add = 2 * pop_size - sum(times == sample_time)
times = np.concatenate((times, [sample_time] * nodes_to_add)).astype(times.dtype)
flags = np.concatenate((flags, [1] * nodes_to_add)).astype(flags.dtype)
pops = np.concatenate((pops, [0] * nodes_to_add)).astype(pops.dtype)
indivs = np.concatenate((indivs, [-1] * nodes_to_add)).astype(pops.dtype)
# now assign those ancient sample nodes to diploid individuals
diploids = np.random.permutation(np.where(times == sample_time)[0]).reshape(pop_size, 2)
for dip in diploids:
indivs[dip] = max(indivs) + 1
tables.individuals.add_row()
# reset node table columns and sort
tables.nodes.set_columns(flags=flags, time=times, population=pops, individual=indivs)
tables.sort()
ts = tables.tree_sequence()
# finish the simulation
ts = msprime.sim_ancestry(
random_seed=seed,
initial_state=ts,
model="dtwf",
demography=demog,
start_time=sample_time,
recombination_rate=0.1,
)
print(ts.draw_text()) This isn't quite right, because one of the trees had fully coalesced before reaching the sample time, but we're closer:
We need those edges in the coalesced marginal trees to reach up to the sampling time. The simplest work-around I came up with is to add a "dummy" node above the sampling time. This prevents any marginal tree from ever fully coalescing before hitting the sampling time, and then we can remove that node and continue the simulation to completion: samples = [
msprime.SampleSet(pop_size), # sample whole population at time zero
msprime.SampleSet(1, ploidy=1, time=sample_time + 1), # our "dummy" node
]
ts = msprime.sim_ancestry(
random_seed=seed,
samples=samples,
demography=demog,
model="dtwf",
end_time=sample_time,
sequence_length=L,
recombination_rate=0.1,
)
print("Running DTWF up to the sample time:")
print(ts.draw_text())
# get the node table columns
tables = ts.tables
times = tables.nodes.time
flags = tables.nodes.flags
pops = tables.nodes.population
indivs = tables.nodes.individual
# remove node above sample time
to_keep = times <= sample_time
to_del = np.where(to_keep == False)[0]
times = times.compress(to_keep)
flags = flags.compress(to_keep)
pops = pops.compress(to_keep)
indivs = indivs.compress(to_keep)
# set flags to 1 for nodes at time of sample time
flags[times == sample_time] = 1
nodes_to_add = 2 * pop_size - sum(times == sample_time)
times = np.concatenate((times, [sample_time] * nodes_to_add)).astype(times.dtype)
flags = np.concatenate((flags, [1] * nodes_to_add)).astype(flags.dtype)
pops = np.concatenate((pops, [0] * nodes_to_add)).astype(pops.dtype)
indivs = np.concatenate((indivs, [-1] * nodes_to_add)).astype(pops.dtype)
# now assign those ancient sample nodes to diploid individuals
tables.individuals.clear()
for i in range(sum(np.unique(indivs) >= 0)):
tables.individuals.add_row()
diploids = np.random.permutation(np.where(times == sample_time)[0]).reshape(pop_size, 2)
for dip in diploids:
indivs[dip] = max(indivs) + 1
tables.individuals.add_row()
# reset edge table by shifting node indexes as required (since we removed a node)
p = tables.edges.parent
c = tables.edges.child
p[p > to_del] -= 1
c[c > to_del] -= 1
tables.edges.set_columns(
left=tables.edges.left, right=tables.edges.right, parent=p, child=c
)
# reset node table columns and sort
tables.nodes.set_columns(flags=flags, time=times, population=pops, individual=indivs)
tables.sort()
ts = tables.tree_sequence()
print("Tree sequence with added samples and removed 'dummy' node:")
print(ts.draw_text())
# finish the simulation
ts = msprime.sim_ancestry(
random_seed=seed,
initial_state=ts,
model="dtwf",
demography=demog,
start_time=sample_time,
recombination_rate=0.1,
)
print("Finishing the simulation with DTWF:")
print(ts.draw_text()) I think this leaves us with what we want:
|
Beta Was this translation helpful? Give feedback.
-
I think you could do this more simply with a
gets
|
Beta Was this translation helpful? Give feedback.
I think you could do this more simply with a
union
? Something like: