population identity in VCF #1472
-
Hi All, I have a troubleshooting question regarding finding the population identities of individuals in a simulation in msprime. I've produced a VCF based off of a demographic simulation of 2 populations coming into contact, and would like to know each tsk_n's population value (0,1). Thanks to Peter R., I've had a head start on this, however I've hit a road block where all individuals appear to belong to the same population, regardless of what the demographic debugger suggests is actually occurring. Does anyone know what may be going on? I don't believe it's a result of a bug in the functions, as this is reproducible even when picking random individuals by hand to see their population assignment. Thanks, Code: import numpy as np
import msprime
#write vcf with given pre-determined scenario
m12=3.5
m21=0.5
time_m=1000
t_divergence=5000
population_configurations=[
msprime.PopulationConfiguration(sample_size = 50, initial_size=10000),
msprime.PopulationConfiguration(sample_size = 50, initial_size=10000)]
demographic_events=[
msprime.MigrationRateChange(0, rate=m21, matrix_index=(0, 1)),
msprime.MigrationRateChange(0, rate=m12, matrix_index=(1, 0)),
msprime.MigrationRateChange(time_m, rate= 0),
msprime.MassMigration(t_divergence, source=1, dest=0, proportion=1)]
ts = msprime.simulate(Ne=1000, length=1e4, recombination_rate=2e-8, mutation_rate=2e-8,
population_configurations=population_configurations,
demographic_events=demographic_events)
with open("output.vcf", "w") as vcf_file:
ts.write_vcf(vcf_file, 2)
#write which populations tsk_n belong to in VCF
#all even numbers within n sample range (sample_size*2) as node 0 & node 1 = ind 1, etc.
n = [0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,
36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,
68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98]
#sample_size of actual individuals (tsk_{m})
m = list(range(0,50))
#asserts all nodes{n} and nodes{n+1} have same population value
def assert_node(n):
assert ts.node(n).population == ts.node(n+1).population
#gets tsk_{m} population assignment
def population_node(m):
return print(f"Individual tsk_{m} is in population {ts.node(m).population}")
#gives list of tsk_{m} population assignment... for some reason all are 0
pop_list = [population_node(t) for t in m]
dp = msprime.DemographyDebugger(
Ne=1000,
population_configurations=population_configurations,
migration_matrix=migration_matrix,
demographic_events=demographic_events)
dp.print_history() |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 2 replies
-
Hi @k-lamb! Thanks for your question. It seems that half your sample nodes are from each population as expected: >>> pops = node.population for node in ts.nodes() if node.is_sample()
>>> list(pops)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> collections.Counter(pops)
Counter({0: 50, 1: 50}) What is happening is that for your VCF you have specified a So to get your code to print what you expected you need: def population_node(m):
return print(f"Individual tsk_{m} is in population {ts.node(m*2).population}") |
Beta Was this translation helpful? Give feedback.
-
Did this answer your question @k-lamb? It's probably worthwhile updating your simulation code to use the msprime 1.0 API, as this makes tracking individuals and populations etc quite a bit easier. See here for a starting point. |
Beta Was this translation helpful? Give feedback.
-
@jeromekelleher yes, thank you! However, I did find that once the VCF had been created it would not load properly into DaDi... It always produces a blank SFS. I haven't been quite sure how to approach that issue. |
Beta Was this translation helpful? Give feedback.
Hi @k-lamb! Thanks for your question.
It seems that half your sample nodes are from each population as expected:
What is happening is that for your VCF you have specified a
ploidy
of 2 so (as you say in your comment) VCF sample 0 is a combination of tree s…