R2 Decay Across Distance #1118
-
Hello, I am trying to fit exponential decay functions of r2 across physical distance of my simulated tree sequences. So far I can retrieve the M x M matrix of r2 of simulations using:
However, I haven't found the best way to turn this matrix into a data structure where I can find the physical distance (in bp) between each SNP in each pairwise computation. I can get the M SNP positions with:
but after this I'm not sure what's next? I know I can write a VCF of my tree sequences and then compute LD, but I'm trying to keep this computation without producing many intermediate files. Thanks |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 1 reply
-
Hi @soulcreator3, thanks for your question! To get the pairwise distance between sites in a tree sequence I would do something like: import numpy as np
positions = tree_sequence.tables.sites.position
distance_matrix = np.fabs(np.subtract(positions[:,np.newaxis], positions)) You can use the sites array directly as it is guranteed to be sorted: https://tskit.readthedocs.io/en/latest/data-model.html#sec-site-requirements. |
Beta Was this translation helpful? Give feedback.
-
Here's something I've used for my projects. import numpy as np
import pandas as pd
import msprime
import tskit
## Simulate
tree_sequence=msprime.simulate(sample_size=50, Ne=600,
length=1e7, recombination_rate=1e-6,
mutation_rate=1e-8)
## Get pairwise matrix
ld_calc = tskit.LdCalculator(tree_sequence)
A = ld_calc.r2_matrix()
## Annotate matrix
df=pd.DataFrame(A)
df.index=tree_sequence.tables.sites.position
df.columns=tree_sequence.tables.sites.position
## Turn into long dataframe
long_format_distance_df=df.unstack().reset_index()
long_format_distance_df.columns = ['position_1', 'position_2', 'r_2']
long_format_distance_df["distance"] = np.fabs(long_format_distance_df["position_1"] - long_format_distance_df["position_2"])
long_format_distance_df.head()
## Get coordinates for plot/fitting and plot
x=long_format_distance_df["distance"]
y=long_format_distance_df["r_2"]
plt.scatter(x, y, alpha=0.5) |
Beta Was this translation helpful? Give feedback.
-
Hi @soulcreator3, did either of these answers help you, or are you still looking for a solution? |
Beta Was this translation helpful? Give feedback.
Here's something I've used for my projects.