Infer one tree seq for multiple chromosomes for forward simulation with SLiM #855
Hi Tati,

That's an interesting use of In this case, you could simply concatenate the chromosomes end-to-end, and run tsinfer on the concatenated data (and you will probably want to keep the flanking regions too (i.e. set With "real" inferences, for efficiency we normally infer each chromosome arm separately. You could, I suppose, do something like this (perhaps inferring each chromosome, rather than each arm, separately), and then paste them all together into one large tree sequence, with sample nodes shared, but all internal nodes with different IDs.

I don't think there is any code written to do this explicitly (but see some relevant discussion at tskit-dev/msprime#848 (comment)), although perhaps it might be useful for other people. If you figure out a routine for this (I can give you some pointers here, if you need them) you could post it as a "show-and-tell" on https://github.com/tskit-dev/tskit/discussions?

A more sophisticated option would be to infer separate tree sequences for each chromosome and then try to identify shared nodes between the tree sequences, but that's more of a long-term research project.
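The end-to-end concatenation suggested above comes down to shifting each site's position by the total length of the chromosomes that precede it. Here is a minimal sketch of that coordinate mapping in plain Python. The chromosome names, lengths, and sites are made-up illustrations, the helper names are hypothetical, and positions are assumed to be 0-based (VCF positions are 1-based, so they would need converting first).

```python
from itertools import accumulate


def chromosome_offsets(lengths):
    """Return the start coordinate of each chromosome on the concatenated genome."""
    # Each offset is the summed length of all preceding chromosomes.
    return [0] + list(accumulate(lengths))[:-1]


def concatenate_positions(sites, names, lengths):
    """Map (chromosome, position) pairs onto one concatenated coordinate system."""
    offsets = dict(zip(names, chromosome_offsets(lengths)))
    length_of = dict(zip(names, lengths))
    shifted = []
    for chrom, pos in sites:
        # Guard against positions that fall outside their chromosome.
        if not 0 <= pos < length_of[chrom]:
            raise ValueError(f"position {pos} is outside {chrom}")
        shifted.append(offsets[chrom] + pos)
    return shifted


# Hypothetical example data, not taken from any real VCF.
names = ["chr1", "chr2", "chr3"]
lengths = [1000, 800, 500]
sites = [("chr1", 10), ("chr1", 999), ("chr2", 0), ("chr3", 499)]
print(concatenate_positions(sites, names, lengths))  # [10, 999, 1000, 2299]
```

The shifted positions are what you would then hand to tsinfer as site positions, with the sequence length set to the sum of the chromosome lengths.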
Hi!
I'm reaching out to make sure my workflow is correct. My goal is to simulate a complex trait involving loci on multiple chromosomes using real genetic data from a population VCF file.
To avoid burdening SLiM with tracking neutral mutations, I first infer the tree sequence from the VCF file and then remove all neutral mutations. After the forward simulation in SLiM, I reintegrate these neutral mutations into my tree sequence.
The only reason I am not inferring one tree sequence per chromosome is that SLiM wouldn't allow me to import each of them to reconstruct my full genome.
However, I'm still unsure whether it is acceptable to infer a single tree sequence from a VCF file containing multiple chromosomes. I've noticed strategies in msprime that use an msprime.RateMap to improve the accuracy of the inferred trees. Should I also use an msprime.RateMap in my tsinfer inference? Are nodes shared between my chromosomes a problem? Since I'm more interested in using the tree sequence as a data structure to effectively store and overlay mutations than in the precision of the trees themselves, I'm inclined to believe that I don't need to worry about this aspect. Nevertheless, to be safe, I wanted to ask.
When importing the .trees file into SLiM, I make sure the chromosomes are treated separately by using the recombination rate = 0.5 trick between them. During the simulation, I avoid simplification so that all nodes are retained (to accurately overlay mutations afterward).
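For reference, the recombination rate = 0.5 trick can be sketched as building the two vectors that a SLiM recombination map takes: region end positions and the rate for each region. This is only a rough sketch in Python, with made-up chromosome lengths and a hypothetical helper name. It assumes that a one-base region with rate 0.5 placed at the first base of each subsequent chromosome makes adjacent chromosomes assort independently; the exact boundary convention is worth checking against the SLiM manual.

```python
def slim_recombination_map(lengths, within_rate):
    """Return (ends, rates) for concatenated chromosomes that assort independently.

    Assumption: each rate applies to the region ending at the matching
    position in `ends`, so a one-base region with rate 0.5 at the first
    base of a chromosome unlinks it from the previous chromosome.
    """
    ends, rates = [], []
    start = 0  # first base of the current chromosome on the concatenated genome
    for i, length in enumerate(lengths):
        if length < 2:
            raise ValueError("each chromosome needs at least two bases")
        if i > 0:
            # One-base region between the previous chromosome and this one.
            ends.append(start)
            rates.append(0.5)
        # Remainder of the chromosome recombines at the ordinary rate.
        ends.append(start + length - 1)
        rates.append(within_rate)
        start += length
    return ends, rates


# Hypothetical chromosome lengths and per-base recombination rate.
ends, rates = slim_recombination_map([1000, 800, 500], 1e-8)
print(ends)   # [999, 1000, 1799, 1800, 2299]
print(rates)  # [1e-08, 0.5, 1e-08, 0.5, 1e-08]
```

The two lists correspond to the `rates` and `ends` arguments of `initializeRecombinationRate()` in the SLiM script.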
Thank you!
Tati