Replies: 2 comments 6 replies
-
This would be great. Are you thinking of including the ancient samples as well?
-
I was just thinking: if we have the site order correct in tsinfer, then we could potentially incorporate ancient samples in a single step. We could run inference treating the ancient samples as contemporary with the moderns, and hope that they aren't placed as close relatives of them. Then we could constrain the ancient sample nodes to fixed times in tsdate and run EP (I think?). This would be susceptible to error, though, and we might force some bad constraints on the ancient placements. Perhaps these could be identified during EP, however. Does this even make sense, @nspope?
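A minimal sketch of how such bad constraints might be spotted before dating: once the ancient sample nodes are pinned to their known ages, any edge whose parent is not strictly older than its child marks a placement that conflicts with the inferred topology. The function name, the plain tuple/dict representation, and the toy values are illustrative assumptions, not tsinfer or tsdate API.

```python
def conflicting_edges(edges, node_time, fixed_sample_time):
    """Return the (parent, child) edges violated after pinning sample ages.

    edges: iterable of (parent, child) node IDs from the inferred topology.
    node_time: dict of node ID -> provisional time (all samples at 0).
    fixed_sample_time: dict of ancient sample node ID -> known age.
    """
    pinned = dict(node_time)
    pinned.update(fixed_sample_time)  # overwrite provisional ages of ancients
    # A parent must be strictly older than its child.
    return [(p, c) for p, c in edges if pinned[p] <= pinned[c]]


# Toy topology: node 2 is the parent of samples 0 and 1; node 3 is the root.
edges = [(2, 0), (2, 1), (3, 2)]
node_time = {0: 0.0, 1: 0.0, 2: 50.0, 3: 400.0}

# Hypothetical ancient sample 1 is known to be 120 generations old,
# which is older than its inferred parent (node 2 at 50 generations).
bad = conflicting_edges(edges, node_time, {1: 120.0})
```

Edges flagged this way would either need the parent's age pushed back during dating or the ancient sample re-placed in the topology.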
-
At some point in the next 6 months or so, we want to release updated unified genealogies, if possible. For this we can use the better-quality 1000 Genomes dataset that has been released, and not have to do the liftover ourselves (although SGDP will still need lifting over).
However, (a) it may take some time to run the tsinfer commands for the new datasets, and (b) we are still unsure of the best mismatch ratio to use.
People seem to be using this as a source of raw data (rather than just for the genealogies), so I suggest that we follow the current unified genealogies and include all the sites, simply marking the low-quality ones as not-for-inference. We might also want to check on singletons (and maybe release versions with rephased and non-rephased singletons).
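As a rough sketch of the "keep every site, but flag the low-quality ones" idea: build one boolean per site, where True means the site stays in the released data but is excluded from inference. The function name, the per-site quality scores, and the 0.5 threshold are hypothetical; how such a mask is passed to tsinfer depends on the version in use and is not shown here.

```python
def inference_mask(site_quality, min_quality):
    """Return one boolean per site; True means "not for inference".

    Sites are never dropped: low-quality sites are only flagged, so they
    remain available to people using the release as a source of raw data.
    """
    return [q < min_quality for q in site_quality]


# Hypothetical per-site quality scores in [0, 1].
site_quality = [0.99, 0.42, 0.87, 0.10]
mask = inference_mask(site_quality, min_quality=0.5)

# Indexes of the sites that would actually drive the inference.
kept_for_inference = [i for i, masked in enumerate(mask) if not masked]
```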
A good QC check for the inference would be to compare against the expected Ne-over-time patterns inferred by Phlash from the same dataset (see Fig 7 at https://www.biorxiv.org/content/10.1101/2024.03.25.586640v1).