Updated unified genealogies #419

hyanwong · 2024-07-23T20:09:00Z

hyanwong
Jul 23, 2024
Maintainer

At some point in the next 6 months or so, we want to release updated unified genealogies, if possible. For this we can use the higher-quality 1000 Genomes dataset that has been released, so we won't have to do the liftover ourselves (although SGDP will still need lifting over).

However (a) it may take some time to run the tsinfer commands for the new datasets and (b) we are still unsure of the best mismatch ratio to use.

People seem to be using this as a source of raw data (rather than just for the genealogies), so I suggest that we follow the current unified genealogies and include all the sites (but simply mark the low-quality ones as not-for-inference). We might want to check on singletons too (and maybe release one version with rephased singletons and one with non-rephased singletons).
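To make the "keep every site, but flag some as not-for-inference" idea concrete, here is a minimal pure-Python sketch. The `Site` record and its `quality_ok` flag are hypothetical stand-ins for whatever per-site QC annotation ends up being used; how the resulting list is handed to tsinfer depends on the tsinfer version and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    position: int
    quality_ok: bool  # hypothetical per-site QC flag, not a tsinfer field


def positions_to_exclude(sites):
    """Positions that stay in the released data but are withheld from inference."""
    return sorted(s.position for s in sites if not s.quality_ok)


# Toy data: four sites, two of which fail QC.
sites = [Site(100, True), Site(250, False), Site(400, True), Site(975, False)]

excluded = positions_to_exclude(sites)
used_for_inference = [s.position for s in sites if s.quality_ok]

# Every site is accounted for: nothing is dropped from the dataset itself.
assert len(excluded) + len(used_for_inference) == len(sites)
```

The point of the design is that the QC decision only changes which sites drive the topology, not which sites users can see in the final files.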

A good QC check for the inference would be to compare against the expected Ne-over-time patterns inferred by Phlash from the same dataset (see Fig 7 at https://www.biorxiv.org/content/10.1101/2024.03.25.586640v1).
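One way such a comparison could be scored, as a rough sketch only: treat each Ne trajectory as piecewise constant and take the root-mean-square difference on a log10 scale over a grid of times. The function names, the piecewise-constant representation, and the toy numbers are all assumptions for illustration; none of this is Phlash or tsdate output or API.

```python
import math


def ne_at(time, breakpoints, ne_values):
    """Piecewise-constant Ne: ne_values[i] applies from breakpoints[i] onwards,
    until the next breakpoint. breakpoints must be sorted and start at 0."""
    idx = 0
    for i, start in enumerate(breakpoints):
        if time >= start:
            idx = i
    return ne_values[idx]


def rms_log10_ratio(times, curve_a, curve_b):
    """Root-mean-square of log10(Ne_a / Ne_b) evaluated at the given times."""
    squared = [
        (math.log10(ne_at(t, *curve_a)) - math.log10(ne_at(t, *curve_b))) ** 2
        for t in times
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy curves as (breakpoints in generations, Ne values); purely made up.
reference = ([0, 1_000, 10_000], [10_000, 5_000, 20_000])
inferred = ([0, 1_000, 10_000], [20_000, 10_000, 40_000])  # uniformly 2x too large

grid = [10, 500, 2_000, 50_000]
score_same = rms_log10_ratio(grid, reference, reference)
score_doubled = rms_log10_ratio(grid, reference, inferred)
```

Working on a log scale means a uniform two-fold over- or under-estimate gives the same score whether Ne is in the thousands or the millions, which seems like the behaviour one would want from a QC metric here.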

nspope · 2024-07-23T20:28:16Z

nspope
Jul 23, 2024
Maintainer

This would be great-- are you thinking of including the ancient samples, as well?

4 replies

hyanwong Jul 23, 2024
Maintainer Author

If we can get tsdate working for that use-case, then yes, definitely.

nspope Jul 24, 2024
Maintainer

Seems like a good reason to get it working-- the machinery is all in place. There's a two-stage process for this, I guess? Infer+date with only contemporary samples, then reinfer with ancient samples + sites time?

hyanwong Jul 24, 2024
Maintainer Author

I think the initial tsinfer step could include the ancient samples if there are a lot of them (and if that could therefore help get the topology right). But for the unified genealogies, there are so few that maybe it's easier to do an initial inference without them.

The pathway that uses "ancestors.insert_proxy_samples" as described in https://tskit.dev/tsdate/docs/latest/historical_samples.html#the-2-step-approach is probably still the easiest approach, I guess. I think it should still work.

hyanwong Jul 24, 2024
Maintainer Author

Note that the way of injecting sites_time will change soon in tsinfer with the deprecation of the old input format (we are intending to release 0.4-alpha this week). See tskit-dev/tsinfer#923 (comment)

hyanwong · 2024-07-24T07:04:47Z

hyanwong
Jul 24, 2024
Maintainer Author

I was just thinking - if we have the site order correct in tsinfer, then we could potentially incorporate ancient samples in a single step. We could infer treating the ancient samples as contemporary with the moderns, and hope that they weren't added as close relatives. Then we could constrain the tsdate ancient sample nodes to fixed times and run EP (I think?). This would be susceptible to error, though, and we might force some bad constraints on the ancient placements. Perhaps these could be identified during EP, however.

Does this even make sense @nspope ?
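To illustrate what a "bad constraint" might look like, here is a toy check, assuming we have edges as (parent, child) pairs and a provisional time for every node. It only flags edges where fixing an ancient sample's time would make a parent no older than its child. It is not EP and not tsdate's API, and the node IDs and times are invented.

```python
def find_time_violations(edges, node_time):
    """Return the (parent, child) edges whose parent is not strictly older."""
    return [(p, c) for p, c in edges if node_time[p] <= node_time[c]]


# Nodes 0 and 1 are modern samples, node 2 is an ancient sample that was
# inferred as if contemporary, nodes 3 and 4 are inferred ancestors.
provisional = {0: 0.0, 1: 0.0, 2: 0.0, 3: 120.0, 4: 900.0}
edges = [(3, 0), (3, 2), (4, 1), (4, 3)]

# Pin the ancient sample to its known age (in generations; hypothetical value).
fixed_ancient_times = {2: 300.0}
constrained = {**provisional, **fixed_ancient_times}

before = find_time_violations(edges, provisional)
after = find_time_violations(edges, constrained)
```

In this toy case the ancient sample was attached below an ancestor dated at 120 generations, so pinning it at 300 generations produces one inconsistent edge, which is the kind of placement that would need flagging or re-dating.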

2 replies

nspope Jul 24, 2024
Maintainer

It makes sense, but I'm honestly not sure what will work well-- I think we'll have to try different things out. That said, having ancient samples at fixed times is a pretty strong constraint-- and will get more so, the more samples there are. So I suspect the best way to do this is as a two-stage process.

hyanwong Jul 24, 2024
Maintainer Author

Yes, I worry about the strong constraint. Although it might be that the 2-stage process also imposes a similar strong constraint (possibly in the ancestor reconstruction part, though)
