Basic goals #1

jeromekelleher · 2019-03-29T12:21:26Z

This issue is to outline and discuss the basic goals and initial structure of the repository. Pinging @hyanwong, @awohns, @leospeidel, @brianzhang01 and @pierpal for input. Please respond in this thread with any thoughts.

Goal

The goal of this repo is to create standardised ways of comparing two (or more) tree sequences. These can be simple metrics as well as standardised plots using matplotlib and/or seaborn. Various aspects of the tree sequences, such as topological distances under tree metrics and overall coalescence time distributions, should be considered. Basically, we want an easy-to-use and robust toolkit that puts all of the useful ways of comparing tree sequences in one place.

Initial functionality: truth-to-estimate comparisons

Compute the weighted Kendall-Colijn (KC) distance along the chromosome (as used in the tsinfer paper).
Compute a pairwise TMRCA heatmap (like Fig. 2c,d in the Relate paper).
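To make the first item concrete, here is a minimal pure-Python sketch of a span-weighted KC distance. It assumes a hypothetical toy representation rather than tskit objects: each tree is a dict mapping child to parent, node times live in a separate dict, and a tree sequence is a list of (left, right, parent_dict) intervals. For one tree the KC vector is (1 - lam) * m + lam * M, where m holds, for each pair of samples, the number of edges from the root to their MRCA (followed by a 1 for each sample's pendant edge) and M holds the matching branch lengths. The distance between two trees is the Euclidean distance between their vectors, and the genome-wide value averages it over every pair of overlapping trees, weighted by overlap span. tskit already provides `Tree.kc_distance` and `TreeSequence.kc_distance`, which a real implementation should use; the helper names here (`kc_vector`, `weighted_kc_distance`, `lam`) are illustrative only.

```python
import itertools
import math

# Hypothetical toy representation (not tskit's API): a tree is a dict mapping
# each non-root node to its parent; a tree sequence is a list of
# (left, right, parent_dict) intervals; node times are in a separate dict.


def ancestors(node, parent):
    """Path from node up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def kc_vector(parent, time, samples, lam):
    """Kendall-Colijn vector (1 - lam) * m + lam * M for one tree."""
    root = ancestors(samples[0], parent)[-1]
    m, M = [], []
    for a, b in itertools.combinations(samples, 2):
        on_path = set(ancestors(b, parent))
        mrca = next(n for n in ancestors(a, parent) if n in on_path)
        m.append(len(ancestors(mrca, parent)) - 1)  # edges from root to MRCA
        M.append(time[root] - time[mrca])  # branch length from root to MRCA
    for s in samples:
        m.append(1)  # each pendant edge counts as one edge
        M.append(time[parent[s]] - time[s])  # pendant branch length
    return [(1 - lam) * x + lam * y for x, y in zip(m, M)]


def kc_distance(tree_a, tree_b, time_a, time_b, samples, lam=0.0):
    """Euclidean distance between the KC vectors of two trees."""
    va = kc_vector(tree_a, time_a, samples, lam)
    vb = kc_vector(tree_b, time_b, samples, lam)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def weighted_kc_distance(ts_a, ts_b, time_a, time_b, samples, lam=0.0):
    """Average KC distance along the genome, weighted by overlap span."""
    total, length = 0.0, 0.0
    for left_a, right_a, tree_a in ts_a:
        for left_b, right_b, tree_b in ts_b:
            span = min(right_a, right_b) - max(left_a, left_b)
            if span > 0:
                d = kc_distance(tree_a, tree_b, time_a, time_b, samples, lam)
                total += span * d
                length += span
    return total / length


# Toy example: the estimate swaps a cherry on [50, 100).
times = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0}
truth = [(0, 100, {0: 3, 1: 3, 3: 4, 2: 4})]
estimate = [(0, 50, {0: 3, 1: 3, 3: 4, 2: 4}), (50, 100, {0: 3, 2: 3, 3: 4, 1: 4})]
dist = weighted_kc_distance(truth, estimate, times, times, [0, 1, 2])
```

With lam = 0 (topology only), the two trees agree on [0, 50) and are sqrt(2) apart on [50, 100), so the span-weighted average is sqrt(2) / 2. The nested loop over intervals is quadratic in the number of trees; a practical version would sweep both sequences left to right together.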

Repo structure

The repository should be structured as an installable Python package, which we will distribute via PyPI and conda-forge. As such, dependencies should be kept to a minimum (and certainly be packages that are easily installed via pip/conda). We should consider Jupyter notebooks as a first-class user of the module, so that quick analyses of tree sequences can be done in notebooks in a user-friendly way.

brianzhang01 · 2019-05-23T18:05:33Z

I'm pretty new to the whole area, but am starting to think about this a bit. Agreed that taking all pairwise TMRCAs and doing all sorts of distributional / summary statistic comparisons is a good idea.
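A minimal sketch of that idea in pure Python, again on a hypothetical toy representation (a list of (left, right, parent_dict) intervals plus a dict of node times) rather than tskit objects: compute the span-weighted mean TMRCA for every pair of samples, which is also the matrix one would plot as a heatmap, then compare truth and estimate with two summaries. The choice of summaries (a mean absolute per-pair difference and a two-sample Kolmogorov-Smirnov statistic over the pair values) is an assumption made for illustration, and all function names are hypothetical.

```python
import bisect
import itertools

# Hypothetical toy representation (not tskit's API): a tree sequence is a list
# of (left, right, parent_dict) intervals; node times are in a separate dict.


def tmrca(parent, time, a, b):
    """Time of the most recent common ancestor of samples a and b."""
    seen = {a}
    node = a
    while node in parent:
        node = parent[node]
        seen.add(node)
    node = b
    while node not in seen:
        node = parent[node]  # assumes a and b share a root
    return time[node]


def mean_tmrca_matrix(ts, time, samples):
    """Span-weighted mean TMRCA for every pair of samples (heatmap data)."""
    n = len(samples)
    mat = [[0.0] * n for _ in range(n)]
    length = sum(right - left for left, right, _ in ts)
    for left, right, parent in ts:
        w = (right - left) / length
        for i, j in itertools.combinations(range(n), 2):
            t = w * tmrca(parent, time, samples[i], samples[j])
            mat[i][j] += t
            mat[j][i] += t
    return mat


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    x, y = sorted(x), sorted(y)
    return max(
        abs(bisect.bisect_right(x, v) / len(x) - bisect.bisect_right(y, v) / len(y))
        for v in x + y
    )


def compare_tmrcas(mat_true, mat_est):
    """Mean absolute per-pair difference and KS statistic over pair values."""
    pairs = list(itertools.combinations(range(len(mat_true)), 2))
    a = [mat_true[i][j] for i, j in pairs]
    b = [mat_est[i][j] for i, j in pairs]
    mae = sum(abs(u - v) for u, v in zip(a, b)) / len(pairs)
    return mae, ks_statistic(a, b)


times = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0}
truth = [(0, 100, {0: 3, 1: 3, 3: 4, 2: 4})]
estimate = [(0, 50, {0: 3, 1: 3, 3: 4, 2: 4}), (50, 100, {0: 3, 2: 3, 3: 4, 1: 4})]
mae, ks = compare_tmrcas(
    mean_tmrca_matrix(truth, times, [0, 1, 2]),
    mean_tmrca_matrix(estimate, times, [0, 1, 2]),
)
```

Each matrix can be drawn as a heatmap, for example with matplotlib's `imshow`, either side by side or as a difference. Averaging over the genome before comparing is one choice among several; comparing per-interval TMRCAs keeps more information at a higher cost.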

I've also been reading a bit about machine learning approaches to tree-like structures. That's gotten me into parse trees for natural language processing, which have their own set of metrics: https://tech.grammarly.com/blog/the-dirty-little-secret-of-constituency-parser-evaluation.
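Constituency-parser evaluation usually scores bracketing: the sets of spans in the predicted and gold parses are compared using precision, recall and F1. A hypothetical transfer of that idea to genealogies treats each clade (the set of samples below an internal node) as a bracket. The minimal sketch below works on a toy parent-dict tree, not tskit objects, and its function names are illustrative only.

```python
# Hypothetical adaptation of bracket scoring to genealogies: a "bracket" is
# the set of samples below an internal node. A tree is a dict mapping each
# non-root node to its parent (toy representation, not tskit's API).


def clades(parent, samples):
    """Non-trivial clades of one tree, as frozensets of sample IDs."""
    below = {}
    for s in samples:
        node = s
        while node in parent:
            node = parent[node]
            below.setdefault(node, set()).add(s)
    # Drop singletons and the root clade, which every tree trivially contains.
    return {frozenset(c) for c in below.values() if 1 < len(c) < len(samples)}


def clade_scores(true_parent, est_parent, samples):
    """Precision, recall and F1 of estimated clades against true clades."""
    true_c = clades(true_parent, samples)
    est_c = clades(est_parent, samples)
    hits = len(true_c & est_c)
    precision = hits / len(est_c) if est_c else 1.0
    recall = hits / len(true_c) if true_c else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Truth ((0,1),(2,3)); the estimate leaves 2 and 3 in a polytomy at the root.
truth = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
estimate = {0: 4, 1: 4, 4: 6, 2: 6, 3: 6}
p, r, f1 = clade_scores(truth, estimate, [0, 1, 2, 3])
```

Here precision is 1 because the estimate asserts no false clade, while the unresolved polytomy costs recall (0.5). That asymmetry is one reason to report precision and recall separately, not just F1, when estimates contain polytomies.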

Here are two writeups I found on the genetics side. The TREESPACE package (R) may be worth some study.
https://cran.r-project.org/web/packages/Quartet/vignettes/Tree-distance-metrics.pdf
https://onlinelibrary.wiley.com/doi/full/10.1111/1755-0998.12676
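One tree distance that suits rooted genealogies is the rooted triplet distance: for every set of three samples, find which pair joins first, and count the triplets on which two trees disagree. The following is a minimal O(n³) pure-Python sketch on a hypothetical parent-dict representation, not the fast algorithms that dedicated packages use. It treats an unresolved triplet (a polytomy) as a state of its own, so "resolved" versus "unresolved" counts as a disagreement; that is a design choice for illustration, not a standard.

```python
import itertools

# Hypothetical toy representation (not tskit's API): a tree is a dict mapping
# each non-root node to its parent.


def path_to_root(parent, node):
    """Path from node up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def cherry(parent, a, b, c):
    """Pair of a rooted triplet that joins first, or None if unresolved."""
    pairs = [(a, b), (a, c), (b, c)]
    depths = []
    for x, y in pairs:
        on_path = set(path_to_root(parent, x))
        mrca = next(n for n in path_to_root(parent, y) if n in on_path)
        depths.append(len(path_to_root(parent, mrca)))  # deeper = more recent
    deepest = max(depths)
    if depths.count(deepest) > 1:
        return None  # all three pairs share one MRCA: a polytomy
    return frozenset(pairs[depths.index(deepest)])


def triplet_distance(parent_a, parent_b, samples):
    """Fraction of sample triplets whose rooted topologies differ."""
    triplets = list(itertools.combinations(samples, 3))
    differ = sum(cherry(parent_a, *t) != cherry(parent_b, *t) for t in triplets)
    return differ / len(triplets)


truth = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
estimate = {0: 4, 1: 4, 4: 6, 2: 6, 3: 6}  # 2 and 3 left in a polytomy
dist = triplet_distance(truth, estimate, [0, 1, 2, 3])
```

The two triplets containing both 2 and 3 lose their resolution in the estimate, giving a distance of 0.5. Applied along a tree sequence, the per-tree value could be span-weighted in the same way as the KC distance.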

hyanwong · 2023-08-24T08:11:32Z

In tskit-dev/tsdate#310, @nspope and @petrelharp have developed a nice way of finding "equivalent" nodes between tree sequences, which could be useful in this repo. It would be an efficient alternative to comparing all pairwise TMRCAs, and less biased by polytomies (see the comment in tskit-dev/tsdate#301).
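One plausible way to find "equivalent" nodes is sketched here as an assumption; it is not necessarily how tskit-dev/tsdate#310 does it. Treat two nodes as equivalent over a genomic interval if they subtend exactly the same set of samples there, accumulate for each pair of nodes the span over which their sample sets agree, and match each node of the first tree sequence to the node of the second with the largest shared span. The toy representation (lists of (left, right, parent_dict) intervals) and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch, not the tsdate#310 implementation. Toy representation:
# a tree sequence is a list of (left, right, parent_dict) intervals.


def descendant_samples(parent, samples):
    """Map each node in one tree to the frozenset of samples it subtends."""
    below = defaultdict(set)
    for s in samples:
        node = s
        below[node].add(s)
        while node in parent:
            node = parent[node]
            below[node].add(s)
    return {node: frozenset(v) for node, v in below.items()}


def shared_spans(ts_a, ts_b, samples):
    """Span over which each node pair (a, b) subtends identical sample sets."""
    shared = defaultdict(float)
    for left_a, right_a, parent_a in ts_a:
        below_a = descendant_samples(parent_a, samples)
        for left_b, right_b, parent_b in ts_b:
            span = min(right_a, right_b) - max(left_a, left_b)
            if span <= 0:
                continue
            # Index tree-b nodes by their sample set for quick lookup.
            by_set = defaultdict(list)
            for node_b, s in descendant_samples(parent_b, samples).items():
                by_set[s].append(node_b)
            for node_a, s in below_a.items():
                for node_b in by_set.get(s, []):
                    shared[node_a, node_b] += span
    return shared


def best_matches(shared):
    """For each node in A, the node in B with the largest shared span."""
    best = {}
    for (node_a, node_b), span in shared.items():
        if node_a not in best or span > best[node_a][1]:
            best[node_a] = (node_b, span)
    return best


truth = [(0, 100, {0: 3, 1: 3, 3: 4, 2: 4})]
estimate = [(0, 50, {0: 5, 1: 5, 5: 6, 2: 6}), (50, 100, {0: 7, 2: 7, 7: 6, 1: 6})]
matches = best_matches(shared_spans(truth, estimate, [0, 1, 2]))
```

In the toy example, true node 3 (the clade {0, 1}) matches estimated node 5 over only 50 units, while the root matches over the full 100. Because matching is by exact sample set, a node collapsed into a polytomy in the estimate simply goes unmatched over that span instead of shifting many pairwise values. Dividing each shared span by the node's total span would give a proportion that is easier to compare across nodes.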

hyanwong mentioned this issue on Aug 24, 2023 in tskit-dev/tsdate#310, "Method for matching node sets across tree sequences" (merged).