Basic goals #1
Comments
I'm pretty new to the whole area, but am starting to think about this a bit. Agreed that taking all pairwise TMRCAs and doing all sorts of distributional / summary statistic comparisons is a good idea. I've also been reading a bit about machine learning approaches to tree-like structures. That's gotten me into parse trees for natural language processing, which has its own set of metrics: https://tech.grammarly.com/blog/the-dirty-little-secret-of-constituency-parser-evaluation. Here are two writeups I found on the genetics side. The TREESPACE package (R) may be worth some study.
In tskit-dev/tsdate#310, @nspope and @petrelharp have developed a nice way of finding "equivalent" nodes between tree sequences, which could be useful in this repo. It would be an efficient alternative to comparing all pairwise tMRCAs, and less biased by polytomies (see tskit-dev/tsdate#301 (comment)).
This issue is to outline and discuss the basic goals and initial outlines of the repository. Pinging @hyanwong, @awohns, @leospeidel, @brianzhang01 and @pierpal for input. Please respond in this thread with any thoughts.
Goal
The goal of this repo is to create standardised ways of comparing two (or more) tree sequences. These can be both simple metrics and standardised plots using matplotlib and/or seaborn. Various aspects of the tree sequences, such as topological distances under tree metrics, overall coalescence time distributions, etc., should be considered. Basically, we want an easy-to-use and robust toolkit that has all of the useful ways of comparing tree sequences in one place.
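To make the "simple metrics" idea concrete, here is a rough sketch of one possible truth-to-estimate comparison: the root-mean-square error between pairwise TMRCAs in a true tree and an estimated tree. It is plain Python with no dependencies, and everything in it is an illustrative assumption rather than a proposed API: the parent-array representation (`-1` marks the root), the function names `ancestors`, `tmrca`, `pairwise_tmrcas` and `rmse`, and the toy trees are all made up for this example. It also handles only a single tree; a real version would work on tree sequences (tskit offers `Tree.tmrca(u, v)` for the per-tree lookup) and would presumably weight each tree by its genomic span.

```python
from itertools import combinations
from math import sqrt


def ancestors(parent, u):
    """Return the path from node u up to the root, inclusive."""
    path = []
    while u != -1:
        path.append(u)
        u = parent[u]
    return path


def tmrca(parent, time, u, v):
    """Time of the most recent common ancestor of u and v in one tree."""
    seen = set(ancestors(parent, u))
    # Walking up from v, the first node also above u is the MRCA.
    for node in ancestors(parent, v):
        if node in seen:
            return time[node]
    raise ValueError("nodes share no ancestor")


def pairwise_tmrcas(parent, time, samples):
    """Map each unordered sample pair to its TMRCA."""
    return {
        (u, v): tmrca(parent, time, u, v)
        for u, v in combinations(samples, 2)
    }


def rmse(true_vals, est_vals):
    """Root-mean-square error over the sample pairs in true_vals."""
    keys = sorted(true_vals)
    total = sum((true_vals[k] - est_vals[k]) ** 2 for k in keys)
    return sqrt(total / len(keys))


# Toy "true" tree: samples 0, 1, 2; node 3 joins 0 and 1; node 4 is the root.
true_parent = [3, 3, 4, 4, -1]
true_time = [0.0, 0.0, 0.0, 1.0, 3.0]

# Toy "estimated" tree: same topology, but node 3 is dated too old.
est_parent = [3, 3, 4, 4, -1]
est_time = [0.0, 0.0, 0.0, 2.0, 3.0]

samples = [0, 1, 2]
true_t = pairwise_tmrcas(true_parent, true_time, samples)
est_t = pairwise_tmrcas(est_parent, est_time, samples)
error = rmse(true_t, est_t)
print(f"RMSE of pairwise TMRCAs: {error:.3f}")
```

Only the (0, 1) pair differs between the two toy trees, so the error comes entirely from the misdated node 3. Note that this all-pairs approach scales quadratically in the number of samples, which is one reason a node-matching method could be attractive for larger inputs.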
Initial functionality: truth-to-estimate comparisons
Repo structure
The repository should be structured as an installable Python package, which we will distribute via PyPI and conda-forge. As such, dependencies should be kept to a minimum (and certainly be packages that are easily installed via pip/conda). We should consider Jupyter notebooks as a first-class user of the module, so that quick analyses of tree sequences can be done in notebooks in a user-friendly way.