Plot of edge areas, possibly with corresponding mutation density #101

Open
hyanwong opened this issue Nov 21, 2023 · 0 comments

Comments

@hyanwong
Member

It would be useful to look at the distribution of edge "areas", i.e. (parent_time - child_time) * span. This may be less meaningful in an undated tree sequence, but might still reveal some issues.

In particular, if we have mutations on the tree sequence, we expect the number of mutations on an edge to be proportional to its area. Edges that cover a large area but carry hardly any mutations are likely to indicate QC issues. Such regions also cause problems for tsdate, which then tries to estimate some ancestral nodes in the GEL data as coming into existence only a few seconds ago. We seem to be seeing this with very long recent edges: perhaps due to a lack of singletons?
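To make the idea concrete, here is a minimal sketch of the comparison described above. It is plain Python over per-edge and per-mutation columns, not an existing API: the function names `edge_areas` and `flag_sparse_edges`, and the `min_area` and `max_ratio` thresholds, are hypothetical and chosen only for illustration. It assumes each mutation can be assigned the ID of the edge it sits on, and it estimates a single mutations-per-area rate from the data itself.

```python
from collections import Counter


def edge_areas(left, right, parent, child, node_time):
    """Return (parent_time - child_time) * span for every edge."""
    return [
        (node_time[p] - node_time[c]) * (r - l)
        for l, r, p, c in zip(left, right, parent, child)
    ]


def flag_sparse_edges(areas, mutation_edges, min_area, max_ratio=0.1):
    """Flag large edges carrying far fewer mutations than expected.

    The expected count on an edge is area * (total mutations / total area).
    Both thresholds are illustrative assumptions, not recommended values.
    """
    # Ignore mutations with no edge (negative ID), e.g. above a root.
    on_edges = [e for e in mutation_edges if e >= 0]
    counts = Counter(on_edges)
    total_area = sum(areas)
    if total_area <= 0:
        return []
    rate = len(on_edges) / total_area
    flagged = []
    for edge_id, area in enumerate(areas):
        expected = area * rate
        if area >= min_area and counts[edge_id] < max_ratio * expected:
            flagged.append((edge_id, area, counts[edge_id], expected))
    return flagged


# Toy data: 5 nodes, 4 edges that each span 10 units of genome.
node_time = [0, 0, 0, 1, 5]
left, right = [0, 0, 0, 0], [10, 10, 10, 10]
parent, child = [3, 3, 4, 4], [0, 1, 2, 3]
mutation_edges = [0, 1, 3, 3, 3, 3]  # edge ID for each mutation

areas = edge_areas(left, right, parent, child, node_time)
flagged = flag_sparse_edges(areas, mutation_edges, min_area=20)
for edge_id, area, observed, expected in flagged:
    print(f"edge {edge_id}: area={area}, mutations={observed}, expected={expected:.2f}")
```

With a real tree sequence, the inputs could presumably be taken from the `left`, `right`, `parent` and `child` columns of `ts.tables.edges` and the `time` column of `ts.tables.nodes`. Where the per-mutation edge IDs come from is left as an assumption here. A scatter plot or histogram of observed against expected counts, built from the same quantities, would be the plotting counterpart of this check.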

As a guide, perhaps it would be useful to have a plot where it is obvious if (say) half the singletons were removed.
