Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add harmonization step to Aligner objects #16

Open
shz9 opened this issue May 31, 2020 · 6 comments
Open

Add harmonization step to Aligner objects #16

shz9 opened this issue May 31, 2020 · 6 comments
Assignees

Comments

@shz9
Copy link
Collaborator

shz9 commented May 31, 2020

Aligner objects as currently implemented only map nodes in the tree sequence to nodes in the pedigree. This can be done either iteratively or in one step. One thing that's missing is "harmonization" or "sanity checking" for greedy algorithms. The goal of the harmonization step is to make sure that the mappings make sense and are consistent with other information that we have, e.g.:

  • If an internal node n in the tree sequence is matched to a founder node in the pedigree, then it's likely that all or most of the predecessors of n are out-of-pedigree nodes.
  • For an aligned pair (n_ts, n_ped), make sure that their successors and predecessors preserve their time-ordering. For example, no predecessor of n_ts should map to a successor of n_ped and vice versa.

After these sanity checks are implemented and used to correct the mapping (if possible), then one final thing that can be added is the pedigree node to tree-sequence edge mapping:

  • Given the assignment of tree sequence nodes to pedigree nodes, we need to find the most likely mapping of remaining pedigree nodes to tree sequence edges. This isn't straightforward, but as a first step we can do it using some simple heuristics.
@shz9
Copy link
Collaborator Author

shz9 commented Jun 9, 2020

Other cases to check in the harmonization step:

  • In the diploid case, each individual in the pedigree is assigned to only 2 different nodes in the tree sequence. In the haploid case, it is 1.
  • In the diploid case, if an individual in the pedigree is assigned to 2 nodes in the tree sequence, then those nodes cannot be ancestral to each other (they have to be on separate branches of the tree).

@shz9
Copy link
Collaborator Author

shz9 commented Jun 16, 2020

I added basic code for the harmonization step. Implemented two checks:

[1] If a node in the tree sequence n_ts is mapped to a founder node in the pedigree n_ped, then set all of its ancestors to be out-of-pedigree nodes (i.e. their ancestors map to None).
[2] If 2 sibling nodes in the tree sequence are mapped to disconnected nodes in the pedigree (i.e. nodes with no common ancestor in the pedigree), then set all of their ancestors to be out-of-pedigree nodes.

These 2 checks improved the accuracy of the greedy algorithms by 5-12 points. Now the accuracy for the best configuration is hovering around 90%.

@kobrica will take it from here and implement the rest of the checks.

@andjelatodorovic
Copy link
Collaborator

I have implemented all of the checks, but they need to be tested along with @shz9 functions

@shz9
Copy link
Collaborator Author

shz9 commented Jun 26, 2020

@kobrica Sorry, I haven't had a lot of time to look at your implementation in detail. But it seems there are a couple of syntax errors:

UnboundLocalError: local variable 'ts_n_succ' referenced before assignment

Can you please test your code in a jupyter notebook for now? You can use the template in Experiments.ipynb.

@andjelatodorovic
Copy link
Collaborator

@shz9 I will try to test everything out in jupyter notebook, I think I know what the error is here.

@shz9
Copy link
Collaborator Author

shz9 commented Jun 30, 2020

I re-arranged the code a bit. It doesn't work quite well for the diploid case.
We can discuss the remaining steps soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants