Unify logic for dealing with ploidy #27

Open
shz9 opened this issue Jun 30, 2020 · 3 comments

Comments

@shz9
Collaborator

shz9 commented Jun 30, 2020

The codebase currently has 2 strategies for dealing with ploidy in Pedigrees:

[1] Specify ploidy at the stage of path-sampling, which involves:
[1.1] Assigning the specified number of "ploids" to each individual.
[1.2] Sampling a path through those ploids.
[1.3] Adding a flag to the traversal object so we know its ploidy.
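
A minimal sketch of steps 1.1 to 1.3, assuming a pedigree stored as a plain dict that maps each individual to a list of its parents. The Traversal dataclass, the sample_path function, and the rule of stepping to a uniformly chosen parent are hypothetical stand-ins for illustration, not the codebase's actual API:

```python
import random
from dataclasses import dataclass, field


@dataclass
class Traversal:
    """Hypothetical traversal record (not the project's Traversal class)."""
    ploidy: int                              # step 1.3: flag recording the ploidy
    path: list = field(default_factory=list)


def sample_path(parents, proband, ploidy=2, rng=None):
    """Sample one upward path of (individual, ploid index) pairs."""
    rng = rng or random.Random()
    traversal = Traversal(ploidy=ploidy)
    current = proband
    while current is not None:
        # Steps 1.1 and 1.2: every individual carries `ploidy` ploids,
        # and the path passes through one of them.
        ploid = rng.randrange(ploidy)
        traversal.path.append((current, ploid))
        # Illustrative assumption: continue through a randomly chosen parent.
        options = parents.get(current, [])
        current = rng.choice(options) if options else None
    return traversal
```

Because the ploidy is only a parameter here, the same function would serve haploid, diploid, or tetraploid pedigrees, which is the main appeal of this strategy.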

Advantages:

  • In principle, it can accommodate any ploidy level (1, 2, 4, ...).
  • It has data structures (e.g. Haplotype) to assign values and attributes to individual ploids (e.g. IBD segment).
  • Seamlessly integrates with the functionality from Pedigree class.

Disadvantages:

  • Messier to handle in the Aligner object. We would need to check that tree sequence nodes are not assigned to the same pedigree node more times than its ploidy level allows.
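
The check described in this disadvantage could look roughly like the sketch below. AssignmentTracker and PloidyExceededError are hypothetical names, and the real Aligner may track assignments quite differently:

```python
from collections import Counter


class PloidyExceededError(ValueError):
    """Raised when a pedigree node would receive too many assignments."""


class AssignmentTracker:
    """Hypothetical helper; not part of the existing Aligner."""

    def __init__(self, ploidy):
        self.ploidy = ploidy
        self.counts = Counter()    # pedigree node -> number of assignments
        self.assignments = {}      # tree sequence node -> pedigree node

    def assign(self, ts_node, ped_node):
        # Refuse the assignment once the pedigree node is "full".
        if self.counts[ped_node] >= self.ploidy:
            raise PloidyExceededError(
                f"{ped_node!r} already has {self.ploidy} assignments"
            )
        self.counts[ped_node] += 1
        self.assignments[ts_node] = ped_node
```

Every assignment in the aligner would have to go through a guard like this, which is the extra bookkeeping that makes strategy [1] messier.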

/// Alternative to this formulation ///

As an alternative, we can add an attribute to the Pedigree class self.diploid_graph, which we can initialize in path sampling. The only difficulty with this alternative is how to access the Genealogical helper methods, such as .predecessors(), .successors() and .iter_nodes().


[2] Create a DiploidGraph class that inherits from Pedigree
[2.1] Takes an initialized Pedigree and converts it into a DiploidGraph.

Advantages:

  • Simpler and more elegant way to carry out the alignment for diploids.

Disadvantages:

  • Accommodates only haploids/diploids.
  • Because it inherits from the Pedigree class, some of the inherited methods may break on the diploid graph or not make sense in this context.

/// Alternative to this formulation ///

Do not inherit from Pedigree. Just set the Pedigree to be an attribute of the DiploidGraph object. This would mean duplicating the .sample_path() method, but it should be fine.
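
A rough sketch of this composition-based alternative. The Pedigree class below is a minimal hypothetical stand-in (a dict from child to a (mother, father) pair), and the convention that copy 0 is maternal and copy 1 is paternal is an assumption made for illustration:

```python
import random


class Pedigree:
    """Hypothetical minimal stand-in for the real Pedigree class."""

    def __init__(self, parents):
        self.parents = parents  # child -> (mother, father)


class DiploidGraph:
    """Holds a Pedigree as an attribute instead of inheriting from it."""

    def __init__(self, pedigree):
        self.pedigree = pedigree
        self.edges = {}  # (individual, copy) -> candidate parental copies
        for child, (mother, father) in pedigree.parents.items():
            # Assumed convention: copy 0 is maternal, copy 1 is paternal.
            self.edges[(child, 0)] = [(mother, 0), (mother, 1)]
            self.edges[(child, 1)] = [(father, 0), (father, 1)]

    def predecessors(self, node):
        return self.edges.get(node, [])

    def sample_path(self, start, rng=None):
        # Duplicated here on purpose: it walks haplotype nodes, not individuals.
        rng = rng or random.Random()
        path = [start]
        while self.predecessors(path[-1]):
            path.append(rng.choice(self.predecessors(path[-1])))
        return path
```

With this layout nothing from Pedigree leaks into the diploid graph, so no inherited method can silently misbehave; the cost is re-implementing the few methods that are needed.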

Any thoughts?

@shz9
Collaborator Author

shz9 commented Jun 30, 2020

For the alternative formulation of the 2nd solution, maybe we can make DiploidGraph inherit from Genealogical instead of Pedigree? Not sure if it would break anything there.

@ivan-krukov
Owner

I don't think that inheriting from Genealogical should break anything. At least it shouldn't, in principle.

@shz9
Collaborator Author

shz9 commented Jul 16, 2020

OK. I updated the interfaces for handling haplotype structures.

I renamed the DiploidGraph class to HaplotypeGraph and implemented interfaces for generating graphs for:

  • Autosomal segments.
  • X Chromosome segments.
  • Y Chromosome segments.
  • Mitochondrial segments.
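
For illustration, the four segment types differ only in which parental haplotypes a child's haplotype can descend from. The sketch below encodes the standard inheritance rules; the function name, the "mat"/"pat" labels, and the dict-based inputs are assumptions, and the actual HaplotypeGraph interfaces may be organised differently:

```python
def haplotype_edges(parents, sex, segment):
    """Map each child haplotype to the parental haplotypes it may copy.

    parents: child -> (mother, father); sex: individual -> "F" or "M".
    Nodes are (individual, "mat") or (individual, "pat"), i.e. the copy
    inherited from the mother or from the father.
    """
    edges = {}
    for child, (mother, father) in parents.items():
        if segment == "autosomal":
            edges[(child, "mat")] = [(mother, "mat"), (mother, "pat")]
            edges[(child, "pat")] = [(father, "mat"), (father, "pat")]
        elif segment == "X":
            # Everyone receives one X from the mother.
            edges[(child, "mat")] = [(mother, "mat"), (mother, "pat")]
            if sex[child] == "F":
                # A father carries a single X, inherited from his own mother.
                edges[(child, "pat")] = [(father, "mat")]
        elif segment == "Y":
            if sex[child] == "M":
                edges[(child, "pat")] = [(father, "pat")]
        elif segment == "mitochondrial":
            edges[(child, "mat")] = [(mother, "mat")]
        else:
            raise ValueError(f"unknown segment type: {segment!r}")
    return edges
```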

I also moved path sampling entirely into the HaplotypeGraph. The new sample_path method provides an option called haploid_probands, which samples one haplotype per proband and then samples the paths from those haplotypes. As we discussed, this is a more biologically meaningful way of representing the "haploid" graph.
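
To make the haploid_probands behaviour concrete, here is a hedged sketch. It assumes the haplotype graph is a dict from each (individual, copy) node to its candidate parental nodes, and the sample_paths function and its signature are hypothetical rather than the real sample_path method:

```python
import random


def sample_paths(edges, probands, haploid_probands=False, rng=None):
    """Sample one upward path per starting haplotype."""
    rng = rng or random.Random()
    if haploid_probands:
        # Pick a single haplotype (copy 0 or 1) for each proband.
        starts = [(p, rng.randrange(2)) for p in probands]
    else:
        # Otherwise start from both haplotypes of every proband.
        starts = [(p, copy) for p in probands for copy in (0, 1)]
    paths = []
    for node in starts:
        path = [node]
        while edges.get(node):
            node = rng.choice(edges[node])
            path.append(node)
        paths.append(path)
    return paths
```

The graph itself stays diploid in both modes; only the set of starting nodes changes, so a "haploid" run still follows real haplotype transmissions.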

Note that Aligner objects now take a HaplotypeGraph and a Traversal object as arguments, instead of a Pedigree and a Traversal as before.

Sorry, these changes are quite extensive and may have broken other scripts/tests that you've written. But I think this formulation is a lot clearer and will hopefully be useful in the tasks ahead.

As always, examples are in the notebook.
