Update tutorial for new VariantData format #945

Merged: 8 commits, Jul 27, 2024

@hyanwong (Member) commented Jul 26, 2024

First pass at changing the tutorial to match the new VariantData format. Based off #944

I haven't sorted out the population specification in the sparrows data example yet.

codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.23%. Comparing base (738bd13) to head (2d842d4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #945   +/-   ##
=======================================
  Coverage   93.23%   93.23%           
=======================================
  Files          18       18           
  Lines        6299     6299           
  Branches     1139     1139           
=======================================
  Hits         5873     5873           
  Misses        290      290           
  Partials      136      136           
Flag Coverage Δ
C 93.23% <ø> (ø)
python 95.65% <ø> (ø)


@jeromekelleher (Member) left a comment

I haven't gone through the details, but I think starting from a simple VCF example is much more useful than the original alignment. Everyone knows what VCF looks like now.

@hyanwong force-pushed the variant-data branch 4 times, most recently from 319619b to 09865fb, July 26, 2024 15:33
@hyanwong changed the title from "Variant data" to "Update tutorial for new VariantData format", Jul 26, 2024
@hyanwong marked this pull request as ready for review, July 26, 2024 16:01
@jeromekelleher (Member)

Using an existing data set is a good call. The tooling isn't there yet, so I don't think there's much point in trying to force it through for this PR.

@hyanwong (Member, Author)

> I haven't gone through the details, but I think starting from a simple VCF example is much more useful than the original alignment. Everyone knows what VCF looks like now.

Hmm, is that true? I don't really know what a VCF looks like (and I don't really want to learn about it either). At least for teaching, it's clearer to me to start with a set of alignments (i.e. haplotype data) rather than site-by-site data, which, as a biologist, I always find slightly more confusing to deal with. For example, when storing viral genomes, I think of the sequence of the virus, not the variation at each site. Surely that's a more natural thing to think about, biologically?

Anyway, the SGkit stuff is removed now, and I think it all still reads fine. Maybe we can merge (once it's building properly) and iterate from there? It would be useful for teaching to have examples which work using the new input format.

No python tabix writer available. This can be removed when we have a tskit2zarr utility.
@hyanwong force-pushed the variant-data branch 12 times, most recently from 34819ff to 18b6cca, July 26, 2024 19:15
@hyanwong force-pushed the variant-data branch 4 times, most recently from 6c0ac75 to 1342e5d, July 26, 2024 19:43
Hopefully will force a re-run
@hyanwong (Member, Author)

Finally got this to build: that was a bit of a marathon. I think we can merge this, and also close #865, which I had to roll into this PR to get it to build properly.

@hyanwong (Member, Author)

P.S. Happy to squash all these commits together, if that helps.

This allows us to reserve the word "tutorial" for more specific inference tutorials, for example, on the tutorials site. It's also more accurate: people are more likely to jump straight to e.g. the VCF usage section rather than work their way through the whole page.
@mergify bot merged commit e84f7c0 into tskit-dev:main, Jul 27, 2024
14 checks passed
@benjeffery mentioned this pull request, Jul 27, 2024
3 participants