Update tutorial for new VariantData format #945
Codecov Report
All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## main #945 +/- ##
=======================================
Coverage 93.23% 93.23%
=======================================
Files 18 18
Lines 6299 6299
Branches 1139 1139
=======================================
Hits 5873 5873
Misses 290 290
Partials 136 136
Flags with carried forward coverage won't be shown.
I haven't gone through the details, but I think starting from a simple VCF example is much more useful than the original alignment. Everyone knows what VCF looks like now.
Force-pushed from 319619b to 09865fb.
Using an existing dataset is a good call. The tooling isn't there yet, and I don't think there's much point in trying to force it through for this PR.
Hmm, is that true? I don't really know what a VCF looks like (and I don't really want to learn about it either). At least for teaching, it's clearer to me to start with a set of alignments (i.e. haplotype data), not site-by-site data, which as a biologist I always find slightly more confusing to deal with. For example, when storing viral genomes, I think of the sequence of the virus, not the variation at each site. Surely that's more of a natural thing to think about, biologically? Anyway, the SGkit stuff is removed now, and I think it all still reads fine. Maybe we can merge (once it's building properly) and iterate from there? It would be useful for teaching to have examples which work using the new input format.
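To make the haplotype vs. site-by-site distinction concrete, here is a minimal sketch assuming the input is a plain dict of equal-length aligned sequences. The function `alignment_to_sites` is hypothetical and is not part of tsinfer or the VariantData API; it just shows how the same data, read column by column, becomes one record per variable site. Alleles are numbered in order of first appearance, which is an arbitrary choice here, not an ancestral-state assignment.

```python
from typing import Dict, List, Tuple

def alignment_to_sites(alignment: Dict[str, str]) -> List[Tuple[int, List[str], List[int]]]:
    """Hypothetical helper: turn aligned haplotypes into per-site records.

    Each record is (position, alleles, genotypes), where genotypes index into
    alleles. Alleles are numbered in order of first appearance (an assumption,
    not an ancestral-state call). Invariant columns are skipped.
    """
    seqs = list(alignment.values())  # dicts keep insertion order in Python 3.7+
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    sites = []
    for pos in range(length):
        column = [s[pos] for s in seqs]  # one base per haplotype at this position
        alleles: List[str] = []
        for base in column:
            if base not in alleles:
                alleles.append(base)
        if len(alleles) > 1:  # only variable columns become sites
            genotypes = [alleles.index(b) for b in column]
            sites.append((pos, alleles, genotypes))
    return sites

# Three short "viral genomes", thought of as whole sequences:
print(alignment_to_sites({"A": "ACGTA", "B": "ACCTA", "C": "TCGTA"}))
```

The haplotype view stores three full sequences; the site view keeps only the two variable columns (positions 0 and 2), each with its alleles and a genotype per sample.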
No python tabix writer available. This can be removed when we have a tskit2zarr utility.
Force-pushed from 34819ff to 18b6cca.
Force-pushed from 6c0ac75 to 1342e5d.
Hopefully will force a re-run
Finally got this to build: that was a bit of a marathon. I think we can merge this, and also close #865, which I had to roll into this PR to get it to build properly.
P.S. Happy to squash all these commits together, if that helps.
This allows us to reserve the word "tutorial" for more specific inference tutorials, for example, on the tutorials site. It's also more accurate: people are more likely to jump straight to e.g. the VCF usage section than to work their way through the whole page.
First pass at changing the tutorial to match the new VariantData format. Based on #944.
I haven't sorted out the population specification in the sparrows data example yet