Update tutorial for new VariantData format #945

hyanwong · 2024-07-26T11:34:45Z

First pass at changing the tutorial to match the new VariantData format. Based on #944.

~~I haven't sorted out the population specification in the sparrows data example yet~~

codecov · 2024-07-26T11:50:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.23%. Comparing base (738bd13) to head (2d842d4).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #945   +/-   ##
=======================================
  Coverage   93.23%   93.23%
=======================================
  Files          18       18
  Lines        6299     6299
  Branches     1139     1139
=======================================
  Hits         5873     5873
  Misses        290      290
  Partials      136      136
```

| Flag | Coverage Δ |
|---|---|
| C | `93.23% <ø> (ø)` |
| python | `95.65% <ø> (ø)` |


jeromekelleher

I haven't gone through the details, but I think starting from a simple VCF example is much more useful than the original alignment. Everyone knows what VCF looks like now.

docs/tutorial.md

jeromekelleher · 2024-07-26T16:07:59Z

Using an existing dataset is a good call. The tooling isn't there yet, so I don't think there's much point in trying to force it through for this PR.

hyanwong · 2024-07-26T16:10:48Z

> I haven't gone through the details, but I think starting from a simple VCF example is much more useful than the original alignment. Everyone knows what VCF looks like now.

Hmm, is that true? I don't really know what a VCF looks like (and I don't really want to learn about it either). At least for teaching, it's clearer to me to start with a set of alignments (i.e. haplotype data) rather than site-by-site data, which, as a biologist, I always find slightly more confusing to deal with. For example, when storing viral genomes, I think of the sequence of the virus, not the variation at each site. Surely that's a more natural thing to think about, biologically?
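To make the two views being debated concrete, here is a minimal Python sketch (not tsinfer code, and not how the VariantData format is actually built) that transposes per-sample aligned sequences (the haplotype view) into per-site records of alleles and genotype indexes (the site-by-site layout that VCF uses). The function `alignments_to_sites` and the toy sequences are hypothetical. The sketch assumes haploid, gap-free sequences of equal length and, for simplicity, treats the first allele seen at each site as allele 0, whereas a real VCF would put the reference allele there.

```python
# Hypothetical toy data: haploid, gap-free sequences of equal length,
# one per sample (the "alignment" or haplotype view).
alignments = {
    "sample_0": "ACGTAC",
    "sample_1": "ACGAAC",
    "sample_2": "TCGTAC",
}


def alignments_to_sites(alignments):
    """Transpose sample-major sequences into site-major records.

    Returns the sample names and a list of (position, alleles, genotypes)
    tuples, one per variable site. Positions are 0-based offsets into the
    alignment; allele 0 is simply the first allele seen at that site
    (an assumption of this sketch, not VCF's reference-allele rule).
    """
    names = list(alignments)
    seqs = [alignments[name] for name in names]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    # zip(*seqs) walks the alignment column by column, i.e. site by site.
    for position, column in enumerate(zip(*seqs)):
        alleles = list(dict.fromkeys(column))  # distinct bases, first-seen order
        if len(alleles) < 2:
            continue  # invariant column: nothing to record
        genotypes = [alleles.index(base) for base in column]
        sites.append((position, alleles, genotypes))
    return names, sites


names, sites = alignments_to_sites(alignments)
for position, alleles, genotypes in sites:
    print(position, alleles, genotypes)
# 0 ['A', 'T'] [0, 0, 1]
# 3 ['T', 'A'] [0, 1, 0]
```

Each site-major record carries roughly what one VCF data line holds for haploid samples (a position, the alleles, and one genotype per sample), so both views contain the same information; they just slice it along different axes.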

Anyway, the sgkit stuff is removed now, and I think it all still reads fine. Maybe we can merge (once it's building properly) and iterate from there? It would be useful for teaching to have examples that work with the new input format.


hyanwong · 2024-07-26T20:16:41Z

Finally got this to build: that was a bit of a marathon. I think we can merge this, and also close #865, which I had to roll into this PR to get it to build properly.

hyanwong · 2024-07-26T20:20:02Z

P.S. Happy to squash all these commits together, if that helps.

This allows us to reserve the word "tutorial" for more specific inference tutorials, for example on the tutorials site. It's also more accurate: people are more likely to jump straight to, e.g., the VCF usage section than to work their way through the whole page.

hyanwong force-pushed the variant-data branch from 0e9e0aa to 77852b3 July 26, 2024 12:37

jeromekelleher reviewed Jul 26, 2024


docs/tutorial.md (outdated, resolved)

hyanwong force-pushed the variant-data branch 4 times, most recently from 319619b to 09865fb July 26, 2024 15:33

hyanwong added 2 commits July 26, 2024 16:56

Update tutorial.md

6ace68f

Remove sgkit from the tutorial

bc5d77c

hyanwong force-pushed the variant-data branch from 09865fb to bc5d77c July 26, 2024 15:57

hyanwong changed the title ~~Variant data~~ Update tutorial for new VariantData format Jul 26, 2024

Use compatible version of Bio

79676c1

hyanwong marked this pull request as ready for review July 26, 2024 16:01

Add tabix to doc build action

e05adea

No Python tabix writer available. This can be removed when we have a tskit2zarr utility.

hyanwong force-pushed the variant-data branch from c2517d5 to e05adea July 26, 2024 16:14

Tidy up the tutorial language

f1d1cb7

hyanwong force-pushed the variant-data branch 12 times, most recently from 34819ff to 18b6cca July 26, 2024 19:15

hyanwong force-pushed the variant-data branch 4 times, most recently from 6c0ac75 to 1342e5d July 26, 2024 19:43

Install tabix via apt-get, and use the notebook python for install

a498cad

hyanwong force-pushed the variant-data branch from 1342e5d to a498cad July 26, 2024 19:52

Roll in PR 865

9530b53

Hopefully will force a re-run

hyanwong force-pushed the variant-data branch from 7c9873f to 2d842d4 July 26, 2024 21:57

benjeffery approved these changes Jul 27, 2024


benjeffery added the AUTOMERGE-REQUESTED label Jul 27, 2024

mergify bot merged commit e84f7c0 into tskit-dev:main Jul 27, 2024
14 checks passed

mergify bot removed the AUTOMERGE-REQUESTED label Jul 27, 2024

benjeffery mentioned this pull request Jul 27, 2024

Update docs build #865

Closed
