Check all alleles at a site are unique before inference #927
Comments
Can you show the zarr alleles fields as well please (and the ancestral state)?
Aha! The
import sgkit as sg
import tsinfer

# Simulate a small phased dataset with no missing data
ds = sg.simulate_genotype_call_dataset(n_variant=10, n_sample=4, missing_pct=0, phased=True, seed=1)
print(ds['variant_allele'])
ds['variant_allele'] = ds['variant_allele'].astype(str)  # hack around https://github.com/tskit-dev/tsinfer/issues/810
print(ds['variant_allele'])
# Use the first listed allele at each site as the ancestral allele
ds.update({'variant_ancestral_allele': ds['variant_allele'][:,0]})
ds.to_zarr('/tmp/ds.zarr', mode='w')
sd = tsinfer.SgkitSampleData('/tmp/ds.zarr')
ts = tsinfer.infer(sd)
# Compare inferred genotypes with the stored genotypes, site by site
print("Inferred_genos Zarr_genotypes SampleData_genos SD alleles")
for v, sgv, sdv, sda, sdaa in zip(
    ts.variants(), ds.call_genotype, sd.sites_genotypes, sd.sites_alleles, sd.sites_ancestral_allele,
):
    bad = "<<<< BAD!!!" if not (v.genotypes == sdv).all() else ""
    print(v.genotypes, sgv.values.flatten(), sdv, sda, sdaa, bad)
Edit - reported as a bug in sgkit-dev/sgkit#1221
I didn't realise this could happen. Perhaps it's worth bugging out if there are duplicated allelic states for a site, as this is pretty confusing.
Nothing actually checks this I guess, so worth detecting and bugging out here. Clearly a bug in the sgkit simulator, though.
Phew, had me worried there for a moment! As Jerome says we should error out on this condition.
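For illustration, the check being proposed could look something like the sketch below. This is not tsinfer's implementation: `check_unique_alleles` is a hypothetical helper, and it assumes the alleles have already been converted to `str` and that unused allele slots are padded with an empty string.

```python
def check_unique_alleles(variant_allele, fill=""):
    """Raise ValueError if any site lists the same allele more than once.

    variant_allele: iterable of per-site allele sequences, for example the
        rows of a (n_sites, max_alleles) array of strings.
    fill: padding value for unused allele slots; ignored by the check
        (an assumption made for this sketch).
    """
    for site_index, row in enumerate(variant_allele):
        # Drop padding so that two empty slots do not count as a duplicate
        real = [str(a) for a in row if str(a) != fill]
        if len(set(real)) != len(real):
            raise ValueError(f"Duplicate alleles at site {site_index}: {real}")


# Site 1 lists "A" twice, so the check raises on it.
example = [["A", "C"], ["A", "A"], ["G", "T"]]
try:
    check_unique_alleles(example)
except ValueError as err:
    message = str(err)
```

With the dataset from the report, something like `check_unique_alleles(ds['variant_allele'].values)` run before inference would presumably flag the affected sites rather than letting them through silently.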
I changed the name of the issue to reflect what needs doing.
I'm finding that tsinfer isn't correctly inferring the genotypes for Sgkit files in the trivial instance below. This seems pretty worrying. What do I have wrong?
This shows that the 7th and 8th sites are incorrect: