-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VariantData sequence_length not being taken from zarr contig_length
field
#949
Comments
contig_length
field on inferencecontig_length
field
To use |
Don't we need to error out if the variants cover multiple contigs? I guess only if the sites mask leaves more than one contig available. Presumably we check this when running tsinfer on the VariantData object? That implies (to me) that the sequence length is essentially dependent on the mask used, right? Following on from that thought, the sequence length can be picked from the contig_length field (and cached) after checking the uniqueness of the (masked) |
If |
If there is more than one contig present in the VCF Zarr we require a I don't think we should be requiring any custom Zarr attrs, as we won't have write access in many cases. |
Sorry, I'm not getting this. At the moment users can quite happily make a .vcz file with multiple defined contigs, and feed that to tsinfer. It then only barfs because positions aren't monotonically increasing. I just wrote some extra docs to show how to mask out all but one contig. Are you saying that we should detect multiple contigs when making a VariantData object, and require one of them to be specified? If so, then I guess it would be also reasonable to get the equivalent
Yes, that seems the right approach. |
Yeah, it seems that if there are multiple contigs in the VCF then it would be helpful to tell tsinfer which one it should use. You're right that this cuts across the site masking though, so it's not entirely clear what the right thing to do here is. Perhaps just providing a |
I think the best thing (as long as it doesn't kill efficiency) is to allow a We can then check if the (masked) |
Introduces a `contig_id` parameter to variant_data, as described in tskit-dev#949. Fixes tskit-dev#249
Introduces a `contig_id` parameter to variant_data, as described in tskit-dev#949. Fixes tskit-dev#249
Introduces a `contig_id` parameter to variant_data, as described in tskit-dev#949. Fixes tskit-dev#249
Introduces a `contig_id` parameter to variant_data, as described in tskit-dev#949. Fixes tskit-dev#249
Introduces a `contig_id` parameter to variant_data, as described in tskit-dev#949. Fixes tskit-dev#249
Introduces a `contig_id` parameter to variant_data, as described in tskit-dev#949. Fixes tskit-dev#249
Introduces a `contig_id` parameter to variant_data, as described in tskit-dev#949. Fixes tskit-dev#249
I'm having problems getting
VariantData.sequence_length
to match the value in the contig_length field of the zarr file. It wasn't clear until reading the code that I needed to setz.attrs["sequence_length"]
. If there isn't az.attrs["sequence_length"]
defined, should we take the value fromcontig_length
instead?I was having the problem when trying to round-trip data from a tree sequence via a vcf file into another inferred tree sequence, and the sequence lengths didn't match up.
The text was updated successfully, but these errors were encountered: