Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web VEP: fix incorrect variant end #1059

Draft
wants to merge 2 commits into
base: postreleasefix/112
Choose a base branch
from

Conversation

nuno-agostinho
Copy link
Contributor

@nuno-agostinho nuno-agostinho commented Nov 30, 2023

User-requested bug fix: This PR fixes the incorrect END location for short variants in web VEP by storing this information into the INFO/END field by default, as already done for SVs.

Motivation

Web VEP is returning incorrect END location for multiple variants when using rsIDs. Given that web VEP output is in VCF format, no END location is available by default. To prepare the variant location, web VEP first checks if it's stored in INFO/END field, otherwise it calculates based on the length of the reference allele: https://github.com/Ensembl/public-plugins/blob/main/tools/modules/EnsEMBL/Web/TmpFile/VcfTabix.pm#L254-L264

VEP stores the correct END position in the VariantFeature object. As such, we simply need to save this into the INFO/END field of the VCF output, like in SVs. As this is done for SVs by default, this could also be the default for short variants.

Testing

Compare results for the following variants between:

  • VEP CLI with default output: correct location
  • Web VEP in current website: calculated END location that is incorrect when compared with VEP CLI
  • Web VEP in my sandbox: should have the same (correct) location as VEP CLI
rs746071566
rs3064744
rs72549309
rs267608279
rs72549346
rs72549356
rs72549352
rs72549354
rs72549353
rs5030656

@nuno-agostinho nuno-agostinho marked this pull request as ready for review December 5, 2023 10:16
@nuno-agostinho nuno-agostinho changed the title Web VEP: fix incorrect variant end when using rsIDs Web VEP: fix incorrect variant end Dec 5, 2023
@nakib103 nakib103 self-assigned this Dec 5, 2023
@nakib103 nakib103 self-requested a review December 5, 2023 10:25
Copy link
Contributor

@nakib103 nakib103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nuno-agostinho ,

Thanks for the changes. I can see it has updated the END value of the variant to be same as what we get in the CLI. I tested with two variants, and from CLI I get -
rs72549353: 22:42128251-42128254
rs5030656: 22:42128176-42128178

In the sandbox they gave the same END value - http://wp-np2-11.ebi.ac.uk:7070/Homo_sapiens/Tools/VEP/Results?tl=smKQK4ocAyXPdUac-1

But you will notice that the start value does not match. Is that something you think we should consider? Notice that in the CLI we minimise the allele and gets the position.

@nakib103
Copy link
Contributor

nakib103 commented Dec 6, 2023

The first variant got position -
22:42128250-42128254 - which kind of makes sense if we think as VCF representation

The second variant got -
22:42128174-42128178 - which is not a VCF representation, but can be considered correct if we think the allele is minimised.

@nuno-agostinho
Copy link
Contributor Author

nuno-agostinho commented Dec 8, 2023

Hey @nakib, do you think we should output a START field, similar to END?

START could be created based on $vf->{original_start} (42128174 for rs5030656 and 42128249 for 42128249). This makes me question if we should also use $vf->{original_end} for the END location, what do you think?

@nakib
Copy link

nakib commented Dec 8, 2023

Hey @nakib, do you think we should output a START field, similar to END?

START could be created based on $vf->{original_start} (42128174 for rs5030656 and 42128249 for 42128249). This makes me question if we should also use $vf->{original_end} for the END location, what do you think?

Hey there, @nuno-agostinho! Nakib here, albeit a different one. :)

Copy link
Contributor

@nakib103 nakib103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nuno-agostinho,

Yes it makes sense to use $vf->{original_start}. We have UPLOADED_ALLELE and REF_ALLELE in the web VEP output. Than it would at least be matched with the UPLOADED_ALLELE. Furthermore this is also what is reported by the Variant Table page for location.

Currently it is not matching either of the allele from those two column values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants