
Add start of --change-gene-id admin action #2822

Merged
merged 5 commits into from
Apr 16, 2024


kimrutherford
Member

Refs #2677

kimrutherford and others added 4 commits April 8, 2024 22:06
@jseager7
Collaborator

@kimrutherford I've added code to update the gene name in the allele name, and to update the gene name (rangeDisplayName) and primary identifier (rangeValue) in annotation extensions.

I've tested this with UniProtKB accession numbers in PHI-Canto and it seems to work fine, but you'll have to test it with PomBase / Chado as the gene source because I don't know whether my changes are safe to use with those.

Specifically, I'm assuming there will only ever be one result returned from UniProtKB in the $from_id_lookup_result and $to_id_lookup_result, which allows me to simplify the code for getting the old and new gene names to this:

  my $old_name = $from_id_lookup_result->{found}->[0]->{primary_name};
  my $new_name = $to_id_lookup_result->{found}->[0]->{primary_name};

But I don't know whether that assumption holds for PomBase, or if it even holds for UniProtKB. I guess a safer solution would be to iterate through the results, find the first result whose primary identifier matches $from_id, and take the old gene name from it, then do the same with $to_id to get the new gene name. I couldn't figure out how to do this at the time, though.
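A minimal Perl sketch of that safer approach. It assumes each entry in ->{found} also carries a primary_identifier key next to primary_name; only primary_name appears in this PR, so that key name is an assumption, and the sub name name_for_id is hypothetical:

```perl
use strict;
use warnings;

# Return the primary_name of the first lookup result whose identifier
# matches $id, instead of blindly taking ->{found}->[0].
# ASSUMPTION: each result hashref has a 'primary_identifier' key.
sub name_for_id {
  my ($lookup_result, $id) = @_;

  for my $gene (@{ $lookup_result->{found} // [] }) {
    my $identifier = $gene->{primary_identifier};
    next unless defined $identifier;

    # First result whose identifier matches wins
    return $gene->{primary_name} if $identifier eq $id;
  }

  # No result matched the requested ID: return undef
  return;
}
```

Used as name_for_id($from_id_lookup_result, $from_id) for the old name and name_for_id($to_id_lookup_result, $to_id) for the new one; the caller would still need to decide what to do when it returns undef.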

@kimrutherford
Member Author

Hi James.

Thanks very much for those changes. It all looks good to me. I'm going to merge the PR. We can add any fixes to the main branch.

> But I don't know whether that assumption holds for PomBase, or if it even holds for UniProtKB.

It's not going to be a problem for PomBase because our lookup code will only return a single gene for a systematic ID.

I think it's OK for UniProt too since we're looking up accessions.
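Even if the one-gene-per-ID assumption holds, it could be made explicit so that an unexpected multi-gene lookup fails loudly rather than silently using the first entry. A minimal sketch using the ->{found} / primary_name shape from the snippet above; the sub name single_gene_name is hypothetical:

```perl
use strict;
use warnings;

# Return the gene name only if the lookup found exactly one gene,
# dying otherwise so a broken assumption is noticed immediately.
sub single_gene_name {
  my ($lookup_result, $id) = @_;

  my @found = @{ $lookup_result->{found} // [] };

  die "no gene found for $id\n" if @found == 0;
  die scalar(@found) . " genes found for $id, expected exactly one\n"
    if @found > 1;

  return $found[0]->{primary_name};
}
```

The trade-off is that a lookup returning extra results aborts the admin action instead of picking one, which may be preferable for an operation that rewrites stored annotations.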

The web service used for the UniProt lookup is configured in canto.yaml:

webservices:
  uniprot_batch_lookup_url: 'https://rest.uniprot.org/uniprotkb/search?format=xml&query='
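Because the configured URL ends in query=, a lookup presumably appends an encoded query string to it. A minimal sketch, assuming a query of the form accession:X OR accession:Y (valid UniProt query syntax, but the string Canto actually builds may differ); the helper names percent_encode and batch_lookup_url are hypothetical:

```perl
use strict;
use warnings;

# Value of uniprot_batch_lookup_url from canto.yaml
my $base_url = 'https://rest.uniprot.org/uniprotkb/search?format=xml&query=';

# Percent-encode every character outside the RFC 3986 unreserved set.
# Assumes ASCII input, which holds for UniProtKB accessions.
sub percent_encode {
  my ($str) = @_;
  $str =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
  return $str;
}

# HYPOTHETICAL: build one OR query covering all accessions and
# append it to the configured base URL.
sub batch_lookup_url {
  my ($base, @accessions) = @_;
  my $query = join ' OR ', map { "accession:$_" } @accessions;
  return $base . percent_encode($query);
}
```

With the URL above, batch_lookup_url($base_url, 'P12345', 'Q67890') yields a URL ending in query=accession%3AP12345%20OR%20accession%3AQ67890.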

kimrutherford marked this pull request as ready for review April 16, 2024 11:59
kimrutherford merged commit 391e3ca into master Apr 16, 2024
2 checks passed
@jseager7
Collaborator

> It all looks good to me. I'm going to merge the PR.

Great, thanks again for your help.
