Replies: 49 comments
-
For clarity: we "capture" the link in OtherIdentifiers (same as relationships and collector numbers and such), we share via associatedSequences.
I think that's just a case of failing to catalog the item of scientific interest. The virus should have been cataloged and related to the mammal. That of course doesn't always happen, and my "GenBank numbers are 'self.'" statement in #2121 seems to be wrong in this case.
All identifiers carry a value from https://arctos.database.museum/info/ctDocumentation.cfm?table=ctid_references; perhaps we need a way to express this situation, which probably isn't as rare as it really should be.
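A minimal sketch of how a consumer such as GloBI might use that relationship value. It assumes a hypothetical tab-separated export of a record's identifiers with three columns (identifier type, identifier value, id_references value). The column layout and the function name `non_self_genbank` are illustrative only and are not an Arctos export format:

```bash
# Hypothetical input (NOT an Arctos export format): one identifier per line,
#   identifier_type <TAB> identifier_value <TAB> id_references
# Prints value and relationship for GenBank identifiers that are NOT marked
# "self", i.e. sequences that may document an associated organism rather
# than the cataloged specimen itself.
non_self_genbank() {
  awk -F '\t' '$1 == "GenBank" && $3 != "self" { print $2 "\t" $3 }'
}
```

For example, piping `printf 'GenBank\tAA000001\thost of\nGenBank\tAA000002\tself\n'` into `non_self_genbank` keeps only the AA000001 row. Both accessions are placeholders.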
-
Yes, it is unfortunate that the virus community is not better at providing
cataloged "voucher specimens" that we can link to. It has been very
difficult to get most virologists to identify, designate, or archive a host
voucher, or, even when these exist, to link to them on GenBank. Much of
MSB's effort at tracking viruses extracted from mammal specimens has
occurred over the past decade or more, before we had a parasite
collection, so there are GenBank links for viruses as well as parasites
that are attached directly to the mammal host with "self" relationships.
Now that we have the capacity to catalog the parasites separately, that
should be done and those GenBank sequences moved over to the parasite
record, but that is a process that would consume quite a bit of staff time
and resources. I'd be happy to try if we can identify those samples, but
unfortunately this may require going record by record based on which
mammals have virus-associated publications or citations. We can look for
"symbiotype" in the citation, but that was not always available for legacy
records. We also need a way to designate relationships in citations to
alternate taxa, e.g., "symbiotype of ... Taxon A (virus name)".
On Tue, Apr 6, 2021 at 10:59 AM dustymc wrote:
Arctos captures the links to genbank in associatedSequences.
For clarity: we "capture" the link in OtherIdentifiers (same as
relationships and collector numbers and such), we share via
associatedSequences.
In this case, a virus (hantavirus) was extracted from the host specimen.
I think that's just a case of failing to catalog the item of scientific
interest. The virus should have been cataloged and related to the mammal.
That of course doesn't always happen, and my "GenBank numbers are 'self.'"
statement in #2121 seems
to be wrong in this case.
do you keep track of the kind of association between the host specimen and
the sequence
All identifiers carry a value from
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctid_references;
perhaps we need a way to express this situation, which probably isn't as
rare as it really should be.
-
@dustymc @campmlc thanks for your prompt reply and for sharing background. Great to hear that GenBank numbers can have association types just like specimens do. I can imagine that going back and identifying the association types for existing GenBank numbers with their specimens can be quite laborious. However, through GloBI, I can perhaps provide an exhaustive list of GenBank numbers associated with viruses. That said, I realize that it'll take time and effort to cross-reference and double-check, so perhaps something to do when the time is right? It might be worth mentioning that many researchers are unaware of these rich linkages that you keep. I am doing my best to communicate the good work on associations; I guess it'll take time for it to take hold.
-
Hey Jorrit,
The vast majority of our host/virus relationships/linkages are for those for which we have the symbiotype specimen here at MSB. These were done manually, based on our knowledge of the relationships and on an effort to get virologists doing descriptions to include host info going forward. The paper attached has the recommendations for this. Best, Jon
______________________________________________________________
Jonathan L. Dunnum Ph.D.
Senior Collection Manager
Division of Mammals, Museum of Southwestern Biology
University of New Mexico
Albuquerque, NM 87131
(505) 277-9262
Fax (505) 277-1351
MSB Mammals website: http://www.msb.unm.edu/mammals/index.html
Facebook: http://www.facebook.com/MSBDivisionofMammals
Shipping Address:
Museum of Southwestern Biology
Division of Mammals
University of New Mexico
CERIA Bldg 83, Room 204
Albuquerque, NM 87131
________________________________
From: Jorrit Poelen
Sent: Tuesday, April 6, 2021 11:16 AM
To: ArctosDB/arctos
Cc: Subscribed
Subject: Re: [ArctosDB/arctos] [CONTACT] association type of the associated sequences related to host vouchers (e.g., https://arctos.database.museum/guid/MSB:Mamm:210229 https://www.ncbi.nlm.nih.gov/nuccore/EU241637) (#3550)
-
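@jldunnum thanks for sharing. Can you please share a citation string for the paper? GitHub issues do not keep attachments alive. Also, just in case y'all are feeling ambitious, I've attached a (partial) list of virus GenBank numbers extracted from the indexed Grange et al. 2021 dataset using:
elton interactions globalbioticinteractions/grange2021 | grep -P -o "https://[^\t]+nuccore[^\t]+" | sort | uniq > virus_genbank_numbers.txt
The first 10 are https://www.ncbi.nlm.nih.gov/nuccore/AB010730 through https://www.ncbi.nlm.nih.gov/nuccore/AB010739.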
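The elton pipeline used for this list (`... | grep -P -o "https://[^\t]+nuccore[^\t]+" | sort | uniq`) depends on GNU grep's Perl-compatible `-P` mode, which some grep implementations (e.g. BSD grep) may not provide. A possible portable variant is sketched below. It assumes every link of interest uses the `https://www.ncbi.nlm.nih.gov/nuccore/` prefix, and it also reduces the URLs to bare accession numbers. The function names are hypothetical:

```bash
# Extract unique nuccore URLs from tab-separated text on stdin.
# Assumes links use the https://www.ncbi.nlm.nih.gov/nuccore/ prefix and that
# accessions contain only letters, digits, "_" and "." (an assumption).
nuccore_urls() {
  grep -oE 'https://www\.ncbi\.nlm\.nih\.gov/nuccore/[A-Za-z0-9_.]+' | sort -u
}

# Reduce nuccore URLs (one per line) to bare accession numbers.
urls_to_accessions() {
  sed 's#^.*/nuccore/##'
}
```

Usage would then look like `elton interactions globalbioticinteractions/grange2021 | nuccore_urls > virus_genbank_numbers.txt`, followed by `urls_to_accessions < virus_genbank_numbers.txt | head`. The pattern is stricter than `[^\t]+`, so URLs with query strings or on other hosts are not matched.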
-
@jhpoelen this would be most helpful: "an exhaustive list of GenBank numbers associated with viruses".
-
@campmlc I shared a partial list; other GloBI-indexed datasets can be used to complement this list if needed.
-
Potentially a fun project for an intern/CS student/etc.
It's appreciated! We obviously aren't great at communicating what we do. We've been talking to and working with GenBank since ~2000; I'm (obviously!) not sure how to do better, but I think it'll involve more than just time. @jldunnum your attachment didn't come through. Related:
-
Dunnum, Jonathan L., Richard Yanagihara, Karl M. Johnson, Blas Armien, Nyamsuren Batsaikhan, Laura Morgan, and Joseph A. Cook. "Biospecimen repositories and integrated databases as critical infrastructure for pathogen discovery and pathobiology research." PLoS Neglected Tropical Diseases 11, no. 1 (2017): e0005133.
________________________________
From: Jorrit Poelen
Sent: Tuesday, April 6, 2021 11:26 AM
To: ArctosDB/arctos
Cc: Jonathan Dunnum; Mention
@jldunnum thanks for sharing. Can you please share a citation string for the paper? GitHub issues do not keep attachments alive.
Also, just in case y'all are feeling ambitious, I've attached a (partial) list of virus GenBank numbers extracted from the indexed Grange et al. 2021 dataset using:
elton interactions globalbioticinteractions/grange2021 | grep -P -o "https://[^\t]+nuccore[^\t]+" | sort | uniq > virus_genbank_numbers.txt
The first 10 are:
$ cat virus_genbank_numbers.txt | head
https://www.ncbi.nlm.nih.gov/nuccore/AB010730
https://www.ncbi.nlm.nih.gov/nuccore/AB010731
https://www.ncbi.nlm.nih.gov/nuccore/AB010732
https://www.ncbi.nlm.nih.gov/nuccore/AB010733
https://www.ncbi.nlm.nih.gov/nuccore/AB010734
https://www.ncbi.nlm.nih.gov/nuccore/AB010735
https://www.ncbi.nlm.nih.gov/nuccore/AB010736
https://www.ncbi.nlm.nih.gov/nuccore/AB010737
https://www.ncbi.nlm.nih.gov/nuccore/AB010738
https://www.ncbi.nlm.nih.gov/nuccore/AB010739
virus_genbank_numbers.txt: https://github.com/ArctosDB/arctos/files/6266489/virus_genbank_numbers.txt
-
Great! I don't suppose it would be possible to identify GenBank accessions that have a non-mammalian organism or taxon name but an MSB:Mamm specimen voucher or LinkOut?
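One way this could be approached is a simple filter on the voucher prefix and the lineage. This assumes the GenBank metadata has already been pulled into a hypothetical tab-separated file with accession, organism, specimen_voucher and lineage columns (for example via NCBI's E-utilities, not shown). This is only a sketch. Real specimen_voucher values do not always follow the `MSB:Mamm:NUMBER` form, so a prefix match will miss some records:

```bash
# Hypothetical input on stdin, one GenBank record per line:
#   accession <TAB> organism <TAB> specimen_voucher <TAB> lineage
# Prints accession and organism for records whose voucher starts with
# "MSB:Mamm:" but whose lineage does not mention Mammalia, e.g. a virus or
# parasite sequence vouchered by its mammal host.
msb_mamm_non_mammal() {
  awk -F '\t' 'index($3, "MSB:Mamm:") == 1 && index($4, "Mammalia") == 0 { print $1 "\t" $2 }'
}
```

The output would be a starting list for manual review, such as cataloging the virus or parasite separately and relating it to the host record. It is not a list of confirmed errors.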
-
Another issue is that many pathogen/parasite papers that actually did cite a host used our field/tissue number "NK" and not our MSB catalog number.
________________________________
From: Mariel Campbell
Sent: Tuesday, April 6, 2021 11:30 AM
To: ArctosDB/arctos
Cc: Jonathan Dunnum; Mention
-
Given a little time, this can surely be done, especially thanks to the excellent informatics resources that GenBank and Arctos provide. Also, datasets already indexed by GloBI provide a starting point; https://github.com/globalbioticinteractions/virus-host-db comes to mind.
-
https://en.wikipedia.org/wiki/Money_services_business - right?!? We've been avoiding really embracing https://handbook.arctosdb.org/how_to/cite-specimens.html forever. "MSB 210229" (and the infinite variations thereof) could mean just about anything, and digging it out of a publication is never going to be foolproof. "https://arctos.database.museum/guid/MSB:Mamm:210229" and "http://dx.doi.org/10.7299/X7ZK5H0X" are completely unambiguous. Demanding those kinds of identifiers from users would eliminate any confusion going forward, and sort of accidentally save you a whole bunch of work (which might be redirected to dealing with the legacy stuff) in the process.
Yep! Arctos has an API and GenBank has an API; doing more in that intersection is just a matter of time. (I'm not sure about "little", though...)
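To illustrate the difference between resolvable and ambiguous citations discussed above, here is a hypothetical triage helper. It sorts citation strings into Arctos GUIDs, DOIs, NK field numbers and everything else. The categories and patterns are assumptions for illustration. NK numbers are assumed to appear as `NK` followed by digits:

```bash
# Hypothetical triage of a citation string into one of four categories.
# Patterns are illustrative assumptions, not an Arctos or GenBank rule set.
classify_citation() {
  case "$1" in
    *arctos.database.museum/guid/*) echo "arctos_guid" ;;  # resolvable Arctos GUID
    *doi.org/10.*)                  echo "doi" ;;          # resolvable DOI
    NK\ [0-9]*|NK[0-9]*)            echo "field_number" ;; # NK number, needs a lookup
    *)                              echo "ambiguous" ;;    # e.g. "MSB 210229"
  esac
}
```

For instance, `classify_citation 'MSB 210229'` prints `ambiguous`, while `classify_citation 'https://arctos.database.museum/guid/MSB:Mamm:210229'` prints `arctos_guid`. Only the first two categories could be linked automatically. The rest would still need a manual or API lookup.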
-
We have attempted "Demanding those kinds of identifiers" from GenBank as a
required field/controlled vocab, most recently at the ASM meeting the
summer before COVID, but there still seems to be some reluctance or lack of
awareness of the problem, at least from representatives designated to
attend that meeting. There is also extreme reluctance to allow the
collections that actually hold the specimens to make edits to fields that
were incorrectly filled out by researchers submitting sequences.
On Tue, Apr 6, 2021 at 11:56 AM dustymc wrote:
MSB catalog number
https://en.wikipedia.org/wiki/Money_services_business - right?!?
We've been avoiding really embracing
https://handbook.arctosdb.org/how_to/cite-specimens.html forever. "MSB
210229" (and the infinite variations thereof) could mean just about
anything, and digging it out of a publication is never going to be
foolproof. "https://arctos.database.museum/guid/MSB:Mamm:210229" and
"http://dx.doi.org/10.7299/X7ZK5H0X" are completely unambiguous. Demanding
those kinds of identifiers from users would eliminate any confusion going
forward, and sort of accidentally save you a whole bunch of work (which
might be redirected to dealing with the legacy stuff) in the process.
Given a little time
Yep! Arctos has an API, GenBank has an API, doing more in that
intersection is just a matter of time. (I'm not sure about "little"
though...)
-
I can understand their reluctance; it's not really their job, and most of the data they see won't ever have that level of information. I obviously don't KNOW anything, but I think this would be between loan-ers and loan-ees (and/or perhaps part of your internal licensing). Arctos contains a GenBank publisher tool (IDK if it's functional; it doesn't get any use, so it doesn't get any attention) which completely eliminates any ambiguity there. It even deals with barcodes, so if you have those you can tie sequences to specific parts and not just to catalog records.

GenBank is special in regard to identifiers: it is one of two systems (Arctos is the other) in which "MSB:Mamm:210229" is NOT ambiguous, because we worked out the specimen_voucher field and registry with them. 65 of the current 215 collections in Arctos claim to have registered with GenBank; we as a community could certainly do better.

I believe that everything we can currently do with GenBank was worked out with Scott Federhen, and not much has changed since he died. He at least was willing to allow edits by "owning institutions" if the submitter could not be convinced to make updates; I don't know if anyone else might be inclined to allow that, or even whom you'd ask. (I wonder if an agreement regarding future edits to GenBank might also be part of loan agreements?) It might be worth knocking on the door if you're ever in DC; we could certainly use another interested insider.
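GenBank's `/specimen_voucher` qualifier can carry the same `institution:collection:catalog number` triplet that Arctos uses in its GUIDs, so a record's voucher can in principle be turned back into an Arctos link. A rough sketch follows. It assumes the voucher appears on a single line of a GenBank flat file in exactly that three-part form. Wrapped qualifiers and other voucher styles are not handled, and the function name is hypothetical:

```bash
# Read a GenBank flat file on stdin and print an Arctos GUID URL for every
# /specimen_voucher qualifier of the form INSTITUTION:COLLECTION:NUMBER
# (letters:letters:digits). Assumes the qualifier is not wrapped across lines;
# vouchers in any other form (e.g. "NK 12345") produce no output.
voucher_to_guid() {
  sed -n 's#^ */specimen_voucher="\([A-Za-z][A-Za-z]*:[A-Za-z][A-Za-z]*:[0-9][0-9]*\)".*#https://arctos.database.museum/guid/\1#p'
}
```

A flat file can be fetched through NCBI's E-utilities `efetch` service, e.g. `curl -s 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=EU241637&rettype=gb&retmode=text' | voucher_to_guid`. What this prints depends on the voucher text actually stored in that record.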
-
@jhpoelen @jldunnum - following up on this. Regarding below,
From the Arctos end, we can find a lot of these via a search on "symbiotype", but this does not distinguish which taxon a record is a symbiotype of. In the meantime, I'm going through the symbiotype records with links to GenBank and adding the "host of" references, which should give @jhpoelen something to start with.
-
Also @dustymc note that the reciprocal linkouts for the GenBank virus sequences are still not working in this example.
-
Here is another with relationships added. @jhpoelen can you use these examples to find others?
-
I just created this relationship: https://arctos.database.museum/guid/MSB:Mamm:135531 with a taxon name (new species) as an OrganismID. It should actually be changed to a "host of" relationship using a proposed new field, "TaxonID", which would link to the taxonomy table.
-
Catalog the stuff that seems to be important and make the correct assertions.
-
@campmlc you wrote:
Are you intrigued by the idea of a panel discussion/webinar about the above topic with members of the virus community joining us? We could discuss changes that occurred as a result of COVID, and changes in standards of practice that still need to be adopted, both by collections and by virologists. Pam Soltis and I could possibly arrange such a thing.
-
@dustymc this would require that we create new virus collections, fungal collections, bacterial collections, etc. for things we do not have vouchers for, in order to say that this "host" record is related to this "pathogen" record. And which institution will manage these? Right now we can do this for our integrated host and parasite collections at the institutional level, if we add in all the taxonomy (a big can of worms there), but what about things in external repositories?
-
If those things exist then of course they can be linked to. If they don't but structured data are necessary, a Host collection could be used. That's of course more work for all the reasons you point out, but I don't think there's a lesser cost which leads to those kinds of data. If structured data aren't critical (or critical enough to inspire someone to manage a Host collection, anyway!), then things like verbatim host ID provide a text-based alternative. I don't think any amount of shoehorning will much change that, but it might break other things.
-
Hey y'all, coming to the conversation a bit late, but please note that GloBI is now resolving the NCBI records as reported in the Arctos records. This means that GloBI also pulls in the taxonomic information (and more) from the NCBI GenBank records and enables taxonomic searches for either host or hostee. E.g., https://arctos.database.museum/guid/MSB:Mamm:148794 has already been indexed by GloBI (see attached screenshots). For this specific example, you can find specimen-to-specimen links via the "download csv sample" link or
-
To find all Arctos-GenBank links known to GloBI, you could use something like:
which is bash/Linux speak for saying: get me the latest indexed interactions via GloBI's interactions.tsv, then select only rows that contain the terms "arctos.database" and "nuccore". Finally, put the results in the file
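The exact command was not preserved in this thread. Below is a rough, hypothetical reconstruction of the steps just described. The download URL for GloBI's interactions.tsv snapshot is an assumption and should be checked against GloBI's data documentation. The output file name is made up:

```bash
# ASSUMED location of GloBI's interactions.tsv snapshot (check GloBI's docs).
GLOBI_TSV_GZ_URL='https://depot.globalbioticinteractions.org/snapshot/target/data/tsv/interactions.tsv.gz'

# Keep only rows that mention both an Arctos record and a GenBank nuccore record.
# Note: the TSV header row does not contain these terms, so it is dropped.
arctos_genbank_rows() {
  grep -F 'arctos.database' | grep -F 'nuccore'
}

# Example (large download; output file name is hypothetical):
#   curl -sL "$GLOBI_TSV_GZ_URL" | gunzip | arctos_genbank_rows > arctos_genbank.tsv
#   wc -l < arctos_genbank.tsv   # number of matching interaction claims
```

The count depends on the snapshot used, so it will change as GloBI re-indexes.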
-
According to a recent interactions.tsv, this yields 444 interaction claims. See the attached zip for CSV/TSV versions of these claims. Curious to hear whether this is at all useful.
-
Fantastic! Thanks @jhpoelen! I'll look over this list and see what else we can add.
-
@jhpoelen may I say how much I love the above "translation".
-
@debpaul you are welcome! Please do let me know if other things need translating.
-
Hi!
As I was looking into indexing a recently published host-virus dataset via https://www.pnas.org/content/118/15/e2002324118 and globalbioticinteractions/globalbioticinteractions#644, I stumbled across https://www.ncbi.nlm.nih.gov/nuccore/EU241637 and its link to https://arctos.database.museum/guid/MSB:Mamm:210229 (see attached screenshot).
Very neat to see how all the links are pointing back and forth across the various systems (e.g., genbank <-> Arctos).
Currently, Arctos captures the links to GenBank in associatedSequences. However, from the data provided, it is not clear what was sequenced. In this case, a virus (hantavirus) was extracted from the host specimen.
When dealing with associated sequences, do you keep track of the kind of association between the host specimen and the sequence, like you do with the host-parasite relations?
Ideally, I'd like to extract species interactions records from the associatedSequences, but only if the sequence documents anything other than the host itself.
Thanks for all your hard work in keeping Arctos going!
Related to #2121.
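Extracting interaction records from associatedSequences first requires splitting the field into individual identifiers. A small sketch follows. It assumes the field uses the ` | ` list separator that Darwin Core recommends, although Arctos exports may use a different delimiter. The function name is hypothetical:

```bash
# Split one associatedSequences value into one identifier per line.
# Assumes "|" separates entries (Darwin Core recommends " | "); surrounding
# spaces are trimmed and empty entries are dropped.
split_associated_sequences() {
  printf '%s\n' "$1" | tr '|' '\n' | sed 's/^ *//; s/ *$//' | grep -v '^$'
}
```

Each resulting identifier could then be checked against its relationship value, to decide whether it documents the host itself or an associated organism such as a virus.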