-
My short answer is kg-bioportal - we have a non-public build of this graph, so we could essentially just transform the node list to CURIEs + labels. BioPortal wouldn't cover anything beyond ontologies, though. Other options:
I'm also curious about the extent of the IDs we're talking about here - just ontologies? Biomolecule IDs / sequence IDs? Other instance data whose labels are likely to change frequently? More general sources, e.g. Wikidata? English labels only?
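For illustration, here is a minimal sketch of the node list to CURIEs + labels transform. It assumes the node list is a KGX-style TSV with `id` and `name` columns; the function name and the sample rows are made up for the example, and the actual kg-bioportal build may be laid out differently.

```python
import csv
import io


def nodes_to_label_table(nodes_tsv, id_col="id", label_col="name"):
    """Return {curie: label} from a tab-separated node list.

    Assumes a header row containing `id_col` and `label_col`
    (KGX-style column names; adjust if the real file differs).
    Nodes without a label are skipped.
    """
    reader = csv.DictReader(nodes_tsv, delimiter="\t")
    labels = {}
    for row in reader:
        curie = (row.get(id_col) or "").strip()
        label = (row.get(label_col) or "").strip()
        if curie and label:
            labels[curie] = label
    return labels


# Illustrative input only; a real run would open the node file instead.
sample = io.StringIO(
    "id\tcategory\tname\n"
    "MONDO:0005148\tbiolink:Disease\ttype 2 diabetes mellitus\n"
    "HP:0000118\tbiolink:PhenotypicFeature\tPhenotypic abnormality\n"
    "EX:0000001\tbiolink:NamedThing\t\n"
)
label_table = nodes_to_label_table(sample)
```

The resulting dictionary can be dumped once to a two-column file and reused offline, which avoids per-identifier API calls.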
-
I'm not sure what our coverage is overall (let alone among the identifiers you're trying), but Translator's Node Normalization tool should be able to do this. You can send it a batch of identifiers, and for each one it will return:
You can try it out using our Swagger interface. If you find a large gap in the identifiers we support, you can report it on our issues page, but we are pretty focused on NCATS Translator-related identifiers for now.
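As a rough sketch of what a batched lookup could look like from Python: the endpoint URL, the `{"curies": [...]}` request body, and the response shape (one entry per CURIE, with the preferred label under `id.label`, and `null` for unknown identifiers) are assumptions here and should be checked against the Swagger interface. The function names and batch size are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify against the service's Swagger page.
NODENORM_URL = "https://nodenormalization-sri.renci.org/get_normalized_nodes"


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_labels(response):
    """Pull {curie: label} out of one response (assumed shape, see above)."""
    labels = {}
    for curie, entry in response.items():
        if not entry:  # identifier not known to the service
            continue
        label = (entry.get("id") or {}).get("label")
        if label:
            labels[curie] = label
    return labels


def fetch_labels(curies, batch_size=1000, url=NODENORM_URL):
    """POST the CURIEs in batches and merge the labels that come back."""
    labels = {}
    for batch in chunked(list(curies), batch_size):
        payload = json.dumps({"curies": batch}).encode("utf-8")
        request = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=60) as resp:
            labels.update(extract_labels(json.load(resp)))
    return labels


# Offline check of the parsing step with a hand-written, illustrative response.
mock_response = {
    "MONDO:0005148": {
        "id": {"identifier": "MONDO:0005148", "label": "type 2 diabetes mellitus"}
    },
    "EX:0000001": None,
}
parsed = extract_labels(mock_response)
```

With a batch size of 1000, a list of 100K identifiers would take about 100 requests rather than 100K.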
-
Are you envisioning doing this at the time the files are generated? Would the labels be stored in the file and then published? My PhD was recently working with historic RDF data files in the form of nanopublications, and a big challenge we had was getting labels for the data.
-
This is only tangentially related to SSSOM; it is part of a pipeline I am building. I would like to make it easy to augment mapping files with the subject_label and object_label fields. So let's say I get a mapping with subject_id, predicate_id, object_id: I want to add subject_label and object_label for better readability as a postprocessing step (again, not SSSOM per se).
The question is: what would be the best way to obtain labels for 80-90% of all commonly used identifiers? @cmungall suggested BioPortal. Now the problem is, I don't want to hit the BioPortal API with 100K queries to augment labels. Any idea how we should do things like this moving forward? Essentially, given a list of 100K identifiers, how do we obtain all their preferred labels?
What tools should we use? What APIs? Do we need to build a new service or extend an existing one?
cc @caufieldjh @graybeal as you might have ideas.
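To make the postprocessing step concrete, here is a minimal sketch of what I have in mind, assuming the labels have already been collected into a CURIE-to-label dictionary from whatever source we settle on. The function names, the example identifiers, and the labels are all made up for illustration; a real SSSOM TSV also carries a commented metadata header, which this sketch ignores.

```python
import csv
import io


def add_labels(rows, labels):
    """Return copies of the mapping rows with subject_label/object_label filled in.

    Existing labels are kept; identifiers missing from `labels` get "".
    """
    augmented = []
    for row in rows:
        new_row = dict(row)
        new_row["subject_label"] = row.get("subject_label") or labels.get(row["subject_id"], "")
        new_row["object_label"] = row.get("object_label") or labels.get(row["object_id"], "")
        augmented.append(new_row)
    return augmented


def label_coverage(rows, labels):
    """Fraction of distinct subject/object identifiers that have a label."""
    ids = {row[col] for row in rows for col in ("subject_id", "object_id")}
    if not ids:
        return 0.0
    return sum(1 for curie in ids if curie in labels) / len(ids)


# Made-up mappings and labels, for illustration only.
mappings_tsv = io.StringIO(
    "subject_id\tpredicate_id\tobject_id\n"
    "A:1\tskos:exactMatch\tB:1\n"
    "A:2\tskos:broadMatch\tB:2\n"
)
rows = list(csv.DictReader(mappings_tsv, delimiter="\t"))
labels = {"A:1": "alpha one", "B:1": "beta one", "A:2": "alpha two"}

augmented = add_labels(rows, labels)
coverage = label_coverage(rows, labels)
```

The coverage figure is the number I would use to judge whether a given label source gets us to the 80-90% target.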