Define DataLad metadata model #56
Comments
We can take a look at W3C's Provenance Ontology (https://www.w3.org/TR/prov-o/) for some ideas w.r.t. versioned dataset repositories, i.e. datalad-datasets, git-repos.
Been searching for something that we could use and found a few interesting links:
They use Wikidata as a source for some Git-related entities, but I didn't find anything useful for submodules yet. They also use: https://www.semanticarts.com/gist/
Started the work on this in the metalad repo (https://github.com/datalad/datalad-metalad/tree/ld-metadata-model). A few remarks (taken from the first commit message in the branch):
The branch currently only adds a Turtle document that should grow into a schema that describes the internal metadata structure of metalad. The current version is far from complete and rather naive; it does not yet take a large number of schemas/ontologies into account.
I did look at the HCLS-Dataset Descriptions and it seems to me that their
The git-layer is not modelled. That could be done by https://github.com/justin2004/git_to_rdf (which @jsheunis pointed to). The intention of the layer described here is to model the metalad-specific metadata elements. The git-related RDF content can be attached to a
The RDF schema definition includes an example instance, i.e.
The following query, for example, yields all metadata formats, i.e. extractor_names, that are present in the graph:
The output would be something like:
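The query and its output from the original comment are not preserved in this thread. Purely as an illustration of the kind of lookup described (collecting the distinct extractor names found in a graph), here is a minimal sketch in plain Python. The graph is a toy list of triples, and every identifier in it (`ex:extractorName`, `ex:record-1`, `extractor_a`, and so on) is a hypothetical placeholder, not part of the actual metalad schema.

```python
# Toy graph as (subject, predicate, object) tuples.
# Every identifier here is hypothetical and only stands in for the real schema.
TRIPLES = [
    ("ex:record-1", "ex:extractorName", "extractor_a"),
    ("ex:record-1", "ex:describes", "ex:dataset-1"),
    ("ex:record-2", "ex:extractorName", "extractor_b"),
    ("ex:record-3", "ex:extractorName", "extractor_a"),
]


def metadata_formats(triples, predicate="ex:extractorName"):
    """Return the distinct objects of `predicate`, sorted for stable output."""
    return sorted({obj for _subj, pred, obj in triples if pred == predicate})


formats = metadata_formats(TRIPLES)
print(formats)
```

This mirrors what a `SELECT DISTINCT` over a single predicate would do in SPARQL: duplicates collapse, and triples with other predicates are ignored.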
From my POV the essence of this issue is described and demo'ed in datalad/datalad-metalad#389 (comment). Given the key nature of this topic, which is not tabby-specific at all, I will close this issue here.
This is closely linked to #37 and #55.
We are in dire need of a comprehensive (meta)data model that can capture the various basic entities we work with. Some of these are already covered in http://htmlpreview.github.io/?https://github.com/joejimbo/HCLSDatasetDescriptions/blob/master/Overview.html, e.g. `Dataset`, `DatasetVersion`. Things like `DataDistribution` do not seem to have a clear applicability in the DataLad context.
What is critically needed is a model to describe a subdataset or Git submodule: a repository/branch state that is employed/imported/mounted in the context of a particular superdataset, at a particular location.
It seems necessary that this entity be distinct from `Dataset` or `DatasetVersion`, because neither of those could contain the necessary information, as they can be used in multiple contexts in bit-identical form.
Maybe there is an ontology of Git concepts already?
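To make the distinction above concrete, here is a minimal sketch of why the "mounted subdataset" needs to be its own entity. It is not a proposed schema: the class `SubdatasetMount` and all field names and values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    """A dataset at one specific state (hypothetical fields)."""
    dataset_id: str  # stable identifier of the dataset
    commit: str      # identifier of the repository state


@dataclass(frozen=True)
class SubdatasetMount:
    """One use of a dataset version inside a particular superdataset."""
    superdataset: DatasetVersion  # the context that imports the subdataset
    path: str                     # location inside the superdataset
    subdataset: DatasetVersion    # the repository state being mounted


# The same bit-identical dataset version is used in two different contexts.
shared = DatasetVersion(dataset_id="ds-sub", commit="abc123")
super_a = DatasetVersion(dataset_id="ds-a", commit="111aaa")
super_b = DatasetVersion(dataset_id="ds-b", commit="222bbb")

mount_a = SubdatasetMount(super_a, "inputs/raw", shared)
mount_b = SubdatasetMount(super_b, "code/pipeline", shared)

same_version = mount_a.subdataset == mount_b.subdataset
distinct_mounts = mount_a != mount_b
```

The `DatasetVersion` instance carries no information about where it is mounted, so the superdataset and path have to live on a separate entity; two mounts can then share one version while remaining distinct.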