Provide guidelines on use of metadata record identifiers #389
Comments
I think the metadata extractor base class must provide methods to return valid JSON-LD
We do not need
I believe I reduced the identifier concept to the minimum complexity. We would need:
Here is a JSON-LD playground link that shows a fully defined record, with no blank node identifiers, for a dataset version with two files and one subdataset. The playground also defines an example JSON-LD frame that could be used to retrieve a "plain" list of files from such a record. Here is the record; explanations are given below:
{
"@context": {
"dcterms": "https://purl.org/dc/terms/",
"dlds": "https://dx.datalad.org/dataset/",
"dldspart": "https://dx.datalad.org/dataset-part/",
"dlcontent": "https://dx.datalad.org/content/",
"schema": "https://schema.org/",
"relpath": "dcterms:identifier",
"isVersionOf": "dcterms:isVersionOf",
"hasPart": "dcterms:hasPart"
},
"@id": "dlcontent:8646787c089052c639f9f477560c6d16b1f4314d",
"@type": "schema:Dataset",
"dcterms:identifier": "mydataset",
"isVersionOf": "dlds:5604ef1f-377f-436a-a0e4-38257c44473c",
"hasPart": [
{
"@id": "dlcontent:MD5E-s0--d41d8cd98f00b204e9800998ecf8427e",
"@type": "schema:DigitalDocument",
"isVersionOf": {
"@id": "dldspart:4f32c4d7-dae7-58c5-8786-b60d2ebc7826",
"relpath": "myfile"
}
},
{
"@id": "dlcontent:MD5E-s190--21f2d4006a8b6bc1a22f2e885d3fbc3a.txt",
"@type": "schema:DigitalDocument",
"isVersionOf": {
"@id": "dldspart:447fbbf7-f732-5fdd-a22d-b3df3bac3e38",
"relpath": "data/pipe.txt"
}
},
{
"@id": "dlcontent:ffdbd35dd78986fd3b6d069ca6669f90399b75da",
"@type": "schema:Dataset",
"isVersionOf": {
"@id": "dldspart:ab66dff3-73bf-5aab-bf30-e8cc3f4d7e90",
"relpath": "sources/myinputs"
}
}
]
}
The three identifier types are represented in the context definitions:
The basic principles of this document are:
(NB: the property name
Post-publication thoughts:
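To make the "plain list of files" idea concrete, here is a minimal Python sketch that walks a record of the shape shown above and returns one entry per file. It is not JSON-LD framing: it is plain dictionary traversal with a naive prefix expansion, and a real JSON-LD processor would be the more robust choice. The helper names `expand` and `list_files` are hypothetical, and the sample record is a shortened copy of the one above.

```python
def expand(curie: str, context: dict) -> str:
    """Naively expand a compact IRI like 'dlcontent:abc' using context prefixes."""
    prefix, sep, suffix = curie.partition(":")
    base = context.get(prefix)
    if sep and isinstance(base, str) and not suffix.startswith("//"):
        return base + suffix
    return curie  # no matching prefix: return the value unchanged


def list_files(record: dict) -> list:
    """Return one dict per file in 'hasPart', skipping subdatasets."""
    context = record.get("@context", {})
    files = []
    for part in record.get("hasPart", []):
        if part.get("@type") != "schema:DigitalDocument":
            continue  # subdatasets are typed schema:Dataset in the record
        version_of = part.get("isVersionOf", {})
        files.append({
            "content_id": expand(part["@id"], context),
            "part_id": expand(version_of.get("@id", ""), context),
            "relpath": version_of.get("relpath"),
        })
    return files


# Shortened copy of the record above: one file and one subdataset.
record = {
    "@context": {
        "dldspart": "https://dx.datalad.org/dataset-part/",
        "dlcontent": "https://dx.datalad.org/content/",
    },
    "@id": "dlcontent:8646787c089052c639f9f477560c6d16b1f4314d",
    "@type": "schema:Dataset",
    "hasPart": [
        {
            "@id": "dlcontent:MD5E-s0--d41d8cd98f00b204e9800998ecf8427e",
            "@type": "schema:DigitalDocument",
            "isVersionOf": {
                "@id": "dldspart:4f32c4d7-dae7-58c5-8786-b60d2ebc7826",
                "relpath": "myfile",
            },
        },
        {
            "@id": "dlcontent:ffdbd35dd78986fd3b6d069ca6669f90399b75da",
            "@type": "schema:Dataset",
            "isVersionOf": {
                "@id": "dldspart:ab66dff3-73bf-5aab-bf30-e8cc3f4d7e90",
                "relpath": "sources/myinputs",
            },
        },
    ],
}

files = list_files(record)
for entry in files:
    print(entry["relpath"], entry["content_id"])
```

Because every node carries an explicit `@id`, the result can be joined with other records on `content_id` or `part_id` without any blank-node matching.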
http://docs.datalad.org/projects/tabby/en/latest/conventions/tby-ds1.html has a demo for a non-DataLad dataset description that is compatible with this approach, albeit using a slightly better semantic setup (see for example the linkage of a POSIX path as a name to a versioned file entity).
Metadata homogenization is a key challenge. It becomes astronomically easier if one and the same thing described by two extractors is identified using the exact same identifier. AFAICS metalad provides no guidelines on how to achieve that. Only this issue:
Within the context of https://github.com/psychoinformatics-de/datalad-tabby and https://github.com/datalad/datalad-registry these things are also relevant and are being discussed. Examples:
- @id for a Dataset(Version): psychoinformatics-de/datalad-tabby#76
- @id for dataset content, File(Version) and Subdataset(Version): psychoinformatics-de/datalad-tabby#78
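One way two independent extractors could arrive at the exact same identifier is to derive it deterministically from information both already have. The sketch below uses a version-5 (name-based) UUID computed from the dataset UUID and the relative POSIX path. This derivation is an assumption for illustration: the `dldspart` identifiers in the example record carry the UUID version digit 5, but the thread does not state how they are computed, and the function name `part_id` is hypothetical.

```python
import uuid


def part_id(dataset_id: str, relpath: str) -> uuid.UUID:
    """Derive a stable dataset-part identifier (assumed scheme, not a documented one).

    The dataset UUID serves as the namespace and the relative POSIX path as
    the name, so the same inputs always produce the same identifier.
    """
    return uuid.uuid5(uuid.UUID(dataset_id), relpath)


DATASET_ID = "5604ef1f-377f-436a-a0e4-38257c44473c"

# Two extractors describing the same file compute the identifier independently.
from_extractor_a = part_id(DATASET_ID, "data/pipe.txt")
from_extractor_b = part_id(DATASET_ID, "data/pipe.txt")
other_file = part_id(DATASET_ID, "myfile")

print(from_extractor_a == from_extractor_b)  # same inputs, same identifier
print(from_extractor_a == other_file)        # different path, different identifier
```

With a scheme like this, no registry lookup or coordination between extractors is needed; the only shared convention is the choice of namespace and the normalization of the path.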