This is the content-focused companion of #76. We need concept-level and version-level IDs (and possibly even distribution-level IDs) for files and subdatasets too.
In the following, <dsid> would be a DataLad dataset UUID that represents the concept-level identifier of a dataset.
A file can be:
<dsid>/README: "the README of this dataset", i.e. the concept of a file with this name within the scope of that particular dataset
<checksum/annexid>: A particular content blob, which could be isVersionOf a "file concept" within a particular dataset
A distribution-level file description could be the association of a location/download method with a content blob. However, this could also just be a plain property of that blob's metadata record.
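To make the three levels concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the helper names (`file_concept_id`, `content_id`, `version_record`), the use of SHA-256 as the checksum, the placeholder UUID, and the example URL are all hypothetical and not part of DataLad or of this proposal.

```python
import hashlib

# Placeholder standing in for <dsid>; a real value would be the dataset's UUID.
DSID = "00000000-0000-0000-0000-000000000001"


def file_concept_id(dsid: str, relpath: str) -> str:
    """Concept-level ID: the file with this name within one dataset."""
    return f"{dsid}/{relpath}"


def content_id(content: bytes) -> str:
    """Version-level ID: identifies one particular content blob.

    SHA-256 is an assumption here; an annex key could serve the same role.
    """
    return "sha256:" + hashlib.sha256(content).hexdigest()


def version_record(dsid, relpath, content, url=None):
    """Describe a content blob and link it to its file concept."""
    record = {
        "@id": content_id(content),
        "isVersionOf": {"@id": file_concept_id(dsid, relpath)},
    }
    if url is not None:
        # Distribution-level information kept as a plain property of the blob.
        record["url"] = url
    return record


v1 = version_record(DSID, "README", b"first draft\n")
v2 = version_record(
    DSID, "README", b"second draft\n", url="https://example.com/README"
)
```

Both records point at the same file concept via isVersionOf, while their own identifiers differ because the content differs.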
For subdatasets the situation appears to be slightly more complex. The technical vehicle of a submodule is composed of:
name: akin to schema:name
path: the mountpoint, and in some sense an ID component identifying a dataset as a subdataset within the scope of a superdataset
url: akin to schema:url
commit: akin to schema:version
So we always have a version-level description here. But the described version cannot be a sufficient identifier, because we need to describe the use of that dataset version as a subdataset (the same version can be used in many different superdatasets, with unique path values and possibly other properties).
At the very minimum, we need to reflect this with a unique @id that cannot just be a gitsha (or <dsid>/<gitsha>, see datalad/datalad-registry#217 (comment)).
We could use the same ID format as for files (after all, files in a dataset and subdatasets share the same namespace):
<dsid>/<subdataset-relpath>
and then attach properties, such as:
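As a minimal sketch of what such a record might look like (not a settled schema), the hypothetical helper `subdataset_record` below builds the @id from the superdataset's UUID and the subdataset's relative path, and attaches the four submodule fields as properties. The property keys, UUIDs, URL, and commit hash are made-up placeholders.

```python
def subdataset_record(super_dsid, name, path, url, commit):
    """Describe the use of one dataset version as a subdataset.

    The @id is scoped to the superdataset, so it shares the namespace
    used for files in that dataset.
    """
    return {
        "@id": f"{super_dsid}/{path}",
        "schema:name": name,
        "path": path,
        "schema:url": url,
        "schema:version": commit,
    }


# The same subdataset version registered in two different superdatasets.
SHA = "0123456789abcdef0123456789abcdef01234567"  # made-up commit hash
URL = "https://example.com/raw.git"  # made-up URL

in_super_a = subdataset_record(
    "11111111-1111-1111-1111-111111111111", "raw", "inputs/raw", URL, SHA
)
in_super_b = subdataset_record(
    "22222222-2222-2222-2222-222222222222", "raw", "sourcedata/raw", URL, SHA
)
```

Both records carry the same version, yet their identifiers stay distinct, which is the property a bare gitsha cannot provide.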