Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observation.observationID and Plane.productID are too restrictive #3

Open
pdowler opened this issue Jun 18, 2024 · 6 comments
Open
Labels
2.5 enhancement New feature or request

Comments

@pdowler
Copy link
Member

pdowler commented Jun 18, 2024

In the code (java and python) these strings are restricted to being "valid path components".

These fields are used to generate several URIs:

ObservationURI of the form caom:{collection}/{observationID} for use as a reference in DerivedObservation.members

PlaneURI of the form caom:{collection}/{observationID}/{productID} for use as a reference in Plane.provenance.inputs

It is likely that Plane.creatorID is also assigned by using these values.

Although not part of the model, the CADC implementation (at least) uses these fields to generate Plane.publisherID that is the primary external reference to a Plane (product) and used as the input ID by the caom DataLink service.

@pdowler
Copy link
Member Author

pdowler commented Jun 18, 2024

The restriction may be too limiting for users who have existing identifiers with more structure (eg. multiple path components) that they need to capture. This could in principle be an issue for collection names (eg Survey/DRi vs Survey/DRj).

see #12

The purpose of the restriction was to be able to enable code to convert the fields to a URI and parse it back to the individual field values. That is currently possible because ObservationURI has exactly 2 components and PlaneURI has exactly 3 components. It would not be possible to parse and extract if components but the productID can have multiple path components.

Current code only uses the restrictions to implement validation of members and inputs and to assign consistent values in the observation and plane tables to support joins via these relationships. It would feasible to lift the restriction and make "valid URI content" be a metadata curation issue.

@pdowler
Copy link
Member Author

pdowler commented Jun 18, 2024

Currently Observation.collection, Observation.observationID, and Plane.productID are the fields in the model. It would be more explicit to drop those and have the model include the URIs directly:

Observation.uri 
Plane.uri

I would like to retain the basic form (caom:{collection}/...) so the scheme means "CAOM entity" and a prefix on the Observation.uri is a namespace that can be used to reference collections (usage: permissions, metadata-sync). Being able to reliably extract the collection name from these URIs also means that data providers can consistently generate their own publisherID (see below).

Plane.creatorID already exists and should be an ivorn (ivo://{auth}/{collection}?...)

Data providers still need to inject their own Plane.publisherID (same as creatorID for original publisher) and that needs to be clarified/documented (but probably in a CAOM+TAP standard). In the current style of usage, a data provider would register a "data collection" in the IVOA registry with a resource identifier like ivo://{authority}/{collection} eg ivo://cadc.nrc.ca/CFHT. From there, a publisherID (of some "data" from that collection can be created by appending a query string with the logical Plane identifier. Current practice at CADC is to construct the publisherID as ivo://{authority}/{collection}?{Observation.observationID}/{Plane.productID}. As long as the collection name can be unambiguously extracted from the ObservationURI (and PlaneURI) then this is easy to do. More structure (path components in the observationID and/or productID) would be OK - those would become opaque strings that could be read (by a human) but not parsed by any generic code.

@pdowler pdowler added the enhancement New feature or request label Jun 18, 2024
@pdowler
Copy link
Member Author

pdowler commented Jun 19, 2024

For validation purposes, it would be good to require that Observation.uri plus a separator (/) character is a prefix of Plane.uri. Anything else is probably a mistake (caused by a bug).

@pdowler pdowler transferred this issue from opencadc/caom2 Jul 5, 2024
@pdowler pdowler added the 2.5 label Jul 26, 2024
@pdowler
Copy link
Member Author

pdowler commented Jul 26, 2024

current draft solution:

keep Observation.collection
replace Observation.observationID with Observation.uri, existing usage of caom:{collection}/{observationID} is OK
replace Plane.productID with Plane.uri, existing usage like caom:{collection}/{observationID}/{productID} is OK
remove Plane.creatorID (redundant)

DerivedObservation.members references use Observation.uri values
Plane.provenance.inputs references use Plane.uri values

Encourage/require "local" reference style using the caom URI scheme as above?

The publisherID (of a Plane) is not part of the model: it is a concept for the relational mapping and implementors need to generate their own values. It is probably of the form ivo://{authority}/{collection}?{observationID}/{productID}. Code to generate publisherID values from Plane.uri and Observation.collection is feasible if the latter is limited to "single-path-component" like it is in CAOM-2.4; restrictions on number of path components in the rest of the URI could be removed

@pdowler
Copy link
Member Author

pdowler commented Jul 26, 2024

This issue addresses the topic in #12

@pdowler
Copy link
Member Author

pdowler commented Jul 26, 2024

With respect to one point mentioned in #12: spaces in identifiers that are likely to be used in APIs are quite dangerous, which is the primary reason for the restrictions in CAOM-2.4 and earlier. Even with the above changes, using spaces in the Observation.uri or the Plane.uri are problematic at best because space needs to be encoded/decoded carefully just to store the value in a URI.

used in APIs: in http params you should encode/decode anyway so it is fine, but if the URI or a prtion of it goes into the path then of a REST API call then it's a mess.

There is already considerable use of + in the names of things in astronomy and that includes these URIs; + does not need to be encoded to use it in a URI. And someone thought it was a good idea to use + as an alternate encoding for space in URL-encoding so adding any encoding/decoding requirement to the model/data structure looks like a nightmare.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.5 enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant