
Provenance from multiple versions #8

Open
ijiraq opened this issue Oct 23, 2019 · 3 comments

@ijiraq

ijiraq commented Oct 23, 2019

This sort of relates to Issue opencadc/caom2#66 and the question of cardinality.

Planes are often produced by an ensemble of software, not a single application. In the case of ALMA MS data in particular, the MS data is calibrated using CASA XXX and then split using CASA YYY. The provenance for CASA YYY would tell you to use YYY to open these files (MS is not a standard format), but the CASA XXX part is needed to tell you what the calibration system was. In particular, CASA XXX is what tells you about calibration trust, while the CASA YYY part is more about data form. How (if at all) should this be expressed in the provenance?

@pdowler added the enhancement (New feature or request) label on Nov 28, 2019
@pdowler
Member

pdowler commented Nov 28, 2019

This is an issue that goes beyond the use cases and requirements that have driven CAOM development so far. The IVOA Provenance DM does cover this kind of use case, where multiple activities and entities connect an input (entity) to an output (entity).

We can do an analysis of CAOM vs the Provenance DM to figure out whether there is something useful we can adopt, and whether that would entail a minor or a major version.

@pdowler
Member

pdowler commented May 24, 2024

There is work in the IVOA to formalise "one-step" or "last-step" provenance, and the provenance used here to connect a Plane to its inputs is definitely "last-step" provenance.

The issue here is really that this happened:

idealised : entity1 > activity1 > entity2 > activity2 > entity3

but since entity2 was not stored/kept, it's more like

actual: entity1 > {activity1,activity2} > entity3

So there is a composite activity: a bunch of s/w bundled and executed together. A composite activity is what people actually do (vs an idealised provenance sequence).
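To make the idealised vs actual chains concrete, here is a minimal Python sketch under assumed names: Entity, Activity and collapse are hypothetical and are not part of CAOM or the IVOA Provenance DM. It merges adjacent activities whenever the intermediate entity between them was not stored, which is how the composite activity described above arises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical data product; stored=False means it was never kept."""
    name: str
    stored: bool = True


@dataclass(frozen=True)
class Activity:
    """Hypothetical processing step; more than one name means a composite activity."""
    names: tuple


def collapse(chain):
    """Merge activities across intermediate entities that were not stored.

    `chain` alternates Entity, Activity, Entity, ... and starts and ends
    with an Entity (an assumption of this sketch).
    """
    result = [chain[0]]
    pending = ()  # activity names accumulated since the last stored entity
    for activity, entity in zip(chain[1::2], chain[2::2]):
        pending += activity.names
        if entity.stored:
            # A stored entity closes off one (possibly composite) activity.
            result += [Activity(pending), entity]
            pending = ()
    if pending:
        raise ValueError("the final entity in the chain must be stored")
    return result


idealised = [
    Entity("entity1"),
    Activity(("activity1",)),
    Entity("entity2", stored=False),
    Activity(("activity2",)),
    Entity("entity3"),
]
actual = collapse(idealised)
# actual == [Entity("entity1"), Activity(("activity1", "activity2")), Entity("entity3")]
```

Only the stored endpoints survive, so the composite activity carries both names but no record of the missing intermediate entity.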

So, does "last-step provenance" have to capture the details of that composite activity? The reason for only having the last step is that it is simple and I'm not at all certain we can/should model a composite activity there. Pretty sure that's a bad idea.

@pdowler removed the enhancement (New feature or request) label on May 24, 2024
@pdowler transferred this issue from opencadc/caom2 on Jul 5, 2024
@pdowler
Member

pdowler commented Jul 26, 2024

Can easily change the cardinality of Plane.provenance.version from [0..1] to [0..*].

Need to reserve a separator that cannot be used in values (probably |, as for keywords) for use in the relational mapping.
Need to clarify that this now contains {software name}-{version} strings and that Plane.provenance.name is now more of a logical name.
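A minimal Python sketch of that relational mapping for a [0..*] version list, assuming | is the reserved separator. The function names are hypothetical and not part of any CAOM library. Values containing the separator, or empty values (which would not survive a round trip), are rejected on write, and an empty list maps to NULL (None).

```python
SEPARATOR = "|"  # assumed reserved separator, as proposed above


def encode_versions(versions):
    """Join a list of {software name}-{version} strings into one column value."""
    for value in versions:
        if not value:
            raise ValueError("empty version strings cannot be round-tripped")
        if SEPARATOR in value:
            raise ValueError(f"{value!r} contains the reserved separator {SEPARATOR!r}")
    # An empty list maps to NULL rather than to an empty string.
    return SEPARATOR.join(versions) if versions else None


def decode_versions(column):
    """Split a stored column value back into the list of version strings."""
    if column is None:
        return []
    return column.split(SEPARATOR)


stored = encode_versions(["casa-XXX", "casa-YYY"])  # "casa-XXX|casa-YYY"
```

Rejecting the separator on write is what makes decoding unambiguous; the same rule would apply to any other reserved character chosen instead of |.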

So for something simple like "used casa-5.2" one could have

name = casa
versions = casa-5.2

(it might have been version = 5.2 in CAOM 2.4)

This would NOT by itself fully specify what each s/w was used for, so in the OP

name = casa
versions = casa-XXX|casa-YYY

would not say which was used to do step A and which was used for step B. That would require capturing something like {step}:{software} | {step}:{software} and conveying the order/sequence of steps in the composite process, and that immediately falls apart if the sequence is non-linear (fork-merge, scatter-gather, map-reduce... it happens all the time).
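The limitation can be illustrated with a hypothetical {step}:{software} encoding. This sketch (names assumed, not something proposed in the thread) only accepts listed steps that form a single linear sequence, where each step depends on exactly the previous one; a fork-merge graph has no faithful representation as one ordered list, so it is rejected.

```python
def encode_linear_steps(steps, depends_on):
    """Encode (step, software) pairs as "step:software|step:software".

    `depends_on` maps each step to the set of steps whose output it consumes.
    The list order is the only ordering information the string can carry, so
    each listed step must depend on exactly the previous listed step.
    """
    previous = None
    for step, software in steps:
        if any(sep in step + software for sep in ":|"):
            raise ValueError(f"{step!r} or {software!r} contains a reserved separator")
        expected = set() if previous is None else {previous}
        if set(depends_on.get(step, ())) != expected:
            raise ValueError(f"step {step!r} is not part of a single linear sequence")
        previous = step
    return "|".join(f"{step}:{software}" for step, software in steps)


# The OP's case: calibrate with CASA XXX, then split with CASA YYY.
linear = encode_linear_steps(
    [("calibrate", "casa-XXX"), ("split", "casa-YYY")],
    {"split": {"calibrate"}},
)  # "calibrate:casa-XXX|split:casa-YYY"
```

A fork-merge (b and c both consume a, and d consumes both) fails the check at c, because a flat ordered list cannot say that b and c ran side by side.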
