Problem: multi-stage UUID assignment is unnecessary and inefficient #382

sallain · 2023-09-12T18:09:31Z

Is your feature request related to a problem? Please describe.

The file UUID generation task occurs in three separate places: "Assign file UUIDs to objects", "Assign file UUIDs to submission documentation", and "Assign file UUIDs to metadata". In Archivematica, with its two-stage processing, the UUIDs for the objects are generated during transfer, and the UUIDs for metadata and submission documentation are generated during ingest. This makes sense for Archivematica because submission documentation and metadata files can be added during backlog and appraisal, between the two stages.

a3m no longer has two-stage processing, so there is no opportunity for users to add or change the submission documentation and metadata files. It seems unnecessary to separate out the jobs in this way, when they all accomplish the same basic task.

This is an extension of #702.

Describe the solution you'd like

Condense the file UUID tasks into one task (could just be called "Assign file UUIDs") that covers everything in the transfer (all digital objects as well as anything contained in a metadata or submission documentation folder - in a bag, this would be everything in the data directory).

This helps to reduce the overall number of jobs in a3m, and contributes to dismantling the two-stage workflow inherited from Archivematica.
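To make the idea concrete, here is a rough sketch of what a single "Assign file UUIDs" pass could look like. It is not a3m's actual code: the function name `assign_file_uuids` and the returned mapping are hypothetical, and it assumes the whole transfer lives under one directory that can simply be walked once.

```python
import uuid
from pathlib import Path


def assign_file_uuids(transfer_dir: Path) -> dict[str, str]:
    """Return a mapping of relative file path -> newly generated UUID.

    Hypothetical helper: every file under transfer_dir is handled in one
    pass, whether it is a digital object, a metadata file, or submission
    documentation.
    """
    assignments: dict[str, str] = {}
    # sorted() keeps the iteration order stable across runs.
    for path in sorted(transfer_dir.rglob("*")):
        if path.is_file():  # directories are left to their own task
            rel = path.relative_to(transfer_dir).as_posix()
            assignments[rel] = str(uuid.uuid4())
    return assignments
```

Directories are skipped on purpose, so the separate (configurable) directory UUID task would be unaffected by a change like this.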

Additional context

Note that there is also a UUID generation task for directories, but I'm happy to leave that as a separate task, since it's something that can be configured.

sallain · 2023-09-12T18:09:40Z

I can write more issues, but this same logic could also be applied to:

filename change
checksum generation
characterize and extract metadata
file format identification (in this case, we'd have to apply a blanket rule, i.e. all metadata/submission documentation should be ID'd along with the payload)

Diogenesoftoronto · 2023-09-28T07:01:37Z

There may be an opportunity to pull this out of a3m entirely and have it be something implemented in enduro. I think this might be a good starting point.

sallain · 2023-09-28T16:03:40Z

Oh interesting. Can you provide some info on why this would be desirable?

sallain mentioned this issue Sep 12, 2023

Combine fileUUID generation tasks artefactual-sdps/enduro#703

Closed

sevein changed the title ~~Combine fileUUID generation tasks~~ Problem: multi-stage UUID assignment is unnecessary and inefficient Sep 29, 2023
