Problem: multi-stage UUID assignment is unnecessary and inefficient #382

Open
sallain opened this issue Sep 12, 2023 · 3 comments

Comments

@sallain
Contributor

sallain commented Sep 12, 2023

Is your feature request related to a problem? Please describe.

The file UUID generation task occurs in three separate places: "Assign file UUIDs to objects", "Assign file UUIDs to submission documentation", and "Assign file UUIDs to metadata". In Archivematica, with its two-stage processing, the UUIDs for objects are generated during transfer, while the UUIDs for metadata and submission documentation are generated during ingest. This makes sense for Archivematica because submission documentation and metadata files can be added during backlog and appraisal, between the two stages.

a3m no longer has two-stage processing, so there is no opportunity for users to add or change the submission documentation and metadata files. It seems unnecessary to separate out the jobs in this way, when they all accomplish the same basic task.

This is an extension of #702.

Describe the solution you'd like

Condense the file UUID tasks into one task (which could simply be called "Assign file UUIDs") that covers everything in the transfer: all digital objects, as well as anything contained in a metadata or submission documentation folder. In a bag, this would be everything in the data directory.

This helps to reduce the overall number of jobs in a3m, and contributes to dismantling the two-stage workflow inherited from Archivematica.
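A minimal Python sketch of what a single "Assign file UUIDs" job could look like, assuming the transfer is a plain directory tree (objects/, metadata/, metadata/submissionDocumentation/ and so on) and that a bag is recognised by a bagit.txt file sitting next to a data/ directory. The names payload_root and assign_file_uuids are hypothetical and not part of a3m; the sketch only returns a path-to-UUID mapping and leaves out how a3m would persist it.

```python
import tempfile
import uuid
from pathlib import Path


def payload_root(transfer_dir: Path) -> Path:
    """Return the directory whose files should receive UUIDs.

    Assumption: a transfer is treated as a BagIt bag when bagit.txt and a
    data/ directory exist at its root; then only data/ is walked. Otherwise
    the whole transfer is walked, metadata and submission docs included.
    """
    if (transfer_dir / "bagit.txt").is_file() and (transfer_dir / "data").is_dir():
        return transfer_dir / "data"
    return transfer_dir


def assign_file_uuids(transfer_dir: Path) -> dict[str, uuid.UUID]:
    """Hypothetical single job: one pass, one random (v4) UUID per file.

    Keys are paths relative to the transfer directory, in POSIX form.
    """
    root = payload_root(transfer_dir)
    return {
        path.relative_to(transfer_dir).as_posix(): uuid.uuid4()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


if __name__ == "__main__":
    # Build a throwaway transfer with an object, a metadata file and a
    # submission documentation file, then assign UUIDs to all three at once.
    with tempfile.TemporaryDirectory() as tmp:
        transfer = Path(tmp)
        for rel in ("objects/image.tif", "metadata/metadata.csv",
                    "metadata/submissionDocumentation/deed.pdf"):
            (transfer / rel).parent.mkdir(parents=True, exist_ok=True)
            (transfer / rel).write_bytes(b"example")
        for rel_path, file_uuid in assign_file_uuids(transfer).items():
            print(rel_path, file_uuid)
```

Because every file is visited in the same pass, there is no ordering dependency between objects, metadata and submission documentation, which is the point of collapsing the three jobs.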

Additional context

Note that there is also a UUID generation task for directories, but I'm happy to leave that as a separate task, since it's something that can be configured.

@sallain
Contributor Author

sallain commented Sep 12, 2023

I can write more issues, but this same logic could also be applied to:

  • filename change
  • checksum generation
  • characterize and extract metadata
  • file format identification (in this case, we'd have to apply a blanket rule, i.e., all metadata and submission documentation should be identified along with the payload)
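As an illustration of the same idea applied to checksum generation, here is a minimal Python sketch of one job that hashes every file under the transfer in a single pass, instead of separate jobs per folder. SHA-256 and the function names sha256_of and generate_checksums are assumptions chosen for illustration, not a3m's actual job or configuration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large objects are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generate_checksums(root: Path) -> dict[str, str]:
    """Hypothetical single checksum job: objects, metadata and submission
    documentation are all hashed in one pass over `root`."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


if __name__ == "__main__":
    # Throwaway transfer with one object and one metadata file.
    with tempfile.TemporaryDirectory() as tmp:
        transfer = Path(tmp)
        for rel in ("objects/image.tif", "metadata/metadata.csv"):
            (transfer / rel).parent.mkdir(parents=True, exist_ok=True)
            (transfer / rel).write_bytes(b"example")
        for rel_path, checksum in generate_checksums(transfer).items():
            print(rel_path, checksum)
```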

@Diogenesoftoronto
Contributor

There may be an opportunity to pull this out of a3m entirely and implement it in Enduro instead. I think this might be a good starting point.

@sallain
Contributor Author

sallain commented Sep 28, 2023

Oh interesting. Can you provide some info on why this would be desirable?

sevein changed the title from "Combine fileUUID generation tasks" to "Problem: multi-stage UUID assignment is unnecessary and inefficient" on Sep 29, 2023