Updating existing items
Any item that has already been accessioned into the SDR can be updated through Preassembly. This process is sometimes referred to as "reaccessioning."
The requirements for updating items via Preassembly are that:
- the Preassembly user making the update must have permission to manage the item in Argo
- the item must be in an "Accessioned" state. If the item is "Opened" or "In accessioning", it must be returned to an Accessioned state before running Preassembly
Every time Preassembly updates an item it creates a new version of that item. This new version can be entirely new (i.e. every file is different from the previous version) or it can be an incremental change to the existing item, such as the addition of a new file or the replacement of an existing file with a new version of that file.
It is also possible to "remove" files from an existing item when making an update via Preassembly, but please keep in mind that the files will only be "removed" from the latest version of the item. They will be retained in the item's previous version history stored in the preservation system.
To replace all files in an item, you can follow the same steps you would follow to accession an item for the first time:
- stage all of the files you are going to accession
- create a manifest.csv listing the druids and folders in the accessioning batch
- optionally, create a file_manifest.csv to apply specific settings to individual files
- run a discovery report and address any issues found in the report
- run Preassembly
An incremental update is an update where you intend to add, modify, or remove files in an item or set of items while leaving the remaining files unchanged. It is possible to do this by:
- staging only the files that are new and/or modified
- creating a manifest.csv listing all the druids and folders in the accessioning batch
- submitting a file_manifest.csv that contains a list of all files to be contained in the new version of the item or set of items
The file_manifest.csv is the key to incremental updates: without it, Preassembly would have no way to determine which files are new and which should be left unchanged.
Unlike first-version accessioning, incremental updates require a file_manifest.csv. The simplest way to create one is to first download a CSV listing the current set of files from Argo.
To obtain this CSV for a single item:
- Navigate to the item's page in Argo
- Scroll down to the "Content" section
- In the upper right-hand corner of the "Content" section, click the link labeled "Download CSV"
To obtain this CSV for a batch of items:
- Navigate to the Argo bulk actions page and choose "New Bulk Action"
- Choose "Export structural metadata" (located in the list of CSV-based bulk actions)
- Enter the list of druids that you will be updating
- Submit the bulk action
- Wait for the bulk action to complete and then download the CSV
In either case:
- Open the CSV in an editor of your choice
- Edit the CSV as needed
Whether for one item or a batch of items, this downloaded CSV follows the same structure. It contains a list of all files currently in the item or set of items, including all of the specific settings for each file. This is what's known within SDR as "structural metadata": the structure that enables different types of displays: book, image, video, 3D, etc. See Consul for further documentation on SDR structural metadata, including a deeper explanation of the structure behind this CSV.
Once you've obtained the CSV, edit it as needed to reflect the updates you are going to make using Preassembly.
- If you are replacing existing files and making no other changes,
  - Do not modify the CSV at all. You can move ahead to the next step: "staging your files".
- If you are adding any files,
  - Insert one line for each file into the CSV
  - These new lines must be positioned exactly where you want the new files to appear within the structure of the item
  - Depending on the nature of the change, you may need to revise the "sequence" column for existing lines, for example if the new file is placed in the middle of the list rather than appended at the end
  - You do not need to fill out every column in the CSV for each new file, but you must include
    - druid
    - resource_label
    - resource_type
    - sequence
    - filename
    - publish
    - shelve
    - preserve
  - Other columns are optional and will be filled in by the system according to defaults
    - file_label - this will be filled by the filename
    - rights_view, rights_download, rights_location - these will be the same as the object rights
    - mimetype - this will be determined by the system
    - role - this will be left blank if not filled in
- If you are removing files so that they will not appear in the new version,
  - Remove the lines corresponding to those files
  - Note that if the only change you're making is to remove files, you can do this in Argo without using Preassembly
- Finally, save the new CSV as "file_manifest.csv"
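The required-column rule for new lines can be checked before anything is staged. The following is a minimal sketch, not part of Preassembly: the function name `find_incomplete_rows` and the sample rows are made up for illustration, and the check only looks for blank values in the columns named above.

```python
import csv
import io

# Columns the list above names as mandatory for each newly added file.
REQUIRED_COLUMNS = [
    "druid", "resource_label", "resource_type", "sequence",
    "filename", "publish", "shelve", "preserve",
]


def find_incomplete_rows(csv_text):
    """Return (line_number, missing_columns) for rows lacking a required value."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        missing = [col for col in REQUIRED_COLUMNS if not (row.get(col) or "").strip()]
        if missing:
            problems.append((line_number, missing))
    return problems


# Hypothetical rows: the first is complete, the second lacks filename and preserve.
sample = (
    "druid,resource_label,resource_type,sequence,filename,file_label,publish,shelve,preserve\n"
    "nx301cp7407,new text file,object,3,newfile.txt,,yes,yes,yes\n"
    "nx301cp7407,another file,object,4,,,yes,yes,\n"
)
print(find_incomplete_rows(sample))
```

An empty result means every row carries the required values. Rows exported from Argo should already be complete, so anything reported is most likely a newly inserted line.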
Once you've created the new file_manifest.csv, you are ready to stage your files for Preassembly. The steps for staging files for updates are no different from the steps for staging files when accessioning an object for the first time, with one exception: you only need to include the files that are being added or modified.
In the following example, I've staged:
- modified files for druids jq399nx1812 (Page 1) and ww089xf7663 (Page 4)
- a new file for druid nx301cp7407
.
├── file_manifest.csv
├── jq399nx1812
│   └── jq399nx1812_0001.tif
├── manifest.csv
├── nx301cp7407
│   └── newfile.txt
└── ww089xf7663
    └── ww089xf7663_0004.tif
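A quick way to spot staged files that have no corresponding line in file_manifest.csv is to walk the staging directory and compare it with the manifest. This is a rough sketch under two assumptions that are not confirmed by the documentation: each folder is named after its druid (as in the layout above), and the "filename" column holds paths relative to that folder. The function name is hypothetical.

```python
import csv
from pathlib import Path


def staged_files_missing_from_manifest(staging_dir):
    """List staged (druid, filename) pairs that file_manifest.csv does not mention."""
    staging_dir = Path(staging_dir)
    with open(staging_dir / "file_manifest.csv", newline="") as handle:
        listed = {(row["druid"], row["filename"]) for row in csv.DictReader(handle)}

    missing = []
    # Assumption: each druid has its own folder directly under the staging directory.
    for druid_dir in sorted(p for p in staging_dir.iterdir() if p.is_dir()):
        for path in sorted(f for f in druid_dir.rglob("*") if f.is_file()):
            relative_name = path.relative_to(druid_dir).as_posix()
            if (druid_dir.name, relative_name) not in listed:
                missing.append((druid_dir.name, relative_name))
    return missing
```

Calling the function with the path to the staging directory returns an empty list when every staged file appears in the manifest.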
In the file_manifest.csv, I inserted newfile.txt as a new resource (sequence #3) in item nx301cp7407. I also removed two lines from jq399nx1812, representing "Page 2" of that item, which will be removed by my Preassembly update.
druid,resource_label,resource_type,sequence,filename,file_label,publish,shelve,preserve,rights_view,rights_download,rights_location,mimetype,role
jq399nx1812,Page 1,image,1,jq399nx1812_0001.tif,jq399nx1812_0001.tif,no,no,yes,world,world,,image/tiff,
jq399nx1812,Page 1,image,1,jq399nx1812_0001.jp2,jq399nx1812_0001.jp2,yes,yes,no,world,world,,image/jp2,
nx301cp7407,Page 1,image,1,nx301cp7407_0001.tif,nx301cp7407_0001.tif,no,no,yes,world,world,,image/tiff,
nx301cp7407,Page 1,image,1,nx301cp7407_0001.jp2,nx301cp7407_0001.jp2,yes,yes,no,world,world,,image/jp2,
nx301cp7407,Page 2,image,2,nx301cp7407_0002.tif,nx301cp7407_0002.tif,no,no,yes,world,world,,image/tiff,
nx301cp7407,Page 2,image,2,nx301cp7407_0002.jp2,nx301cp7407_0002.jp2,yes,yes,no,world,world,,image/jp2,
nx301cp7407,new text file,object,3,newfile.txt,newfile.txt,yes,yes,yes,world,world,,,
ww089xf7663,Page 1,image,1,ww089xf7663_0001.tif,ww089xf7663_0001.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 1,image,1,ww089xf7663_0001.jp2,ww089xf7663_0001.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 2,image,2,ww089xf7663_0002.tif,ww089xf7663_0002.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 2,image,2,ww089xf7663_0002.jp2,ww089xf7663_0002.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 3,image,3,ww089xf7663_0003.tif,ww089xf7663_0003.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 3,image,3,ww089xf7663_0003.jp2,ww089xf7663_0003.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 4,image,4,ww089xf7663_0004.tif,ww089xf7663_0004.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 4,image,4,ww089xf7663_0004.jp2,ww089xf7663_0004.jp2,yes,yes,no,world,world,,image/jp2,
ww089xf7663,Page 5,image,5,ww089xf7663_0005.tif,ww089xf7663_0005.tif,no,no,yes,world,world,,image/tiff,
ww089xf7663,Page 5,image,5,ww089xf7663_0005.jp2,ww089xf7663_0005.jp2,yes,yes,no,world,world,,image/jp2,
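The example above appends the new resource at the end of nx301cp7407, so no existing "sequence" values change. When a resource is inserted in the middle instead, the later lines need renumbering. The sketch below shows one way to do that; it is illustrative only and assumes that rows for a druid are contiguous and that consecutive rows sharing a resource_label belong to the same resource (as the TIFF/JP2 pairs do above). The helper name `renumber_sequences` is hypothetical.

```python
from itertools import groupby


def renumber_sequences(rows):
    """Rewrite 'sequence' so each druid's resources count 1, 2, 3... in row order."""
    renumbered = []
    # Assumption: rows for one druid are contiguous, as in the example above.
    for _, druid_rows in groupby(rows, key=lambda r: r["druid"]):
        # Assumption: consecutive rows with the same label form one resource.
        resources = groupby(druid_rows, key=lambda r: r["resource_label"])
        for position, (_, resource_rows) in enumerate(resources, start=1):
            for row in resource_rows:
                renumbered.append({**row, "sequence": str(position)})
    return renumbered


rows = [
    {"druid": "nx301cp7407", "resource_label": "Page 1", "sequence": "1", "filename": "nx301cp7407_0001.tif"},
    {"druid": "nx301cp7407", "resource_label": "Page 1", "sequence": "1", "filename": "nx301cp7407_0001.jp2"},
    # New resource inserted in the middle; its sequence is not yet set.
    {"druid": "nx301cp7407", "resource_label": "new text file", "sequence": "", "filename": "newfile.txt"},
    {"druid": "nx301cp7407", "resource_label": "Page 2", "sequence": "2", "filename": "nx301cp7407_0002.tif"},
    {"druid": "nx301cp7407", "resource_label": "Page 2", "sequence": "2", "filename": "nx301cp7407_0002.jp2"},
]
print([r["sequence"] for r in renumber_sequences(rows)])  # ['1', '1', '2', '3', '3']
```

If two adjacent resources happen to share a label, this grouping would merge them, so the result is worth checking by eye before saving the CSV.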
After staging your files and manifests, create a new "project" and run a discovery report. Since you are using a file_manifest.csv:
- in the box labeled "Processing configuration", leave "Default" as the selected option
- check the box labeled "I have a file manifest"
Remember, you must use the "I have a file manifest" option when making incremental updates.
The discovery report will show you the changes that will be made to the object. Review these changes to make sure they are what you expect. Pay particular attention to any files that will be deleted and make sure this list is correct before proceeding to Preassembly.
In this example, the report is showing me the changes I've made, namely:
- updating a file in jq399nx1812
- removing two files from jq399nx1812
- adding a new file to nx301cp7407
- updating a file in ww089xf7663
Note: when submitting a new TIFF file, as in this set of updates, the system will also update the corresponding JP2 to reflect the new TIFF.
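A rough preview of the additions and removals can also be produced locally by comparing the CSV downloaded from Argo with the edited file_manifest.csv. The sketch below is not the discovery report's own logic; the function names are hypothetical and the sample CSVs are trimmed to two columns for brevity. Replaced files do not appear in this comparison, because replacing a file leaves its manifest line unchanged.

```python
import csv
import io


def listed_files(csv_text):
    """Return the set of (druid, filename) pairs in a structural-metadata CSV."""
    return {(row["druid"], row["filename"]) for row in csv.DictReader(io.StringIO(csv_text))}


def preview_changes(argo_csv_text, manifest_csv_text):
    """Return (added, removed) lists of (druid, filename) pairs, each sorted."""
    before = listed_files(argo_csv_text)
    after = listed_files(manifest_csv_text)
    return sorted(after - before), sorted(before - after)


# Trimmed, illustrative stand-ins for the Argo export and the edited manifest.
argo_csv = (
    "druid,filename\n"
    "jq399nx1812,jq399nx1812_0001.tif\n"
    "jq399nx1812,jq399nx1812_0002.tif\n"
    "nx301cp7407,nx301cp7407_0001.tif\n"
)
manifest_csv = (
    "druid,filename\n"
    "jq399nx1812,jq399nx1812_0001.tif\n"
    "nx301cp7407,nx301cp7407_0001.tif\n"
    "nx301cp7407,newfile.txt\n"
)
added, removed = preview_changes(argo_csv, manifest_csv)
print("added:", added)
print("removed:", removed)
```

The discovery report remains the authoritative view of what Preassembly will do; a local comparison like this only helps catch an unintended deletion earlier.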
If the list looks correct and there are no errors on the discovery report, proceed to Preassembly by clicking the "Run Preassembly" button at the bottom of the discovery report page. This will trigger Preassembly to send the new/modified files to SDR along with the structural metadata that indicates that the remaining files should not be changed.
Incremental updates are new to Preassembly as of October 2023. Before then, Preassembly could make updates to items but only if you staged all files, including files that will not be changed, every time you ran Preassembly. Staging only the new files would result in only the new files being included in the new version of an item - all other files would be removed in that version. There is still a risk that this could happen if you do not use the file_manifest.csv.
If this happens, check the settings on your discovery report job. Remember that you must check the box that says "I have a file manifest". If you did not check that box, please start a new Project with that box checked and do not run Preassembly until you see the correct discovery report output on your new job.
If the system does not see a file manifest, it will not be able to determine which files should remain unchanged from the previous version of the item. It will then fall back to processing the job as if the staged files are the only files to include in the next version of the item.