
Accessioning images with captions (labels)

Andrew Berger edited this page Nov 30, 2023 · 6 revisions

By default, preassembly assigns generic labels to image content. These labels appear in the contentMetadata and are displayed in the online viewer as "Image 1", "Image 2", and so on. To give images custom labels, accession them using a file manifest. The process is as follows:

  1. Stage your content as you would normally do for preassembly
  2. Create the manifest.csv
  3. Create a file_manifest.csv in the same folder as the manifest.csv

The file_manifest.csv should include the following headers, which match the Argo structural CSV format but with fewer columns:

druid,resource_label,resource_type,sequence,filename,publish,shelve,preserve
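Before running preassembly, it can help to confirm that the header row is spelled correctly. The sketch below is one way to do that with Python's standard `csv` module. It is illustrative only: `check_headers` is a hypothetical helper, it compares column names as a set because this page does not say whether column order matters, and flagging extra columns is a conservative assumption rather than documented preassembly behavior.

```python
import csv
import io

# Column names as listed on this page. Whether preassembly requires this exact
# order is not stated, so this sketch only checks which columns are present.
EXPECTED_HEADERS = [
    "druid", "resource_label", "resource_type", "sequence",
    "filename", "publish", "shelve", "preserve",
]


def check_headers(csv_text: str) -> list[str]:
    """Return a list of problems with the header row (empty if none)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    problems = []
    missing = [h for h in EXPECTED_HEADERS if h not in header]
    # Assumption: columns beyond the documented ones are worth flagging.
    extra = [h for h in header if h not in EXPECTED_HEADERS]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
    if extra:
        problems.append(f"unexpected columns: {', '.join(extra)}")
    return problems


sample = "druid,resource_label,resource_type,sequence,filename,publish,shelve,preserve\n"
print(check_headers(sample))  # []
```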

The file_manifest.csv is where you supply the captions for your images, in the column labeled "resource_label". Every file in your accessioning batch should be listed in the file_manifest.csv.

The data for each column should consist of the following:

  • druid: the druid of the object (NOTE: this must match the name of the folder containing the content, i.e. in the regular manifest.csv, the object value must equal the druid)
  • filename: the name of the file
  • resource_label: the caption for the image
  • sequence: a number representing the order in which the image should appear within the druid (i.e. the first image listed has sequence '1')
  • publish: ("yes"/"no") whether the file should be listed on the PURL
  • preserve: ("yes"/"no") whether the file should be sent to preservation
  • shelve: ("yes"/"no") whether the file should be stored on Stacks (the server that makes files available to the public)
  • resource_type: the type of resource. For images, it should be "image". If you are combining image files with other file types in the same druid, such as PDFs or text files, choose "file".
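The column rules above can be checked mechanically before accessioning. The following is a minimal sketch, not a preassembly feature: `validate_rows` is a hypothetical helper, it accepts only the two resource types this page mentions ("image" and "file") even though SDR may support others, and it does not check sequence uniqueness, since in the Argo structural CSV format several files can share one resource and therefore one sequence number.

```python
import csv
import io
import tempfile
from pathlib import Path

# Only the values named on this page; other resource types may exist in SDR.
YES_NO = {"yes", "no"}
RESOURCE_TYPES = {"image", "file"}


def validate_rows(csv_text: str, bundle_dir: Path) -> list[str]:
    """Return human-readable problems found in file_manifest.csv data rows."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for rownum, row in enumerate(reader, start=1):  # counts data rows only
        druid = row.get("druid") or ""
        filename = row.get("filename") or ""
        # The druid must match the name of a content folder in the bundle dir.
        if not (bundle_dir / druid).is_dir():
            problems.append(f"row {rownum}: no folder named {druid}")
        elif not (bundle_dir / druid / filename).is_file():
            problems.append(f"row {rownum}: {filename} not found in {druid}/")
        seq = row.get("sequence") or ""
        if not (seq.isdecimal() and int(seq) >= 1):
            problems.append(f"row {rownum}: sequence {seq!r} is not a positive integer")
        for col in ("publish", "shelve", "preserve"):
            value = row.get(col) or ""
            if value not in YES_NO:
                problems.append(f"row {rownum}: {col} must be yes or no, got {value!r}")
        rtype = row.get("resource_type") or ""
        if rtype not in RESOURCE_TYPES:
            problems.append(f"row {rownum}: unexpected resource_type {rtype!r}")
    return problems


SAMPLE = (
    "druid,resource_label,resource_type,sequence,filename,publish,shelve,preserve\n"
    'dm421kp4744,"A bird\'s eye view of San Francisco",image,1,birds-eye.jpg,no,no,yes\n'
)

# Build a throwaway bundle directory to check against.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "dm421kp4744").mkdir()
    (bundle / "dm421kp4744" / "birds-eye.jpg").touch()
    good = validate_rows(SAMPLE, bundle)
    bad = validate_rows(SAMPLE.replace(",image,1,", ",picture,one,"), bundle)

print(good)  # []
print(bad)
```

The second call deliberately swaps in an invalid resource type and sequence to show what the reported problems look like.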

Special note on choosing the publish/shelve/preserve settings

The Stanford image viewer uses a format called JPEG2000 ("JP2" for short) to present images in high resolution with dynamic zoom options. Most images accessioned into SDR do not use JP2 as a preservation format, instead using more common formats such as TIFF, JPG, or PNG. When images are accessioned in these formats, a copy is made in JP2 format during accessioning specifically for use with the viewer. This automatically generated copy is generally not sent to the preservation system, as it can be re-generated from the preservation copy.

Images accessioned as JP2 files are retained as JP2 files and do not trigger the automatic image generation process.

What this means for the publish, shelve, and preserve settings:

If your file is not a JP2 image:

  • preserve
    • always choose "yes" (on rare occasions you may not want a file to be preserved, but this would be unusual)
  • shelve
    • select "no" if you only want the auto-generated JP2 file to be downloadable (this is the default for simple image accessioning)
    • select "yes" if you would like someone to be able to download the original file (this would allow downloading the TIFF itself, for example)
  • publish
    • choose the same setting you chose for "shelve": "publish" determines whether the file is listed in the viewer, while "shelve" determines where the file is stored

Behind the scenes, the automatically generated JP2 images are given the settings they need to display in the viewer.

If your file is a JP2 image:

  • Choose "yes" for all three settings. This will ensure that your images are both preserved and available in the viewer.
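The guidance above reduces to a small decision rule. The sketch below encodes it for illustration; `recommended_settings` is a hypothetical helper, it assumes a file is a JP2 purely because its extension is ".jp2", and it always returns "yes" for preserve, so the rare cases where you might not want a file preserved still need a manual decision.

```python
from pathlib import Path


def recommended_settings(filename: str, downloadable_original: bool = False) -> dict[str, str]:
    """Suggest publish/shelve/preserve values per the guidance on this page.

    Assumption: a file is treated as JP2 purely by its ".jp2" extension.
    """
    if Path(filename).suffix.lower() == ".jp2":
        # JP2 files are kept as JP2 and served directly, so all three are "yes".
        return {"publish": "yes", "shelve": "yes", "preserve": "yes"}
    # Non-JP2 originals are always preserved; a JP2 copy is generated for the viewer.
    shelve = "yes" if downloadable_original else "no"
    # "publish" should mirror "shelve".
    return {"publish": shelve, "shelve": shelve, "preserve": "yes"}


print(recommended_settings("birds-eye.jpg"))
# {'publish': 'no', 'shelve': 'no', 'preserve': 'yes'}
```

Passing `downloadable_original=True` for a TIFF, for example, yields "yes" for all three settings, which makes the original file itself downloadable alongside the generated JP2.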

Example

This example uses a single image file, named 'birds-eye.jpg', and a single druid, dm421kp4744.

To accession this image, I created the following folder structure:

example/
├── dm421kp4744
│   └── birds-eye.jpg
├── file_manifest.csv
└── manifest.csv

The top-level folder is named "example". Within this folder are:

  • a subfolder named "dm421kp4744" that contains the image file
  • a file named "manifest.csv"
  • a file named "file_manifest.csv"

Contents of the manifest.csv:

druid,object
dm421kp4744,dm421kp4744
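Because the file manifest requires the object (folder name) to equal the druid, a quick check of manifest.csv can catch mismatches early. This is a hypothetical helper, not part of preassembly, and it assumes both columns are written in the same bare form used in this example.

```python
import csv
import io


def mismatched_objects(manifest_text: str) -> list[str]:
    """Return druids whose object (folder name) differs from the druid."""
    reader = csv.DictReader(io.StringIO(manifest_text))
    return [row["druid"] for row in reader if row["object"] != row["druid"]]


print(mismatched_objects("druid,object\ndm421kp4744,dm421kp4744\n"))  # []
```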

Contents of the file_manifest.csv:

druid,resource_label,resource_type,sequence,filename,publish,shelve,preserve
dm421kp4744,"A bird's eye view of San Francisco",image,1,birds-eye.jpg,no,no,yes
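If you are preparing many objects, writing the manifests with a CSV library avoids quoting mistakes in captions. The sketch below recreates this example's two manifest files; `write_csv` and `write_example` are hypothetical helpers, the UTF-8 encoding is an assumption, and copying birds-eye.jpg into the druid folder is left to you.

```python
import csv
import tempfile
from pathlib import Path

FILE_MANIFEST_HEADERS = [
    "druid", "resource_label", "resource_type", "sequence",
    "filename", "publish", "shelve", "preserve",
]


def write_csv(path: Path, header: list[str], rows: list[list[str]]) -> None:
    # newline="" lets the csv module control line endings; "\n" mirrors this page.
    # Assumption: preassembly reads the manifests as UTF-8.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def write_example(bundle_dir: Path) -> None:
    """Create the example layout; the image file itself is not copied."""
    (bundle_dir / "dm421kp4744").mkdir(parents=True, exist_ok=True)
    write_csv(bundle_dir / "manifest.csv", ["druid", "object"],
              [["dm421kp4744", "dm421kp4744"]])
    write_csv(bundle_dir / "file_manifest.csv", FILE_MANIFEST_HEADERS,
              [["dm421kp4744", "A bird's eye view of San Francisco", "image", "1",
                "birds-eye.jpg", "no", "no", "yes"]])


# Write into a temporary folder and read the results back.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example"
    write_example(example)
    manifest_text = (example / "manifest.csv").read_text(encoding="utf-8")
    with open(example / "file_manifest.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
```

Python's `csv.writer` quotes a field only when it needs to, for example when a caption contains a comma, so the written label may appear without the surrounding quotes shown above; a standard CSV reader parses both forms to the same value.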

For the Preassembly form, I chose the following settings:

  • Content structure: Image
  • Bundle dir: the path to the folder where the manifest files are located
  • Content metadata creation: Default
  • I checked the box labeled "I have a file manifest" - this tells the system to look for the file_manifest.csv file
  • Publish, Shelve, Preserve settings: Default - this tells the system to follow what's in the manifest

Note that in this example, I accessioned a JPG file. A JP2 file was created during the accessioning process. The result was the following content structure, as shown in Argo:

Type: image

    Resource (1): image
    Label: A bird's eye view of San Francisco
        File: birds-eye.jpg (image/jpeg, 3.98 MB, preserve)
        File: birds-eye.jp2 (image/jp2, 2.07 MB, publish/shelve)

In the image viewer, the label shows as the image caption and the JP2 copy of the image is available to the public. The original JPG file is preserved in SDR.