The Application Interface

Andrew Berger edited this page Oct 25, 2023 · 20 revisions

Using the Preassembly web form

Once your content is staged and ready for deposit, it's time to fill out the Preassembly web form. The web form is how you start the actual deposit process. The information on the form tells the Preassembly application:

  • what kind of content is being deposited
  • how to process the files
  • where to go to find the files (i.e. the staging location)

Accessioning is a two-step process:

  1. Fill out the form and start what's called a "discovery report" job
    • This runs a set of checks with the goal of identifying potential problems with your accession
    • The discovery report will flag errors that would cause problems in processing
    • You should address any errors before moving on to run Preassembly
  2. After receiving an error-free discovery report, run a Preassembly job
    • The Preassembly job is the job that actually sends files into the SDR

Small Preassembly jobs, consisting of a few items and less than 1 GB of content, can finish in a matter of minutes. Large Preassembly jobs, consisting of hundreds or thousands of items and multiple terabytes of content, can run for days. You do not need to keep the application window open while discovery report and Preassembly jobs are running. Preassembly will email you when your jobs are complete.


The Job Form

[Image: the Preassembly job form]
  • To prepare a job, supply the following information
    • Project name [required]
      • This must be unique for a specific user, but does not need to be universally unique. You will get an error if you have already submitted a job with an identical name
      • Project names cannot include spaces. The allowable characters are: A-Z, a-z, 0-9, hyphen and underscore.
      • Hint: It's often easiest to use the accessioning ticket number as the project name, which also makes it easier for the accessioning manager to track your job
    • Job type [required]
      • You must always start with the Discovery Report in order to make sure that your job is valid and able to be accessioned
      • Do not choose Pre Assembly Run without first running a discovery report
        • In most cases, you will not need to fill out the form again to run a Preassembly job. Instead, you will run it directly from the page where you view your discovery report.
    • Content structure [required]
      • Image is for any non-book-like image materials
      • Book covers book-like materials (choose ltr or rtl for left-to-right or right-to-left orientation books)
      • Document is for PDF files
      • File is for items that will be displayed as a file listing
      • Media is for material that should be presented in the embedded streaming viewer (audio or video). Note that if you use the media content type, you must supply a file manifest. (See instructions)
      • 3d is for 3D objects
      • Map is for map objects and is essentially the same type as "image"
      • Webarchive Seed is for web archive seed objects
      • If your content doesn't easily fit into one of these categories, please consult with the Repository Manager before accessioning the content.
    • Staging location [required]
      • The staging directory is the location where you put your files and manifest
      • This can be either a Globus link or a directory path to a shared mount
      • If it is a directory path
        • This will always take the form /{storageMount}/{path to location of files and manifest}
        • Your {storageMount} value must be in the pre-approved list (see Consul)
      • If your location is a Globus link
        • Paste the link directly into the box
        • If you are not sure how to find your link, follow these instructions
    • Processing configuration [required]
      • Default puts each file in a separate resource in the digital object
      • Group by filename bundles files with the same filename but different extensions into a single resource (for instance foo.tif and foo.pdf), so this is the preferred method for image and book items where multiple files exist for each page or image
      • Group by filename (with pre-existing OCR) is intended for content where OCR files have been generated in ALTO format and will be included alongside images in the same accessioning batch
      • I have a file manifest - check this box if you are supplying a file manifest for media or other complex items, updates to existing items, or in any other situation where you choose to use a file manifest for an accessioning project.
    • Preserve, Shelve, Publish Settings [required]
      • Default uses the publish, shelve, and preserve settings that are appropriate to most pre-assembly projects. These settings generally distinguish "access" files from "preservation master" files based on file type and only make the access files available on the Purl. Note that if using the media content type, you should use this setting: it will not affect the settings in your media_manifest.csv file.
      • Preserve=Yes, Shelve=Yes, Publish=Yes will make all files in your pre-assembly job available from the Purl as well as send them to preservation storage. Only use this setting if you know that the default settings will not be appropriate for your pre-assembly job, and that all files in the job should be made available online. If in doubt about whether to use this setting, please contact the Repository Manager.
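
Two of the rules above, the allowed characters in a project name and the /{storageMount}/... form of a directory path, can be checked before you open the form. The following is a minimal sketch and not part of Preassembly itself: the function names are hypothetical, and the values in APPROVED_MOUNTS are placeholders, since the real pre-approved list is documented in Consul. It applies only to directory paths, not to Globus links.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Allowed characters per the form guidance: A-Z, a-z, 0-9, hyphen, underscore.
PROJECT_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

# Placeholder values for illustration only; the real list is in Consul.
APPROVED_MOUNTS = {"example_mount_a", "example_mount_b"}


def is_valid_project_name(name: str) -> bool:
    """Return True if the name is non-empty and uses only allowed characters."""
    return PROJECT_NAME_RE.fullmatch(name) is not None


def storage_mount(staging_path: str) -> Optional[str]:
    """Return the first component of an absolute path, or None if there is none."""
    parts = PurePosixPath(staging_path).parts
    if len(parts) < 2 or parts[0] != "/":
        return None
    return parts[1]


def has_approved_mount(staging_path: str, approved=APPROVED_MOUNTS) -> bool:
    """Return True if the path starts with one of the approved storage mounts."""
    return storage_mount(staging_path) in approved


print(is_valid_project_name("TICKET-1234_batch2"))  # letters, digits, hyphen, underscore
print(is_valid_project_name("my project"))          # contains a space
print(has_approved_mount("/example_mount_a/project/files"))
```

This check covers only the character rule. Whether a name is already in use for your account can only be determined by the application.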

Once you have filled out the form and clicked the Submit button to start a discovery report job, the job will be created, begin to run, and appear in the Recent jobs list in the right-hand column. Preassembly will send you an email when your job is complete. You can also return to the Preassembly application and check the status of your job by clicking on the link to the job. It is not necessary to keep the browser window open while the job runs.

The duration of a discovery report job depends largely on the total number of files in the job: a job with fewer than one thousand files should finish within minutes. Jobs with many thousands of files will take longer to complete.

Typical settings for common types of content

See Consul for more detailed descriptions of SDR content types.

Book

  • Content structure: Book (ltr) or Book (rtl), depending on whether the content reads left-to-right or right-to-left
  • Processing configuration:
    • Use "default" if you have one file per page
    • Use "Group by filename" if you have multiple files per page (such as a TIFF and a PDF of the same page)
    • Use "Group by filename with pre-existing OCR" if you have both images and OCR (in ALTO format) for each page
  • I have a file manifest: leave unchecked unless you created a file_manifest.csv
  • Publish, preserve, shelve: Default
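
To picture the difference between "default" and "Group by filename", the sketch below groups a list of filenames by their name without the extension. It illustrates the idea only and is not Preassembly's actual implementation; the function names and sample filenames are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def default_resources(filenames):
    """Default behavior as described above: one resource per file."""
    return [[name] for name in filenames]


def group_by_filename(filenames):
    """Bundle files that share a name but differ in extension."""
    groups = defaultdict(list)
    for name in filenames:
        # stem drops only the final extension: "page_0001.tif" -> "page_0001"
        groups[PurePosixPath(name).stem].append(name)
    return {stem: sorted(files) for stem, files in sorted(groups.items())}


files = ["page_0001.tif", "page_0001.pdf", "page_0002.tif"]
print(default_resources(files))  # three resources, one per file
print(group_by_filename(files))  # two resources, keyed by shared filename
```

With these sample files, grouping yields two resources: one holding both page_0001 files and one holding page_0002.tif. The sketch drops only the final extension, so a name such as foo.tar.gz would be grouped under foo.tar.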

Document

Document supports PDF files only.

  • Content structure: Document
  • Processing configuration: Default
  • I have a file manifest: leave unchecked unless you created a file_manifest.csv
  • Publish, preserve, shelve: Default

File

File accepts all files, which will be shown as a file list.

  • Content structure: File
  • Processing configuration: Default
  • I have a file manifest: leave unchecked unless you created a file_manifest.csv
  • Publish, preserve, shelve: Default

Image

  • Content structure: Image
  • Processing configuration:
    • Use "default" if you have one file per image
    • Use "Group by filename" if you have multiple files per image (such as a TIFF and a PDF of the same image)
    • Use "Group by filename with pre-existing OCR" if you have both images and OCR (in ALTO format) for each image
  • I have a file manifest: leave unchecked unless you created a file_manifest.csv
  • Publish, preserve, shelve: Default

Map

  • Content structure: Map
  • Processing configuration:
    • Use "default" if you have one file per image
    • Use "Group by filename" if you have multiple files per image (such as a TIFF and a PDF of the same image)
  • I have a file manifest: leave unchecked unless you created a file_manifest.csv
  • Publish, preserve, shelve: Default

Media

Media is for streaming media, specifically audio and video.

  • Content structure: Media
  • Processing configuration: Default
  • I have a file manifest: check the box:
    • You must have a file_manifest.csv to accession media (Instructions)
  • Publish, preserve, shelve: Default
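
The typical settings listed above can be condensed into a small lookup table for quick reference. This is only a summary sketch of this page, not a configuration file that Preassembly reads; the names TYPICAL_SETTINGS and needs_file_manifest are hypothetical.

```python
DEFAULT = "Default"
GROUP = "Group by filename"
GROUP_OCR = "Group by filename (with pre-existing OCR)"

# Processing configurations and file manifest requirement per content structure.
# All six types use the Default preserve, shelve, and publish settings.
TYPICAL_SETTINGS = {
    "Book": {"processing": [DEFAULT, GROUP, GROUP_OCR], "manifest_required": False},
    "Document": {"processing": [DEFAULT], "manifest_required": False},
    "File": {"processing": [DEFAULT], "manifest_required": False},
    "Image": {"processing": [DEFAULT, GROUP, GROUP_OCR], "manifest_required": False},
    "Map": {"processing": [DEFAULT, GROUP], "manifest_required": False},
    "Media": {"processing": [DEFAULT], "manifest_required": True},
}


def needs_file_manifest(content_structure: str) -> bool:
    """Return True if a file_manifest.csv is required for this content structure."""
    return TYPICAL_SETTINGS[content_structure]["manifest_required"]


for name, settings in TYPICAL_SETTINGS.items():
    print(name, settings["processing"], needs_file_manifest(name))
```

A value of False for manifest_required means the box is normally left unchecked; you can still supply a file_manifest.csv for any content structure if you created one.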