Content Staging

Staging Content

"Staging" content is the process of organizing files for deposit. Depending on the volume and complexity of your data, and the specifics of your workflow, the staging process could consist of:

Copying folders and files to a shared mount using a file manager (such as the Mac's Finder or Windows' Explorer)
Uploading folders and files to the Globus file transfer service
Copying folders and files to a shared mount using SFTP or command-line tools

There is one prerequisite to staging your content: you must have a digital repository identifier for each item you are depositing or updating. If you do not yet have these identifiers, you must create them through a process called registration.

1. Organizing your files

No matter how you transfer files into the SDR, when staging your files you should always follow the same organizational structure:

Create a single folder to contain all items in your deposit
Within that folder, create an individual folder for each item
- Each folder should contain the file or files to be deposited for that item
Create a manifest file that lists all of the items and folders to be deposited
- This manifest file must be a CSV and must be named manifest.csv
If necessary, create a second manifest file called a "file manifest" that lists all files in the batch of items being deposited
- This file must be a CSV and must be named file_manifest.csv
- While not required for every Preassembly deposit, the file manifest is required when:
  - Depositing media items
  - Applying custom settings to specific files in a deposit
  - Updating existing items

Instructions for creating the manifest file(s) are given below.

Example of a deposit folder for a batch of image items

In this example:

Each folder with a name like jh486mk1405 represents a single SDR item. In this case, each item consists of a single image: the TIFF file inside that item's folder. Note that the manifest.csv is placed "alongside" the item folders within the larger deposit folder.

[folder containing the entire batch]/
├── jh486mk1405
│   └── ErbWE.tif
├── manifest.csv
├── mw438sy2326
│   └── AkremiF.tif
├── sv928qy8859
│   └── BerrySmithJ.tif
├── vb063xr4527
│   └── RoachJM.tif
├── vz805mb3344
│   └── ShillinglawDT.tif
├── ww805gw8199
│   └── RobertsCE.tif
└── yg789dz9935
    └── RobertsCET.tif

Using the command line to batch organize files into folders

Some users have found the following advice helpful for automating the organization of files into folders using the Bash/Linux command line.

To prepare content from the command line on the mounted drive, create two files in the folder that contains the content folder, either by uploading them or by using a command-line text editor such as nano. The first file should be called druids.txt and contain a list of the druids being prepared, one per line. The second should be called filenames.txt and contain one druid-filename pair per line, separated by a tab.
- Both druids.txt and filenames.txt should have Unix (\n) style line endings. If a file was created in a Windows program that uses \r\n, use the dos2unix command to convert the line endings to \n: dos2unix /path/to/file.txt. If it was created in a Mac program that uses \r, use the mac2unix command from the same package instead: mac2unix /path/to/file.txt
- To create the sub-folders: while read -r druid; do mkdir content/"$druid"; done <druids.txt
- To move the files into the corresponding druid directories inside content, substitute the path to the folder that currently holds the files in the following: while read -r druid filename; do mv /{stagingMount}/{projectFolder}/{fileFolder}/"$filename" content/"$druid"; done <filenames.txt
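The two loops above can also be combined into one reusable function. The following is only a sketch, not part of Preassembly: the function name organize_by_druid and its four arguments are assumptions made for illustration. It skips blank lines, splits each filenames.txt line on the tab only, and reports files or druid folders it cannot find instead of stopping.

```shell
# Sketch only: organize_by_druid is a hypothetical helper, not a Preassembly tool.
# Arguments: folder holding the loose files, druids.txt, filenames.txt,
# and the content folder that will receive one sub-folder per druid.
organize_by_druid() {
  local src="$1" druids="$2" pairs="$3" dest="$4"
  local druid filename tab
  tab=$(printf '\t')

  # Create one sub-folder per druid; -p makes reruns harmless.
  while IFS= read -r druid || [ -n "$druid" ]; do
    [ -n "$druid" ] || continue            # skip blank lines
    mkdir -p "$dest/$druid"
  done <"$druids"

  # Move each file into its druid folder. Splitting on the tab only keeps
  # filenames that contain spaces in one piece.
  while IFS=$tab read -r druid filename || [ -n "$druid" ]; do
    [ -n "$druid" ] || continue
    if [ -f "$src/$filename" ] && [ -d "$dest/$druid" ]; then
      mv -- "$src/$filename" "$dest/$druid/"
    else
      echo "skipped: $filename (druid $druid)" >&2
    fi
  done <"$pairs"
}
```

With druids.txt and filenames.txt in the current folder, a call would look like: organize_by_druid /{stagingMount}/{projectFolder}/{fileFolder} druids.txt filenames.txt content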

2. Creating the manifest file(s)

Creating the manifest.csv file

A manifest.csv file is required for all Preassembly deposits. This file has a dual purpose: it is an inventory of all items to be deposited in a single batch, and it is a device used to match druid identifiers to specific folders.

The manifest.csv file always follows the same structure:

the column headings are druid and object
the first column is a list of druids (without the "druid:" prefix)
the second column is the name of the folder corresponding to the item connected with the druid on the same row

A manifest.csv for the batch of items in the staging example above would look like this:

  druid,object
  mw438sy2326,mw438sy2326
  sv928qy8859,sv928qy8859
  jh486mk1405,jh486mk1405
  vb063xr4527,vb063xr4527
  ww805gw8199,ww805gw8199
  yg789dz9935,yg789dz9935
  vz805mb3344,vz805mb3344

In this case, the folders are each named for a druid. Please note that the folder name does not necessarily have to be a druid, but it is often easier to process a Preassembly job when each folder is named for a druid, as that leaves no doubt as to which items correspond to which folders.

That said, there may be projects where folders are created for items prior to the creation of unique item identifiers (the druids). In that situation, it may be possible to leave the folder names alone rather than rename them en masse when the druids are created. This can be helpful when working with third-party vendors who may not know the druids for a set of items, for example.

The manifest.csv for a batch of items where folder names follow a non-druid pattern could look like:

  druid,object
  {druid},Pamphlet 01
  {druid},Pamphlet 02

In this example, the folders with the files in them are named "Pamphlet 01" and "Pamphlet 02". The manifest.csv tells Preassembly to look in these folders for the files matching the repository identifiers in the first column.
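Before starting a job, it can help to confirm that every folder named in the second column actually exists and contains at least one file. The sketch below is one way to do that from the Bash command line. The function name check_manifest_folders is hypothetical, and the sketch assumes Unix line endings and folder names that contain no commas or quoted CSV fields.

```shell
# Sketch only: check_manifest_folders is a hypothetical helper.
# Argument: the deposit folder that holds manifest.csv and the item folders.
# Returns 0 when every listed folder exists and is non-empty, 1 otherwise.
check_manifest_folders() {
  local batch="$1"
  local druid object header status=0
  {
    read -r header                         # discard the druid,object header row
    while IFS=, read -r druid object || [ -n "$druid" ]; do
      [ -n "$druid" ] || continue
      if [ -z "$object" ] || [ ! -d "$batch/$object" ]; then
        echo "no folder '$object' for druid $druid" >&2
        status=1
      elif [ -z "$(ls -A "$batch/$object")" ]; then
        echo "folder '$object' for druid $druid is empty" >&2
        status=1
      fi
    done
  } <"$batch/manifest.csv"
  return "$status"
}
```

Run it as check_manifest_folders /path/to/deposit-folder (a placeholder path). Any problems are printed to standard error.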

Creating the file_manifest.csv

As noted above, the file_manifest.csv is an additional manifest file that lists every file in a deposit, in contrast to the manifest.csv which lists only the folders and items in a deposit. The manifest.csv file is adequate for deposits that make use of Preassembly's default settings for processing simple items, such as images, books, PDF documents, and files. The manifest.csv is not adequate for more complex processing where you need more granular control over how each file is treated. In those cases, a file_manifest.csv must be included in the deposit, as this file manifest contains instructions for how each file should be processed.

Types of deposit where a file manifest is required:

Media deposits (audio and/or video) or deposits of other complex types of content (such as disk images) (detailed instructions)
Updates to existing objects where you are adding or modifying specific files (detailed instructions)
Anything else requiring customizing metadata, such as providing captions for images (detailed instructions)

Detailed instructions for how to prepare file manifests for these scenarios are provided on their own wiki pages, linked from the list above or in the documentation sidebar.

Manifest formatting guidelines

Make sure each druid is listed only once
Do not leave any lines blank
Make sure to include the "druid,object" header
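
These three guidelines can be checked mechanically. The following sketch shows one possible check using standard tools (head, grep, cut, sort, uniq). The function name check_manifest_format is hypothetical, and the sketch assumes the file has Unix line endings.

```shell
# Sketch only: check_manifest_format is a hypothetical helper.
# Argument: path to a manifest.csv. Returns 0 if all three guidelines hold.
check_manifest_format() {
  local manifest="$1" status=0 dupes

  # Guideline: the first line must be the druid,object header.
  if [ "$(head -n 1 "$manifest")" != "druid,object" ]; then
    echo "missing druid,object header" >&2
    status=1
  fi

  # Guideline: no blank lines (whitespace-only lines count as blank).
  if grep -q '^[[:space:]]*$' "$manifest"; then
    echo "blank line found" >&2
    status=1
  fi

  # Guideline: each druid is listed once (first column, header skipped).
  dupes=$(tail -n +2 "$manifest" | cut -d, -f1 | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "duplicate druids: $dupes" >&2
    status=1
  fi

  return "$status"
}
```

For example, check_manifest_format content/manifest.csv prints nothing and returns 0 for a well-formed manifest.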

Using the command line to create a manifest.csv

Some users have found the following advice helpful for automating the creation of the manifest.csv using the Bash/Linux command line.

It is also possible to generate the manifest from a list of druids via the command line. With a file called druids.txt containing one druid per line in the current folder, run: sed 's/^\(.*\)$/\1,\1/' <druids.txt >>content/manifest.csv This appends one row per druid to manifest.csv within the content folder, creating the file if it does not exist. It does not write the druid,object header, which needs to be added to the file manually.
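
As an alternative, the header and the rows can be written in a single step. The sketch below is one way to do this and is not an official Preassembly tool. The function name make_manifest is hypothetical, and it assumes each folder is named for its druid, as in the staging example above.

```shell
# Sketch only: make_manifest is a hypothetical helper.
# Arguments: path to druids.txt and path of the manifest.csv to write.
# Assumes each item folder is named for its druid.
make_manifest() {
  local druids="$1" out="$2"
  {
    echo "druid,object"
    # Drop blank lines, then repeat each druid as its own folder name.
    grep -v '^[[:space:]]*$' "$druids" | sed 's/^\(.*\)$/\1,\1/'
  } >"$out"
}
```

A call such as make_manifest druids.txt content/manifest.csv overwrites any existing manifest.csv rather than appending to it, so running it twice does not produce duplicate rows.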

3. Copying folders and files to a staging location

Preassembly has been set up to "get" files from certain shared storage locations, known as "staging locations". Once files are placed on a staging location, Preassembly then accesses those files and copies them into the SDR. This makes it possible to process batches of items by placing the items in a single staging location rather than having to upload them individually through the browser.

Choosing a staging location

Preassembly is integrated with multiple servers in the library, which means that you have a choice as to where to stage your content:

Departmental shared mount
- Staff in certain departments have shared file storage mounts that are connected to Preassembly. These mounts are generally accessible from staff computers (personal or shared workstations) in those departments
- The full list of these storage mounts can be found on Consul at: content mount paths
- If you are a member of one of these departments and are unsure of how to use these mounts, please contact the Repository Manager
Globus
- All staff can use Globus, which is a file transfer service managed by the library and the university
- Globus supports both browser-based uploads and file syncing through a client application
- Globus also integrates with Stanford Google Drive accounts, making it possible to deposit content from Google Drive without having to download it first
File storage on the Preassembly server itself
- Access to this server is via the command line or SFTP only

The choice of a staging location ultimately comes down to a number of factors:

If you are a member of a department with access to a shared mount, you likely already have access to your department's staging location without needing to install or configure any additional software
If you are not a member of a department with access to a shared mount, you must use Globus or the Preassembly server
If you do not want to use SFTP or the command line, you should use Globus
If you have content on Google Drive, you should use Globus
If no other option works best for you, you can still use the Preassembly server
