Architecture

Eric Lopatin edited this page Jun 23, 2021 · 15 revisions

Merritt's architecture consists of several primary microservices, along with off-the-shelf components, secondary services, external systems, and multiple remote storage providers. An architecture diagram is provided at the end of this document.

Primary microservices

Merritt consists of five primary microservices: Ingest, Storage, Inventory, Replication, and Audit.

  • The Ingest service manages the acquisition of new digital content.
  • The Storage service manages the secure and persistent storage of digital content.
  • The Inventory service provides a comprehensive catalog of information known about digital objects, versions, collections, and owners.
  • The Replication service manages the synchronization of content replicas across redundant storage locations.
  • The Audit service manages the ongoing bit-level verification of digital content.

Each service is deployed on multiple servers for performance and fault-tolerance.

🔰Note: For an overview of how the different services work together to ingest, store, replicate and audit content, see the Ingest Process and Dataflow pages.
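
As a rough illustration of what bit-level verification involves, the sketch below recomputes a file's digest and compares it with a previously recorded value. It is not Merritt's Audit implementation: the function names `sha256_of` and `verify_fixity` are hypothetical, and the choice of SHA-256 is an assumption, since this page does not say which digest algorithm the Audit service uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest.

    SHA-256 is an assumption made for this sketch.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected_digest: str) -> bool:
    """Return True when the stored bytes still match the recorded digest."""
    return sha256_of(path) == expected_digest
```

A periodic audit could call `verify_fixity` for each stored file and flag any file for which it returns `False`.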

Off-the-shelf components

Merritt uses several off-the-shelf components to support, coordinate, and share data among the various primary services. These include an OpenDJ LDAP server for authentication and authorization, a ZooKeeper queue for processing ingested content into the Inventory service, and a MySQL database maintaining inventory, audit, and replication information.

In addition, an Apache web server (not shown in the diagram) acts as the front end and load balancer for the UI. Internal HTTP requests among the services are load-balanced by either the same Apache server or an Amazon Application Load Balancer (ALB). We are in the process of moving all internal services off Apache and onto Amazon ALBs.

Secondary services

Several secondary services facilitate access to the digital content stored in Merritt, including the Local ID service, SWORD server, OAI-PMH server and UI.

  • The Local ID service maps external secondary ("local") identifiers, such as DOIs, to the ARKs used as Merritt primary identifiers.
  • The SWORD server implements a subset of the SWORD 2.0 deposit specification, and is used to accept deposits from external systems such as Dash.
  • The OAI-PMH server provides an OAI-PMH feed allowing external systems to harvest metadata from Merritt collections.
  • The UI is a Ruby on Rails application that provides the primary user interface to Merritt.
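
The identifier mapping performed by the Local ID service can be pictured as a simple lookup table. The sketch below is a minimal in-memory model, not the service's actual API: the class `LocalIdIndex`, its methods, and the rule that an existing local ID cannot be repointed to a different ARK are all assumptions made for illustration, and the identifiers are made-up placeholders.

```python
from typing import Dict, Optional


class LocalIdIndex:
    """In-memory map from local identifiers to primary ARKs (illustrative only)."""

    def __init__(self) -> None:
        self._ark_by_local_id: Dict[str, str] = {}

    def register(self, local_id: str, ark: str) -> None:
        """Record a mapping, refusing to silently repoint an existing local ID."""
        existing = self._ark_by_local_id.get(local_id)
        if existing is not None and existing != ark:
            raise ValueError(f"{local_id} is already mapped to {existing}")
        self._ark_by_local_id[local_id] = ark

    def resolve(self, local_id: str) -> Optional[str]:
        """Return the primary ARK for a local ID, or None if it is unknown."""
        return self._ark_by_local_id.get(local_id)


# Placeholder identifiers, not real Merritt objects.
index = LocalIdIndex()
index.register("doi:10.1234/example", "ark:/99999/fk4example")
print(index.resolve("doi:10.1234/example"))
```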

Remote storage

All digital content deposited in Merritt is written to remote external storage for preservation, either in Amazon S3, at the San Diego Supercomputer Center, or at Wasabi. The primary copies are then replicated to secondary storage locations for redundancy.

(Content stored in S3, at SDSC, and at Wasabi is also transparently replicated across multiple servers, providing additional redundancy.)
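
One way to think about replicating primary copies to secondary locations is as a set difference: anything held at the primary location but absent from a secondary one still needs to be copied. The sketch below illustrates that idea only. The function `missing_replicas`, the node labels, and the object IDs are hypothetical, and this page does not describe how the Replication service actually tracks its work.

```python
from typing import Dict, List, Set


def missing_replicas(
    holdings: Dict[str, Set[str]],
    primary: str,
    secondaries: List[str],
) -> Dict[str, Set[str]]:
    """For each secondary node, return the object IDs it still lacks.

    `holdings` maps a storage node label to the set of object IDs stored there.
    """
    primary_objects = holdings.get(primary, set())
    return {
        node: primary_objects - holdings.get(node, set())
        for node in secondaries
    }


# Hypothetical node labels and object IDs, for illustration only.
holdings = {
    "s3": {"obj-1", "obj-2"},
    "sdsc": {"obj-1"},
    "wasabi": set(),
}
backlog = missing_replicas(holdings, primary="s3", secondaries=["sdsc", "wasabi"])
for node, objects in backlog.items():
    print(node, sorted(objects))
```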

Diagram

Merritt Microservice Architecture