Job queues
A lengthy backlog for "validate_moab" is problematic. This indicates that there is an issue which is blocking accessioning and should be investigated.
A lengthy backlog for other queues is to be expected, because a large number of jobs may be added to a queue at the same time (see schedule.rb). However, Sidekiq should make steady progress on a lengthy backlog, even if completing it takes multiple days.
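The distinction between the two kinds of backlog can be sketched in Ruby. This is a rough illustration, not part of PresCat: `backlog_report`, `CRITICAL_QUEUES`, and the threshold of 100 are hypothetical, since the text does not define how long "lengthy" is. The method accepts any objects that respond to `name` and `size`, which is what `Sidekiq::Queue.all` (from `sidekiq/api`) returns.

```ruby
# Sketch: classify each queue's backlog along the lines described above.
# `queues` is any list of objects responding to #name and #size.
CRITICAL_QUEUES = ["validate_moab"].freeze # the queue singled out in the text
BACKLOG_THRESHOLD = 100                    # hypothetical cutoff for "lengthy"

def backlog_report(queues, threshold: BACKLOG_THRESHOLD)
  queues.map do |queue|
    status =
      if queue.size <= threshold
        :ok
      elsif CRITICAL_QUEUES.include?(queue.name)
        :investigate # a long validate_moab backlog suggests accessioning is blocked
      else
        :expected    # bulk-filled queue; check that it keeps shrinking over time
      end
    { name: queue.name, size: queue.size, status: status }
  end
end
```

From a Rails console on a machine that can reach Redis, this could be called as `backlog_report(Sidekiq::Queue.all)` after `require "sidekiq/api"`.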
Go to the worker VMs and run `ps -ef | grep kiq`. If you notice any stale worker management processes, kill them so that only current code is executing. On rare occasions, a deployment fails to rotate old worker processes out for new ones, and the old worker processes, running out-of-date code, will continue to pick up jobs.
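One way to spot stale processes is to compare each process's age with the time since the last deployment. The Ruby sketch below makes several assumptions: `stale_sidekiq_pids` is a hypothetical helper, the input comes from `ps -eo pid,etimes,args` (the `etimes` column is the elapsed run time in seconds on Linux procps), and the process title starts with `sidekiq `, as Sidekiq normally sets it. It only lists candidate PIDs; it does not kill anything.

```ruby
# Sketch: list Sidekiq PIDs that have been running longer than the time
# elapsed since the last deployment, i.e. that predate it.
# `ps_output` is the text printed by `ps -eo pid,etimes,args`.
def stale_sidekiq_pids(ps_output, seconds_since_deploy)
  ps_output.each_line.filter_map do |line|
    pid, etimes, args = line.strip.split(/\s+/, 3)
    # Skip the header row, the grep itself, and unrelated processes.
    next unless args&.match?(/\Asidekiq /)
    # A process older than the deployment is a candidate for stale code.
    pid.to_i if etimes.to_i > seconds_since_deploy
  end
end
```

Review the returned PIDs against the `ps` output before killing anything, since the age comparison is only a heuristic.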
PresCat uses an approach to Sidekiq configuration that provides fine control over workers, threads, and queues:
- The number of worker processes for a server is specified in that server's puppet configuration.
- Each worker process has its own Sidekiq configuration file, named with the number of the worker. For example, if there are 2 worker processes, then there will be a `sidekiq1.yml` and a `sidekiq2.yml`.
- Sidekiq configuration files are stored in shared_configs.
- Each Sidekiq configuration file specifies the number of threads and the queues that are serviced. For example:
---
:concurrency: 3
:queues:
- validate_moab
- Each Sidekiq worker process has multiple worker threads (`concurrency`) for working the designated `queues`.
- A worker thread is what actually picks up a job from a queue to perform the work. (This is unlike e.g. Resque, where a worker management process coordinates many separate worker processes, and each Resque worker process picks up jobs from a designated group of queues.)
Expect to adjust the Sidekiq configuration over time based on usage.
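When tuning the configuration, it can help to see how many threads are able to serve each queue across all worker processes. The following Ruby sketch assumes config files shaped like the example above, with `:concurrency` and `:queues` keys; `threads_per_queue` is a hypothetical helper, not part of PresCat or Sidekiq.

```ruby
require "yaml"

# Sketch: for each queue, add up the concurrency of every worker process
# whose config lists that queue. Threads in one process are shared among
# all of its queues, so each total is an upper bound, not a reservation.
def threads_per_queue(yaml_texts)
  totals = Hash.new(0)
  yaml_texts.each do |text|
    # Sidekiq config keys such as :concurrency are YAML symbols.
    config = YAML.safe_load(text, permitted_classes: [Symbol])
    threads = config.fetch(:concurrency)
    Array(config[:queues]).each do |queue|
      # A queue entry may be a bare name or a [name, weight] pair.
      totals[Array(queue).first] += threads
    end
  end
  totals
end
```

For example, given the contents of each `sidekiqN.yml` as strings (read from wherever the deployed configs live, which is not specified here), `threads_per_queue(texts)` returns a hash from queue name to thread count.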