
ZipmakerJob failures

Johnathan Martin edited this page Aug 8, 2024 · 6 revisions

NOTE: this page is somewhat stale following preservation_catalog's switch from resque-pool to Sidekiq in 2023. dump_and_return_failure_queue_entries, which simply returned the job arguments for everything in the given failure queue, relied on some Resque methods, but could likely be ported to Sidekiq without too much trouble. This cleanup should also be much less necessary now, since preservation storage availability problems were mostly resolved in 2022 and 2023.
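As a rough illustration of what such a port might look like, here is a minimal sketch. It assumes the jobs are ActiveJob jobs run through Sidekiq, so that each job record's first argument is a hash with an 'arguments' key (mirroring the `failure[:args][0]['arguments']` access used further down). The method name `failed_job_arg_lists` is hypothetical. Sidekiq has no per-queue failure queue; failed jobs end up in `Sidekiq::RetrySet` or `Sidekiq::DeadSet`, so the sketch filters by the queue the job was originally enqueued on.

```ruby
# Hypothetical Sidekiq-flavored stand-in for dump_and_return_failure_queue_entries.
# job_set: any Enumerable of records responding to #queue and #args,
#          e.g. Sidekiq::DeadSet.new or Sidekiq::RetrySet.new (after require 'sidekiq/api').
# Returns one ActiveJob argument list per job that was enqueued on queue_name.
def failed_job_arg_lists(job_set, queue_name)
  job_set
    .select { |job| job.queue == queue_name }
    .map { |job| job.args[0]['arguments'] }
end
```

In a Rails console this might be called as `failed_job_arg_lists(Sidekiq::DeadSet.new, 'zipmaker')`, where the queue name 'zipmaker' is an assumption and should be checked against the actual job configuration.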

The code below can help safely automate cleanup of failed ZipmakerJobs. Since ZipmakerJob (via DruidVersionZip) now attempts to delete any bad zip files it creates, this should rarely be necessary. But where that cleanup failed and the zip file hasn't yet aged out of temp space, some manual cleanup may be needed.

We can build on the generalized code for wrangling failure queue info (dump_and_return_failure_queue_entries) with some code for dealing with entries in zipmaker_failed.

failure_queue_arg_lists = dump_and_return_failure_queue_entries('zipmaker_failed').map { |failure| failure[:args][0]['arguments'] }

zm_druid_version_zips = failure_queue_arg_lists.map { |args| DruidVersionZip.new(args[0], args[1], args[2]) } ; nil
bad_druid_version_zips = zm_druid_version_zips.map do |dvz|
  if File.exist?(dvz.file_path)
    if dvz.send(:zip_size_ok?)
      puts "ok: zip exists at #{dvz.file_path}, but size is ok"
      nil
    else
      puts "ERROR: zip exists at #{dvz.file_path} but is smaller than moab version"
      dvz
    end
  else
    puts "ok: no cached zip exists at #{dvz.file_path}"
    nil
  end
end.compact ; nil
# It's possible that not all entries in the failure queue will have bad zips sticking around. For example, ZipmakerJob might have detected
# that the moab was unreadable before trying to create the zip, or the zip may have already aged out of temp space if the failed job is old enough.

# use DruidVersionZip#cleanup_zip_parts! to handle the cleanup for us
bad_druid_version_zips.each { |bad_dvz| bad_dvz.send(:cleanup_zip_parts!) }

# Now it should be fine to retry everything in the queue.
# Re-running the above `bad_druid_version_zips = zm_druid_version_zips.map do |dvz|...` code should result in an empty bad_druid_version_zips, since the cleanup loop should have removed any busted zip files.
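
To make that re-check easy to repeat, the detection logic could be wrapped in a small method. This is only a sketch: `find_bad_zips` is a hypothetical name, and it assumes each object responds to `file_path` and to a (possibly private) `zip_size_ok?`, as DruidVersionZip does in the snippet above. Unlike the loop above, it prints nothing and just returns the offenders.

```ruby
# Hypothetical helper: returns the entries whose cached zip still exists on disk
# but fails the size check. Entries with no cached zip, or with an ok zip, are skipped.
def find_bad_zips(druid_version_zips)
  druid_version_zips.select do |dvz|
    # send is used because zip_size_ok? is invoked via send in the snippet above
    File.exist?(dvz.file_path) && !dvz.send(:zip_size_ok?)
  end
end
```

After the cleanup loop has run, `find_bad_zips(zm_druid_version_zips).empty?` would be expected to return true before retrying the jobs.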