Adam Moody edited this page Jun 25, 2013 · 9 revisions

Features:

  • long-running jobs: report progress to the user, resume where the job left off after an interruption (checkpoint/restart), and provide a common method to halt a job
  • invoke standard Linux tools where possible, e.g., grep
  • parallel techniques: master/worker, distributed queue, distributed task graph
  • define common file formats for input / output between tools
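
The checkpoint/restart, progress, and halt features above could be sketched roughly as follows. This is only an illustration under assumptions: the function name run_job, the JSON checkpoint layout, and the use of a "halt file" as the common halt method are all hypothetical and not taken from this page.

```python
import json
import os


def run_job(items, work, checkpoint_path, halt_path=None):
    """Process items in order, resuming from checkpoint_path if it exists.

    Returns True when every item has been processed, or False when the
    job stopped early because the (hypothetical) halt file was present.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        # Restart case: pick up the count of items already completed.
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]

    for index in range(done, len(items)):
        # Assumed halt convention: stop as soon as the halt file appears.
        if halt_path is not None and os.path.exists(halt_path):
            return False

        work(items[index])

        # Write the checkpoint to a temporary file and rename it into
        # place so an interruption never leaves a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)

        # Report progress to the user.
        print(f"progress: {index + 1}/{len(items)}")

    return True
```

If work raises partway through, the checkpoint still records the last completed item, so calling run_job again with the same checkpoint_path skips the finished items. An item that was in flight when the interruption hit is retried, so work should be safe to repeat.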

Components:

  • POSIX I/O wrappers to retry on non-fatal errors (e.g., EINTR)
  • component to manipulate paths (e.g., basename, dirname, transform /a/b/../c// into /a/c)
  • abstraction for file metadata (stat data) to access fields and transfer between processes
  • API to read / write file metadata structures to files
  • API to filter and sort file metadata structures
  • parallel directory walk
  • parallel pipe from one tool to another
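
As an illustration of the path manipulation component above, here is a minimal sketch of purely lexical path normalization. The function name normalize_path is hypothetical, and the choices to leave symlinks unresolved and to return "." for an empty relative result are assumptions rather than anything specified on this page.

```python
def normalize_path(path):
    """Lexically normalize a POSIX-style path.

    Collapses repeated and trailing slashes, drops "." components, and
    resolves ".." against the preceding component. The filesystem is
    never consulted, so symlinks are not followed.
    """
    absolute = path.startswith("/")
    parts = []
    for component in path.split("/"):
        if component == "" or component == ".":
            # Empty components come from "//" or a leading/trailing "/".
            continue
        if component == "..":
            if parts and parts[-1] != "..":
                # Cancel the previous real component.
                parts.pop()
            elif not absolute:
                # A relative path may climb above its starting point.
                parts.append("..")
            # For an absolute path, ".." at the root stays at the root.
            continue
        parts.append(component)

    result = "/".join(parts)
    if absolute:
        return "/" + result
    return result or "."
```

With this sketch, normalize_path("/a/b/../c//") gives "/a/c", matching the example in the list above. Python's standard library offers posixpath.normpath for the same job, along with os.path.basename and os.path.dirname for splitting paths.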

Tools:

  • list
  • find
  • copy
  • rsync
  • remove
  • tar/zip
  • grep
  • compare