Scripts to help Unix Administrators and Users manage High Performance Computing (HPC) environments.
HPC systems have very large, fast parallel filesystems, known as "scratch" filesystems, where users can generate and use terabytes of data during computation.
Unfortunately, users tend to leave files in these filesystems rather than backing them up to long-term storage such as Hierarchical Storage Management (HSM). Quotas can mitigate this, but can still result in people leaving unneeded files around and hogging space. Also, since "scratch" filesystems are typically not backed up, leaving files there is not safe.
Given a filesystem, expirefiles finds all files that have not been accessed in a specified number of days. It can optionally warn users by email about files that are about to be expired (removed).
Exceptions for usernames and file paths are supported, so that certain files can be exempted from later deletion.
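
The core idea can be sketched with `find` and its access-time test. This is only an illustration and not the actual expirefiles implementation or interface: the function name `list_expired` and its arguments are hypothetical, and only a single path exemption is handled (no username exemptions, no email warnings, no deletion).

```shell
# Hypothetical helper, NOT the real expirefiles interface.
# Usage: list_expired DIR DAYS [EXEMPT_PATH]
# Prints regular files under DIR whose last access time is more than
# DAYS days in the past. If EXEMPT_PATH is given, that file or directory
# tree is skipped entirely.
list_expired() {
    dir=$1
    days=$2
    exempt=${3:-}
    if [ -n "$exempt" ]; then
        # -prune stops find from descending into (or printing) the exempt path
        find "$dir" -path "$exempt" -prune -o -type f -atime +"$days" -print
    else
        find "$dir" -type f -atime +"$days" -print
    fi
}
```

For example, `list_expired /scratch 30 /scratch/projects/keep` would only list candidates; it removes nothing. Note that the result depends on access times being recorded, so a filesystem mounted with `noatime` would give misleading results.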
For more details see expirefiles.
In a typical HPC environment, users log in to head nodes, also referred to as login nodes, from where they submit their batch jobs.
Sometimes users run CPU-intensive jobs on the head nodes rather than submitting batch jobs to PBS/Torque.
The goodcitizen.sh script detects users who are running CPU-intensive jobs and notifies them via email to use interactive batch jobs instead.
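
One way such a check could work is to filter `ps` output by CPU usage. The sketch below is an assumption about the general approach, not the code in goodcitizen.sh; the function name `filter_cpu_hogs` and the threshold are hypothetical, and no email is sent.

```shell
# Hypothetical helper, NOT taken from goodcitizen.sh.
# Usage: filter_cpu_hogs THRESHOLD
# Reads "user pcpu command" lines on stdin and prints those whose CPU
# percentage is at or above THRESHOLD. Only the first word of the
# command is kept.
filter_cpu_hogs() {
    awk -v limit="$1" '$2 + 0 >= limit + 0 { print $1, $2, $3 }'
}

# Typical use on a login node (the "=" suffixes suppress the header line):
#   ps -A -o user= -o pcpu= -o comm= | filter_cpu_hogs 90
```

Keeping the filter separate from the `ps` call makes it easy to test with canned input. On Linux, the `pcpu` value reported by `ps` is averaged over the lifetime of the process, so a real check may want to sample more than once before notifying anyone.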
Other checks can be added, for example:
"watch qstat" detection - users sometimes overload the PBS/Torque scheduler by continually polling the status of their jobs with watch qstat.
For more details on configuration see goodcitizen.
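
As an illustration of the "watch qstat" check described above, the following sketch scans process command lines for `watch` invocations that run `qstat`. It is a guess at how such a check might look, not the goodcitizen.sh implementation; the function name `find_qstat_watchers` is hypothetical.

```shell
# Hypothetical helper, NOT taken from goodcitizen.sh.
# Reads "user command-line" lines on stdin and prints each user (once)
# who is running watch with qstat somewhere in its arguments.
find_qstat_watchers() {
    awk '$2 ~ /(^|\/)watch$/ {
        # Field 2 is the program; look for qstat among its arguments
        for (i = 3; i <= NF; i++) {
            if ($i ~ /qstat/) { print $1; break }
        }
    }' | sort -u
}

# Typical use:
#   ps -A -o user= -o args= | find_qstat_watchers
```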
Most HPC facilities use HSM storage management. This usually consists of a quota-based NFS online frontend disk cache in front of a much larger backend offline tape component. Users copy data to the cache and the HSM moves the data offline in the background.
As can be expected, copying lots of small files to HSM storage is not particularly efficient. Small files are typically not big enough to be automatically moved to tape and will remain in the cache indefinitely. This is why chunkybackup.sh was written: it allows users of an HPC facility to easily "chunk up" their smaller data files.
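
The "chunking" idea can be sketched as bundling all files below a size limit into a single tar archive, which is then large enough to be worth copying to HSM. This is not how chunkybackup.sh is necessarily written: the function name `chunk_small_files` and its arguments are hypothetical, and the sketch assumes a `find` with `-print0` and a `tar` that supports `--null -T -` (GNU tar and bsdtar both do).

```shell
# Hypothetical helper, NOT the real chunkybackup.sh interface.
# Usage: chunk_small_files SRC_DIR MAX_BYTES ARCHIVE
# Packs every regular file under SRC_DIR smaller than MAX_BYTES into
# ARCHIVE. ARCHIVE must be an absolute path outside SRC_DIR.
# The original files are left in place.
chunk_small_files() {
    src=$1
    max_bytes=$2
    archive=$3
    (
        cd "$src" || exit 1
        # NUL-separated names keep paths with spaces or newlines intact
        find . -type f -size -"$max_bytes"c -print0 |
            tar -cf "$archive" --null -T -
    )
}
```

For example, `chunk_small_files "$HOME/results" 1048576 /tmp/results-small.tar` would collect everything under 1 MiB into one archive.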
For more details see chunkybackup.