Typically, when consolidating environments or moving standalone systems into a consolidated directory, existing users and groups with conflicting UIDs and GIDs have to be renumbered. Part of this process is to safely and efficiently modify the file systems so that the AS-IS situation is brought into line with the desired TO-BE end result.
This is exactly what renumid intends to help with.
Intuitively, one would scan the file systems for a specific UID or GID, change the ownership of each matching file to the new value, and repeat this until all files and all UIDs/GIDs are dealt with. The UNIX 'find' utility is often used for this, for example:
find / -uid 1234 -exec chown 3456 {} \;
find / -uid 3456 -exec chown 5678 {} \;
However, this poses a few problems:
- You need to scan the whole file system once for each UID/GID to process. This takes a lot of time and is far from efficient.
- You risk ending up with the wrong end result. If you are not careful (as in the example above), a subsequent action may renumber files that were already remapped, creating a situation that you cannot easily restore.
- Changes that require both a UID and a GID change cannot be dealt with easily.
- If the process is interrupted (for whatever reason), replaying the remaining actions is tedious and time-consuming.
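The second problem can be illustrated with a small sketch. It does not touch any files; it only models the owner of a single file as a number. The function names are hypothetical and chosen for this illustration.

```python
def apply_sequentially(uid, steps):
    """Mimic running one find/chown pass per (old, new) pair, in order."""
    for old, new in steps:
        if uid == old:
            uid = new
    return uid


def apply_from_original(uid, mapping):
    """Decide the new owner once, based only on the original owner."""
    return mapping.get(uid, uid)


# The two passes from the find example above.
steps = [(1234, 3456), (3456, 5678)]
mapping = dict(steps)

# Intended result for a file originally owned by 1234.
print(apply_from_original(1234, mapping))
# Result after running the two passes one after the other.
print(apply_sequentially(1234, steps))
```

The first call yields 3456, the intended new owner. The second yields 5678, because the second pass cannot tell a file that was just renumbered apart from one that was owned by 3456 all along.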
The solution to this is to use a two-phased approach.
- First, index the file system and identify all paths that require a change.
- Then use this index file to efficiently perform the needed renumbering.
Since this index stores not just the paths but also their original state, one can extract exactly the intended end result and the work required to get there (e.g. the number of atomic file system actions to perform).
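The two phases could be sketched roughly as follows. This is a minimal illustration under assumptions, not renumid's actual code: the function names, the tuple layout of an index entry and the `chown` parameter are all hypothetical.

```python
import os


def build_index(root, uidmap, gidmap):
    """Phase 1: record (path, original uid, original gid) for impacted paths."""
    index = []

    def record(path):
        st = os.lstat(path)  # lstat inspects a symlink itself, not its target
        if st.st_uid in uidmap or st.st_gid in gidmap:
            index.append((path, st.st_uid, st.st_gid))

    record(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            record(os.path.join(dirpath, name))
    return index


def plan_actions(index, uidmap, gidmap):
    """Derive the target owner of each path from its ORIGINAL owner."""
    return [(path, uidmap.get(uid, uid), gidmap.get(gid, gid))
            for path, uid, gid in index]


def renumber(index, uidmap, gidmap, chown=os.lchown):
    """Phase 2: one ownership call per path, setting UID and GID together."""
    for path, uid, gid in plan_actions(index, uidmap, gidmap):
        chown(path, uid, gid)
```

Because the target owner is always computed from the stored original owner, a path is never remapped twice, and `len(plan_actions(...))` gives the number of ownership changes to expect. Passing a different `chown` callable (for example one that only logs) allows a dry run without root privileges.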
A two-phased approach also makes it possible to prepare the required actions before the migration itself takes place, and can help to assess the required migration time for individual systems. Since file system renumbering can be very time-consuming, this makes it much easier to predict the downtime needed for the migration.
For this reason we added two optional steps:
- A step that reports which changes are to be expected, based on the index file
- A step that restores the original state, or that can be used to benchmark the process (without impact)
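As an illustration of these two optional steps, the sketch below summarises an index and replays it to restore the original owners. The entry layout `(path, original uid, original gid)`, the sample paths and the function names are assumptions made for this example.

```python
import os
from collections import Counter

# Hypothetical index entries: (path, original uid, original gid)
index = [
    ("/srv/www/index.html", 48, 48),
    ("/srv/www/logs", 48, 0),
    ("/var/spool/mail/bob", 1000, 8),
]
uidmap = {48: 10048}
gidmap = {8: 10008}


def status_report(index, uidmap, gidmap):
    """Count how many indexed paths are affected per mapped UID and GID."""
    uids = Counter(uid for _, uid, _ in index if uid in uidmap)
    gids = Counter(gid for _, _, gid in index if gid in gidmap)
    return {"paths": len(index), "uids": dict(uids), "gids": dict(gids)}


def restore(index, chown=os.lchown):
    """Put every indexed path back to the owner recorded in the index."""
    for path, uid, gid in index:
        chown(path, uid, gid)


report = status_report(index, uidmap, gidmap)
print(report)
```

Running `restore` on a system that was never renumbered rewrites each owner to the value it already has, which is one way such a step could serve as a benchmark without changing the end state.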
The following potential issues should be considered:
- Symlinks should not be followed (no-dereference) - DONE
- Pseudo file systems should be excluded - DONE
- Crossing file system boundaries may not be wanted
- File systems can be mounted inside other file systems (so the device number is important) - DONE
- Python's os.walk provides every directory as a (new) root, so we need to store excluded device numbers - DONE
- Bind mounts need to be handled carefully (keep a list of scanned device numbers?)
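Several of these points can be handled while walking the tree. The following is a minimal sketch, assuming a hypothetical `walk_paths` helper; it is not taken from renumid, and it does not attempt to solve the bind mount question.

```python
import os


def walk_paths(root, cross_devices=False, excluded_devices=()):
    """Yield root and everything below it, without following symlinks.

    Directories on an excluded device are skipped. Unless cross_devices
    is set, directories on a device other than root's are skipped too.
    """
    root_dev = os.lstat(root).st_dev
    yield root
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        keep = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            dev = os.lstat(path).st_dev  # lstat: do not dereference symlinks
            if dev in excluded_devices:
                continue
            if not cross_devices and dev != root_dev:
                continue
            keep.append(name)
            yield path
        # Prune in place so that os.walk does not descend into skipped entries.
        dirnames[:] = keep
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Since `os.walk` hands back each directory without remembering decisions made higher up, the set of excluded device numbers has to be passed along and checked at every level, which is what the `excluded_devices` argument does here.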
renumid index    - Create a filesystem index of impacted paths using a map
renumid status   - Show a status report of impacted paths and affected UIDs/GIDs
renumid renumber - Renumber the impacted paths according to the provided map
renumid restore  - Restore the original situation using the filesystem index
-f <index>  - the index file to create/use (default: renumid-<date>-<time>.idx)
-m <map>    - the map file to use (default: renumid.map)
-d          - enable debugging output
-v          - be more verbose
-x          - do not cross filesystem boundaries
-T          - include only these file system types (e.g. ext3,ext4,xfs)
The map is written as a YAML or JSON file.
Below is an example file showing how it can be used:
---
# The below UIDs/GIDs are examples to use on a RHEL /dev and /tmp file system

# These maps would match for all hosts
uidmap:
  10: 10010    # uucp
  42: 10042    # gdm
gidmap:
  5: 10005     # tty
  16: 10016    # oprofile
  39: 10039    # video

# But you can define per-host maps, or use wildcards on FQDN
moria*:
  uidmap:
    48: 10048    # apache
    69: 10069    # vcsa
  gidmap:
    42: 10042    # gdm
    69: 10069    # vcsa
    484: 10484   # tmux
    505: 10505   # vboxusers
lisse.gent.wieers.com:
  uidmap:
    8: 10008     # mail
  gidmap:
    8: 10008     # mail
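How such a map might be resolved for one host can be sketched as below. The sketch starts from an already parsed map (a plain dict, as a YAML or JSON parser would produce) and makes two assumptions that the text above does not confirm: host patterns are matched as shell-style wildcards, and host-specific entries are merged on top of the global ones. The function name and the host name `moria.example.com` are hypothetical.

```python
import fnmatch

# A shortened version of the example map, as it might look after parsing.
config = {
    "uidmap": {10: 10010, 42: 10042},
    "gidmap": {5: 10005},
    "moria*": {
        "uidmap": {48: 10048, 69: 10069},
        "gidmap": {42: 10042},
    },
    "lisse.gent.wieers.com": {
        "uidmap": {8: 10008},
        "gidmap": {8: 10008},
    },
}


def resolve_maps(config, hostname):
    """Return (uidmap, gidmap) for hostname: global maps plus matching host maps."""
    uidmap = dict(config.get("uidmap", {}))
    gidmap = dict(config.get("gidmap", {}))
    for key, value in config.items():
        if key in ("uidmap", "gidmap"):
            continue
        # Assumption: host keys are shell-style patterns matched on the FQDN.
        if fnmatch.fnmatchcase(hostname, key):
            uidmap.update(value.get("uidmap", {}))
            gidmap.update(value.get("gidmap", {}))
    return uidmap, gidmap


uidmap, gidmap = resolve_maps(config, "moria.example.com")
print(uidmap)
print(gidmap)
```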
The file system index will be a pickle file with a specific renumid data structure, although it could later become something more efficient (in terms of both storage and speed).
The file system index stores the following information:
- A version number of the file format (to detect incompatible files)
- A set of excluded devices (so it is clear what the scope is)
- Statistics (time, duration, paths scanned, …)
- The time it took to create the file system index
- The scope (parents) that require changes
- The actual mapping used for the index
- The list of paths and original uids/gids
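One possible shape for such an index is sketched below as a plain dict stored with pickle. The field names, the `FORMAT_VERSION` value and the helper functions are assumptions for illustration only; the real renumid data structure may look different.

```python
import pickle

FORMAT_VERSION = 1  # hypothetical version number of the file format


def new_index(uidmap, gidmap):
    """Create an empty index holding the fields listed above."""
    return {
        "version": FORMAT_VERSION,
        "excluded_devices": set(),
        "stats": {},          # e.g. start time, duration, paths scanned
        "scope": [],          # parent directories that require changes
        "uidmap": dict(uidmap),
        "gidmap": dict(gidmap),
        "paths": [],          # (path, original uid, original gid)
    }


def save_index(index, filename):
    with open(filename, "wb") as fh:
        pickle.dump(index, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(filename):
    # Only unpickle index files you created yourself: unpickling can run code.
    with open(filename, "rb") as fh:
        index = pickle.load(fh)
    if not isinstance(index, dict) or index.get("version") != FORMAT_VERSION:
        raise ValueError("incompatible index file: %s" % filename)
    return index
```

Storing the version first and checking it on load lets an older or newer index file be rejected cleanly instead of failing halfway through a renumbering run.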