This repository hosts a dracut module for creating a local overlayFS that provides persistent storage for a Live image.
The created overlayFS is customizable; by default it is built on a mirrored MD RAID array.
This module assumes authority and will wipe and partition a server from within the initramFS.
Additionally, the partitioned RAID array will contain an empty LVM volume group; this is useful for allowing processes such as cloud-init to further customize the server once it boots.
For more information on Dracut, the initramFS creation tool, see here: https://github.com/dracutdevs/dracut
In order to use this dracut module, you need:
- A locally attached USB or a remote server with a squashFS image.
- Physical block devices installed in your blade(s).
- Two physical disks of 0.5 TiB or less (or the RAID must be overridden, see metal mdsquash customization).
Specify a local disk device for storing images, and the URL/endpoint to fetch images from, on the kernel command line (e.g. via grub.cfg or an iPXE script).
```
metal.server=<URI> root=live:LABEL=SQFSRAID
```
The above snippet is the minimal cmdline necessary for this module to function. Additional options are denoted throughout the module customization section.
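For illustration, a grub.cfg menu entry carrying this minimal cmdline might look like the sketch below; the kernel and initrd paths and the server address are hypothetical and will differ per environment.

```
# Hypothetical grub.cfg entry; paths and server address are placeholders.
menuentry "Live image with metal overlay" {
    linux /boot/kernel metal.server=http://10.100.101.111/images root=live:LABEL=SQFSRAID
    initrd /boot/initrd.img.xz
}
```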
The URI scheme, authority, and path values will tell the module where to look for SquashFS. Only those parts of a URI are supported.
- file driver

  ```
  # Example: Load from another USB stick on the node.
  file://mydir?LABEL=MYUSB
  # Example: load a file from the root of an attached disk with the label SQFSRAID.
  file://?LABEL=SQFSRAID
  # Example: load a file from a PIT recovery USB.
  file://ephemeral/data/ceph/?LABEL=PITDATA
  ```
- http or https driver

  ```
  http://api-gw-service/path/to/service/endpoint
  http://pit/
  http://10.100.101.111/some/local/server
  ```
Other drivers, such as native s3, scp, and ftp, could be added but are not currently implemented.
These driver schemes are all defined by the rule generator, metal-genrules.sh.
Set metal.debug=1 to enable debug output from only the metal modules. This will verbosely print the creation of the RAIDs and the fetching of the squashFS image. It effectively runs all dracut-metal code with set -x, while leaving the rest of dracut at its own debug level.

- Default: 0
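For illustration, debug output for just the metal modules can be enabled on the cmdline; dracut's own rd.debug may be added separately to trace the rest of the initramFS (the server address is a placeholder).

```
# Trace only the dracut-metal scripts.
metal.debug=1 metal.server=http://10.100.101.111/images root=live:LABEL=SQFSRAID

# Additionally trace the rest of dracut with its own debug flag.
metal.debug=1 rd.debug metal.server=http://10.100.101.111/images root=live:LABEL=SQFSRAID
```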
Specify the number of disks to use in the local RAID (see metal.md-level for changing the RAID type).

- Default: 2
Change the level passed to mdadm for RAID creation; possible values are any level that mdadm accepts. Mileage varies; buyer beware that changing this could dig a deeper hole.

- Default: mirror
Note: When metal.disks=1 is set, a RAID array is still created but with only one member. In this case, only mirror and stripe will produce a usable array.
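As a hypothetical override, the member count and RAID level can be changed together on the cmdline; any level that mdadm accepts should work, provided enough disks are present.

```
# Use three disks in a raid5 array instead of the default two-disk mirror.
metal.disks=3 metal.md-level=raid5
```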
Determines whether the wipe function should run. metal.no-wipe=0 will wipe block devices and make them ready for partitioning; metal.no-wipe=1 will disable this behavior.

- Default: 0

The wipe waits for a short delay before it runs; during that window an administrator may power the node off to avoid a wipe. This timeout can be adjusted, see metal.wipe-delay.

The following storage items are removed and/or prepared for partitioning as a raw disk:
- LVMs (specifically 'vg_name=~ceph*' and 'vg_name=~metal*')
  - This removes any CEPH volumes
  - Any volume prefixed with metal is considered to belong to this module and will be removed
  - Volumes are removed with vgremove
- /dev/md devices
  - MD devices are stopped
  - Magic bits are erased
  - Each member's superblocks are zeroed
- /dev/sd and /dev/nvme devices
  - Magic bits are erased
- Any/all USB devices are ignored
- Any/all devices smaller than metal.min-disk-size*1024**3 bytes are ignored (see metal.min-disk-size)
- partprobe is invoked to update/notify the kernel of the partition changes
- Any LVMs that weren't on a device that was wiped will still exist, since only specific LVMs are targeted
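For orientation only, the following is a rough shell sketch of what the steps above amount to; it is not the module's actual code, and the device names are placeholders.

```sh
# Hypothetical approximation of the wipe steps; device names are placeholders.

# Remove LVM volume groups belonging to CEPH or to this module.
vgremove --force --yes --select 'vg_name=~ceph*'
vgremove --force --yes --select 'vg_name=~metal*'

# Stop MD arrays and zero the superblocks of their members.
mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sda /dev/sdb

# Erase the filesystem/RAID magic bits on the target block devices.
wipefs --all --force /dev/sda /dev/sdb

# Notify the kernel of the partition table changes.
partprobe /dev/sda /dev/sdb
```

The log excerpt below shows what this looks like in practice during boot, including the RAID creation that follows the wipe.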
```
Warning: local storage device wipe [ safeguard: DISABLED ]
Warning: local storage devices WILL be wiped (https://github.com/Cray-HPE/dracut-metal-mdsquash/tree/7d303b3193619f642b1316ce2b1968ee1cc82a69#metalno-wipe)
Warning: local storage device wipe commencing ...
Warning: local storage device wipe ignores USB devices and block devices less then or equal to [17179869184] bytes.
Warning: nothing can be done to stop this except one one thing ...
Warning: power this node off within the next [5] seconds to cancel.
Warning: NOTE: this delay can be adjusted, see: https://github.com/Cray-HPE/dracut-metal-mdsquash/tree/7d303b3193619f642b1316ce2b1968ee1cc82a69#metalwipe-delay)
Found volume group "metalvg0" using metadata type lvm2
Found volume group "ceph-ec4a2c46-e0ab-4f89-b7dc-6c044ce9a24b" using metadata type lvm2
Found volume group "ceph-2c5c9402-7bc2-4a8c-8eba-028532b91d9f" using metadata type lvm2
Found volume group "ceph-a38bb9f7-99ef-4536-82cf-2550a406da38" using metadata type lvm2
Found volume group "ceph-c1e6018e-6a50-4b17-a15d-b387ae66b8a4" using metadata type lvm2
VG #PV #LV #SN Attr VSize VFree
ceph-2c5c9402-7bc2-4a8c-8eba-028532b91d9f 1 1 0 wz--n- <1.75t 0
ceph-a38bb9f7-99ef-4536-82cf-2550a406da38 1 1 0 wz--n- <1.75t 0
ceph-c1e6018e-6a50-4b17-a15d-b387ae66b8a4 1 1 0 wz--n- <447.13g 0
ceph-ec4a2c46-e0ab-4f89-b7dc-6c044ce9a24b 1 1 0 wz--n- <1.75t 0
metalvg0 1 3 0 wz--n- 279.14g 149.14g
Warning: removing all volume groups of name [vg_name=~ceph*]
Failed to clear hint file.
Logical volume "osd-block-a8c05059-d921-4546-884d-f63f606f966c" successfully removed
Volume group "ceph-ec4a2c46-e0ab-4f89-b7dc-6c044ce9a24b" successfully removed
Logical volume "osd-block-d70a9ddd-9b8c-42e0-98cb-5f5279dcef5a" successfully removed
Volume group "ceph-2c5c9402-7bc2-4a8c-8eba-028532b91d9f" successfully removed
Logical volume "osd-block-d2e9e4cf-c670-418f-847e-39ade3208d04" successfully removed
Volume group "ceph-a38bb9f7-99ef-4536-82cf-2550a406da38" successfully removed
Logical volume "osd-block-b6085667-54dc-4e01-810b-25c093a510dc" successfully removed
Volume group "ceph-c1e6018e-6a50-4b17-a15d-b387ae66b8a4" successfully removed
Warning: removing all volume groups of name [vg_name=~metal*]
Failed to clear hint file.
Logical volume "CEPHETC" successfully removed
Logical volume "CEPHVAR" successfully removed
Logical volume "CONTAIN" successfully removed
Volume group "metalvg0" successfully removed
Warning: local storage device wipe is targeting the following RAID(s): [/dev/md124 /dev/md125 /dev/md126 /dev/md127]
Warning: local storage device wipe is targeting the following block devices: [/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf]
Warning: local storage disk wipe complete
Found the following disks for the main RAID array (qty. [2]): [sda sdb]
mdadm: size set to 487360K
mdadm: array /dev/md/BOOT started.
mdadm: size set to 23908352K
mdadm: array /dev/md/SQFS started.
mdadm: size set to 146352128K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: array /dev/md/ROOT started.
mdadm: chunk size defaults to 512K
mdadm: array /dev/md/AUX started.
```
The number of seconds that the wipe function will wait to allow an administrator to cancel it (by powering the node off). See the source code in metal-md-lib.sh for minimum and maximum values.

- Default: 5
- Unit: Seconds
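For example, the cancellation window could be widened while leaving the wipe enabled; this is an illustrative cmdline fragment.

```
# Keep the wipe enabled, but wait 10 seconds before it starts.
metal.no-wipe=0 metal.wipe-delay=10
```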
By default, metal-dracut will use IPv4 to resolve the deployment server for the initial call-to-home and when downloading artifacts, regardless of whether IPv6 networking is present in the environment. This safeguards against faulty or misconfigured IPv6 environments.
To disable this constraint, set metal.ipv4=0 on the cmdline. Setting 0 will enable IPv6 for this module.

- Default: 1
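As a hypothetical example against an IPv6-capable deployment server, the constraint can be lifted on the cmdline.

```
# Allow the module to resolve the deployment server and download over IPv6.
metal.ipv4=0 metal.server=http://pit/ root=live:LABEL=SQFSRAID
```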
Set the size for the new SQFS partition. Buyer beware: this does not resize an existing partition; it only applies when new partitions are created.

- Default: 25
- Unit: Gigabytes
Set the size for the new persistent OverlayFS partition. Buyer beware: this does not resize an existing partition; it only applies when new partitions are created.

- Default: 150
- Unit: Gigabytes
Set the size for the new AUX partition. Buyer beware: this does not resize an existing partition; it only applies when new partitions are created.

- Default: 150
- Unit: Gigabytes
reference: dracut dmsquashlive cmdline
Name of the directory to store and load the artifacts from. Changing this value will affect both metal and native dracut.

- Default: LiveOS
Specify the FSlabel of the block device to use for the SQFS storage. This could be an existing RAID or non-RAIDed device.
If a label is not found in /dev/disk/by-label/*, then the os-disks are paved with a new mirror array. The device can also be specified by UUID.

- Default: live:LABEL=SQFSRAID
Specify the FSlabel of the block device to use for persistent storage.
If a label is not found in /dev/disk/by-label/*, then the os-disks are paved.
If this is specified, then rd.live.overlay=LABEL=<new_label_here> must also be specified.

- Default: LABEL=ROOTRAID
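As a hypothetical example of pairing non-default labels for the SquashFS store and the persistent overlay (both labels are placeholders):

```
root=live:LABEL=MYSQFS rd.live.overlay=LABEL=MYOVERLAY
```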
Reset the persistent overlayFS, regardless of whether it is read-only. On the next boot the overlayFS will clear itself, and it will continue to clear itself on every reboot until this is unset. This does not remake the RAID; it remakes the OverlayFS. Metal only provides the underlying array and the parent directory structure necessary for an OverlayFS to detect the array as compatible.

- Default: 0
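For a one-shot reset, the flag might be added for a single boot (for example by editing the kernel line in grub) and then removed so the overlay stops clearing itself on every reboot.

```
# Clear the persistent overlayFS on the next boot; remove this once finished.
rd.live.overlay.reset=1
```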
reference: dracut standard cmdline
The idea of persistence is that changes persist across reboots: when the state of the machine changes, that information is preserved. For servers that boot images into memory (also known as live images), an overlayFS is a common method for providing persistent storage.
The overlayFS created by this dracut module is used by the dmsquash-live module; all dracut live image kernel parameters should function alongside this module.
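As a quick, hypothetical check that persistence is working, a file written after boot should survive a reboot because writes land on the overlayFS stored on the RAID.

```sh
# Write a marker file, reboot, then confirm it survived.
echo "persistence test" > /root/persistence-marker
reboot
# After the node comes back up:
cat /root/persistence-marker
```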