Commit
Python interpreter initialization and module import time make up a significant portion of cloud-init's total runtime when the default configuration is used, and in all cases they add wall clock time to cloud-init's runtime. This commit improves cloud-init's time to completion by eliminating redundant interpreter starts and module loads. Since multiple cloud-init processes sit in the critical chain of the boot order, this reduces both cloud-init's time to ssh and its time to completion.

Cloud-init has four stages. Each stage starts its own Python interpreter and loads the same libraries. To eliminate the redundant work of starting an interpreter and loading libraries, this changes cloud-init to run as a single process. Systemd service ordering is retained by using the existing cloud-init services as shims, which use a synchronization protocol to start each cloud-init stage and to communicate to the init system that each stage is complete.

Currently only systemd is supported, but the synchronization protocol should be capable of supporting other init systems as well with minor changes.

Note: This makes possible many additional improvements that eliminate redundant work. However, these potential improvements are deliberately left out of this commit. It has been structured to minimize the changes required to capture the majority of the primary performance savings while preserving correctness and the ability to maintain backwards compatibility. Many additional improvements will be possible once this merges.

Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
  - when init system sends "start" to the Unix socket, start the stage
  - when running stage is complete, send "done" to Unix socket

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket synchronization protocol

cloud-init-single.service (new)
-------------------------------

- set service type to 'notify'
- invoke cloud-init in single process mode
- adopt systemd ordering requirements from cloud-init-local.service
- adopt KillMode from cloud-final.service

main.py
-------

- Add command line flag to indicate single process mode
- In this mode run each stage followed by an IPC synchronization protocol step

cloud-{local,init,config,final}.services
----------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams which do not package openbsd's netcat

cloud-final.services
--------------------

- drop KillMode

cloud-init-local.services
-------------------------

- drop dependencies made redundant by ordering after cloud-init-single.service

Performance Results
===================

An integration test [1] on a Noble lxd container comparing POC to current release showed significant improvement. In the POC, cloud-config.service didn't register in the critical-chain (~340ms improvement), cloud-init.service added ~378ms to total boot time (~400ms improvement), and cloud-init-local.service had a marginal improvement (~90ms) which was likely in the threshold of noise. The total improvement in this (early stage) test showed a 0.83s improvement to total boot time with 0.66s of boot time remaining due to cloud-init.
This 0.83s improvement roughly corresponds to the change in total boot time: systemd-analyze critical-chain reported 2.267s to reach graphical.target, which is a 0.8s improvement over the current release time.

Note: The numbers quoted above were gathered from only one series (Noble), one platform (lxc), and one host machine (Ryzen 7840U), and the run environment was not controlled. I ran the test multiple times to ensure that the results were repeatable, but not enough times to be considered statistically significant. I verified that cloud-init worked as expected, but given the limited scope of this integration test, this is still very much a proof of concept.

[1] test_logging.py

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.
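
The commit message notes that a pure Python equivalent of the netcat shim is possible. The following is a minimal sketch of what such a shim could look like, not the implementation used by this commit. The function name `run_stage_shim` and its `run_dir` parameter are hypothetical; the socket file names (`{stage}.sock` and `{stage}-return.sock` under a `share` directory) follow the paths used in socket.py.

```python
import os
import socket


def run_stage_shim(stage: str, run_dir: str) -> bytes:
    """Send b"start" to a stage socket and block until a reply arrives.

    Hypothetical helper: `run_dir` stands in for cloud-init's run directory.
    """
    stage_path = os.path.join(run_dir, "share", f"{stage}.sock")
    return_path = os.path.join(run_dir, "share", f"{stage}-return.sock")
    # Remove a stale return socket left over from a previous run.
    try:
        os.remove(return_path)
    except FileNotFoundError:
        pass
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        # Bind before sending so the receiver learns the reply address
        # from recvfrom().
        sock.bind(return_path)
        sock.connect(stage_path)
        sock.sendall(b"start")
        # Block until the single process reports that the stage finished.
        return sock.recv(4)
```

A shim service would call this once for its own stage and treat any reply other than `b"done"` as a failure.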
Showing 9 changed files with 265 additions and 11 deletions.
@@ -0,0 +1,117 @@
# This file is part of cloud-init. See LICENSE file for license information.
"""A module for common socket helpers."""
import logging
import os
import socket
from contextlib import suppress

from cloudinit.settings import DEFAULT_RUN_DIR

LOG = logging.getLogger(__name__)


def sd_notify(message: bytes):
    """Send a sd_notify message."""
    LOG.info("Sending sd_notify(%s)", str(message))
    socket_path = os.environ.get("NOTIFY_SOCKET", "")

    # abstract
    if socket_path[0] == "@":
        socket_path = socket_path.replace("@", "\0", 1)

    # unix domain
    elif socket_path[0] != "/":
        raise OSError("Unsupported socket type")

    with socket.socket(
        socket.AF_UNIX, socket.SOCK_DGRAM | socket.SOCK_CLOEXEC
    ) as sock:
        sock.connect(socket_path)
        sock.sendall(message)


class SocketSync:
    """A two way synchronization protocol over Unix domain sockets."""

    def __init__(self, *names: str):
        """Initialize a synchronization context.

        1) Ensure that the socket directory exists.
        2) Bind a socket for each stage.

        Binding the sockets on initialization allows receipt of stage
        "start" notifications prior to the cloud-init stage being ready to
        start.

        :param names: stage names, used as unique identifiers
        """
        self.stage = ""
        self.remote = ""
        self.sockets = {
            name: socket.socket(
                socket.AF_UNIX, socket.SOCK_DGRAM | socket.SOCK_CLOEXEC
            )
            for name in names
        }
        # ensure the directory exists
        os.makedirs(f"{DEFAULT_RUN_DIR}/share", mode=0o700, exist_ok=True)
        # remove stale sockets and bind
        for name, sock in self.sockets.items():
            socket_path = f"{DEFAULT_RUN_DIR}/share/{name}.sock"
            with suppress(FileNotFoundError):
                os.remove(socket_path)
            sock.bind(socket_path)

    def __call__(self, stage: str):
        """Set the stage before entering context.

        This enables the context manager to be initialized separately from
        each stage synchronization.

        :param stage: the name of a stage to synchronize

        Example:
            sync = SocketSync("stage 1", "stage 2")
            with sync("stage 1"):
                pass
            with sync("stage 2"):
                pass
        """
        self.stage = stage
        return self

    def __enter__(self):
        """Wait until a message has been received on this stage's socket.

        Once the message has been received, enter the context.
        """
        LOG.debug("sync(%s): initial synchronization starting", self.stage)
        # block until init system sends us data
        # the first value returned contains a message from the init system
        # (should be "start")
        # the second value contains the path to a unix socket on which to
        # reply, which is expected to be /path/to/{self.stage}-return.sock
        sock = self.sockets[self.stage]
        chunk, self.remote = sock.recvfrom(5)

        if b"start" != chunk:
            # The protocol expects to receive a command "start"
            self.__exit__(None, None, None)
            raise ValueError(f"Received invalid message: [{str(chunk)}]")
        elif f"{DEFAULT_RUN_DIR}/share/{self.stage}-return.sock" != str(
            self.remote
        ):
            # assert that the return path is in a directory with appropriate
            # permissions
            self.__exit__(None, None, None)
            raise ValueError(f"Unexpected path to unix socket: {self.remote}")

        LOG.debug("sync(%s): initial synchronization complete", self.stage)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Notify the socket that this stage is complete."""
        sock = self.sockets[self.stage]
        sock.connect(self.remote)
        sock.sendall(b"done")
        sock.close()
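
The handling of `NOTIFY_SOCKET` in the helper above can be illustrated in isolation. This is a hedged sketch rather than the module's code: `notify_address` and `send_notify` are hypothetical names, and raising `OSError` when the variable is unset is an assumption made here for the example. It relies on two facts: `str.replace()` returns a new string instead of modifying its argument, and Python accepts an AF_UNIX address with a leading null byte as an abstract-namespace address on Linux.

```python
import os
import socket


def notify_address(raw: str) -> str:
    """Translate a NOTIFY_SOCKET value into an address for connect()."""
    if not raw:
        # Assumption for this sketch: treat an unset variable as an error.
        raise OSError("NOTIFY_SOCKET is not set")
    if raw[0] == "@":
        # Abstract namespace: the leading "@" becomes a null byte. The
        # result of replace() must be kept, since strings are immutable.
        return raw.replace("@", "\0", 1)
    if raw[0] != "/":
        raise OSError("Unsupported socket type")
    return raw


def send_notify(message: bytes) -> None:
    """Send one datagram (for example b"READY=1") to the notify socket."""
    address = notify_address(os.environ.get("NOTIFY_SOCKET", ""))
    with socket.socket(
        socket.AF_UNIX, socket.SOCK_DGRAM | socket.SOCK_CLOEXEC
    ) as sock:
        sock.connect(address)
        sock.sendall(message)
```

Separating the address translation from the send makes the abstract-socket branch easy to check without a running init system.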
@@ -0,0 +1,27 @@
[Unit]
Description=Cloud-init: Single Process
DefaultDependencies=no
Wants=network-pre.target
After=systemd-remount-fs.service
Before=NetworkManager.service
Before=network-pre.target
Before=shutdown.target
Before=sysinit.target
Before=cloud-init-local.service
Conflicts=shutdown.target
RequiresMountsFor=/var/lib/cloud
ConditionPathExists=!/etc/cloud/cloud-init.disabled
ConditionKernelCommandLine=!cloud-init=disabled
ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled

[Service]
Type=notify
ExecStart=/usr/bin/cloud-init --single-process
KillMode=process
TimeoutStartSec=infinity

# Output needs to appear in instance console output
StandardOutput=journal+console

[Install]
WantedBy=cloud-init.target
@@ -0,0 +1,60 @@
import socket
from unittest import mock

from cloudinit import socket as ci_socket


class Sync:
    """A device to send and receive synchronization messages

    Creating an instance of the device sends a b"start"
    """

    def __init__(self, name: str, path: str):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.sock.connect(f"{path}/share/{name}.sock")
        self.sock.bind(f"{path}/share/{name}-return.sock")
        self.sock.sendall(b"start")

    def receive(self):
        """receive 5 bytes from the socket"""
        received = self.sock.recv(5)
        self.sock.close()
        return received


def test_single_process(tmp_path):
    """Verify that a socket can store "start" messages

    After a socket has been bound but before it has started listening
    """
    expected = b"done"
    with mock.patch.object(ci_socket, "DEFAULT_RUN_DIR", tmp_path):
        sync = ci_socket.SocketSync("first", "second", "third")

        # send all three syncs to the sockets
        first = Sync("first", tmp_path)
        second = Sync("second", tmp_path)
        third = Sync("third", tmp_path)

        # wait on the first sync event
        with sync("first"):
            assert True
        # check that the first sync returned
        assert expected == first.receive()
        # wait on the second sync event
        with sync("second"):
            assert True
        # check that the second sync returned
        assert expected == second.receive()
        # wait on the third sync event
        with sync("third"):
            assert True
        # check that the third sync returned
        assert expected == third.receive()


def test_single_process_threaded(tmp_path):
    # TODO demonstrate that threaded code using the same SocketSync object
    # will be ordered via the protocol
    pass
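
The first test depends on a property of Unix datagram sockets: once a socket is bound, the kernel queues incoming datagrams even if nothing is reading yet. A small self-contained sketch of that behaviour follows. It does not import cloud-init, and the function name `buffered_start_demo` and the socket file names are illustrative only.

```python
import os
import socket
import tempfile


def buffered_start_demo() -> tuple:
    """Show that a bound datagram socket queues b"start" before it is read."""
    with tempfile.TemporaryDirectory() as tmp:
        stage_path = os.path.join(tmp, "stage.sock")
        return_path = os.path.join(tmp, "stage-return.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        with server, client:
            server.bind(stage_path)
            client.bind(return_path)
            client.connect(stage_path)
            # Nobody is reading yet; the datagram waits in the kernel queue.
            client.sendall(b"start")
            # The receiver picks it up later and learns the reply address.
            chunk, remote = server.recvfrom(5)
            server.sendto(b"done", remote)
            reply = client.recv(4)
    return chunk, remote == return_path, reply
```

This is why the sockets can be bound during initialization and each stage can receive its "start" message later, in order, from a single process.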