feat: single process optimization #5489
Commits on Aug 2, 2024
feat: Single process optimization
Python interpreter initialization and module import time account for a significant portion of cloud-init's total runtime when the default configuration is used, and in all cases they contribute a significant amount of wall clock time to cloud-init's runtime. This commit significantly improves cloud-init time to completion by eliminating redundant interpreter starts and module loads. Since multiple cloud-init processes sit in the critical chain of the boot order, this significantly reduces cloud-init's time to ssh and time to completion.

Cloud-init has four stages. Each stage starts its own Python interpreter and loads the same libraries. To eliminate the redundant work of starting an interpreter and loading libraries, this changes cloud-init to run as a single process. Systemd service ordering is retained by using the existing cloud-init services as shims which use a synchronization protocol to start each cloud-init stage and to communicate that each stage is complete to the init system. Currently only systemd is supported, but the synchronization protocol should be capable of supporting other init systems as well with minor changes.

Note: This makes possible many additional improvements that eliminate redundant work. However, these potential improvements are temporarily ignored. This commit has been structured to minimize the changes required to capture the majority of primary performance savings while preserving correctness and the ability to preserve backwards compatibility. Many additional improvements will be possible once this merges.

Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
  - when init system sends "start" to the Unix socket, start the stage
  - when running stage is complete, send "done" to Unix socket

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket synchronization protocol

cloud-init-single.service (new)
-------------------------------

- set service type to 'notify'
- invoke cloud-init in single process mode
- adopt systemd ordering requirements from cloud-init-local.service
- adopt KillMode from cloud-final.service

main.py
-------

- Add command line flag to indicate single process mode
- In this mode run each stage followed by an IPC synchronization protocol step

cloud-{local,init,config,final}.services
----------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams which do not package openbsd's netcat

cloud-final.services
--------------------

- drop KillMode

cloud-init-local.services
-------------------------

- drop dependencies made redundant by ordering after cloud-init-single.service

Performance Results
===================

An integration test [1] on a Noble lxd container comparing POC to current release showed significant improvement. In the POC, cloud-config.service didn't register in the critical-chain (~340ms improvement), cloud-init.service added ~378ms to total boot time (~400ms improvement), and cloud-init-local.service had a marginal improvement (~90ms) which was likely within the noise threshold. In total, this (early stage) test showed a 0.83s improvement to total boot time, with 0.66s of boot time remaining due to cloud-init. This 0.83s improvement roughly corresponds to the change in total boot time: systemd-analyze critical-chain reported 2.267s to reach graphical.target, which is a 0.8s improvement over the current release time.

Note: The numbers quoted above were gathered from only one series (Noble), one platform (lxc), and one host machine (Ryzen 7840U), and the run environment was not controlled. I ran the test multiple times to ensure that the results were repeatable, but not enough times to be considered statistically significant. I verified that cloud-init worked as expected, but given the limited scope of this integration test, this is still very much a proof of concept.

[1] test_logging.py

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.
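The start/done handshake described in the commit message above can be illustrated with a minimal Python sketch. It is not the code from socket.py: the function names (`serve_stages`, `shim`, `demo`), the `<stage>.sock` file names, and the use of stream sockets are all assumptions made for illustration, and a `threading.Event` stands in for the sd_notify() readiness signal. The `shim` function plays the role the message assigns to netcat in each per-stage service.

```python
import os
import socket
import tempfile
import threading

STAGES = ("local", "network", "config", "final")


def serve_stages(sock_dir, run_stage, ready):
    """Long-lived process: one Unix socket per stage, stages run in order."""
    listeners = {}
    for stage in STAGES:
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(os.path.join(sock_dir, f"{stage}.sock"))  # hypothetical path
        srv.listen(1)
        listeners[stage] = srv
    ready.set()  # stand-in for sd_notify(): all sockets exist, shims may connect
    for stage in STAGES:
        conn, _ = listeners[stage].accept()  # block until this stage's shim connects
        with conn:
            if conn.recv(16).strip() == b"start":
                run_stage(stage)        # do the stage's work in this same process
                conn.sendall(b"done")   # release the waiting shim
        listeners[stage].close()


def shim(sock_dir, stage):
    """Per-stage service shim: send "start", then block until "done" arrives."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(os.path.join(sock_dir, f"{stage}.sock"))
        client.sendall(b"start")
        return client.recv(16).decode()


def demo():
    """Run the server in a thread and drive it with one shim call per stage."""
    ran = []
    ready = threading.Event()
    with tempfile.TemporaryDirectory() as sock_dir:
        server = threading.Thread(
            target=serve_stages, args=(sock_dir, ran.append, ready)
        )
        server.start()
        ready.wait()
        replies = [shim(sock_dir, stage) for stage in STAGES]
        server.join()
    return ran, replies
```

Because each shim blocks until its reply arrives, the init system sees every per-stage service as active until that stage has finished, which is how service ordering can be retained while the work happens in one process.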
Commit: f7ccda9
Commit: 79e191f
Commit: 7a62897
Commit: c127fca
Rename cloud-init services to be more intuitive.
Make cloud-network.service map to the cloud-init network stage.
Make cloud-init.service map to all of cloud-init.

BREAKING CHANGE: Changes the semantics of the cloud-init.service files
Commit: 3247c11
Commit: d87be7c
Improve intra-stage error handling
- make it such that if one stage fails, the next stage isn't blocked indefinitely
- notify the init system of per-stage exit codes and failure messages
- make parent process (cloud-init.service) exit with representative exit code
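As an illustration of reporting per-stage results to the init system, here is a minimal sketch of an sd_notify-style helper in Python. The `NOTIFY_SOCKET` environment variable and the `STATUS=` field are part of systemd's notification protocol; the function names, the message wording, and the "fail if any stage failed" policy are assumptions for illustration, not what this commit implements.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one datagram to the socket named by $NOTIFY_SOCKET, if set."""
    path = os.environ.get("NOTIFY_SOCKET", "")
    if not path:
        return False  # not running under a notify-aware service manager
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading "@" denotes an abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(message.encode())
    return True


def stage_status(stage: str, exit_code: int, error: str = "") -> str:
    """Hypothetical message format for a per-stage result."""
    if exit_code == 0:
        return f"STATUS={stage} stage completed"
    return f"STATUS={stage} stage failed (exit {exit_code}): {error}"


def overall_exit_code(stage_codes: dict) -> int:
    """Assumed policy: the parent process fails if any stage failed."""
    return 1 if any(code != 0 for code in stage_codes.values()) else 0
```

A caller would send `sd_notify(stage_status("network", 1, "some error"))` after a failed stage and still proceed to signal the next stage, so that a failure does not leave later services waiting.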
Commit: 18aa5b3
Do not set up logger multiple times
Add a new flag attribute to the argparse Namespace which is used to disable logging setup. This isn't elegant, but fixing logging is going to be a large refactor, so this gets logging "working" for now while minimizing the number of lines of code changed.
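The idea of carrying a "logging already configured" flag on the parsed arguments could look roughly like the sketch below. The attribute name `skip_log_setup` and the function name are hypothetical, chosen only for illustration; the commit does not state the real names.

```python
import argparse
import logging


def setup_logging_once(args: argparse.Namespace) -> bool:
    """Configure logging unless the (hypothetical) skip flag is already set.

    Returns True if logging was configured by this call.
    """
    if getattr(args, "skip_log_setup", False):
        return False  # a previous stage in this process already did the setup
    logging.basicConfig(level=logging.DEBUG)
    args.skip_log_setup = True  # later stages reuse the same Namespace
    return True
```

Since all four stages now share one process, the first stage to run configures logging and the remaining stages see the flag and skip it.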
Commit: 5c05690
fix commandline (for debugger use)
skips sync protocol when stdin is a tty
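A minimal sketch of this check, assuming a hypothetical helper name; the only real API used is the standard `isatty()` method on file objects:

```python
import sys


def should_run_sync_protocol(stdin=None) -> bool:
    """Return False when launched from an interactive terminal."""
    stream = sys.stdin if stdin is None else stdin
    # Under an interactive shell or debugger stdin is a tty, and no init
    # system is waiting on the stage sockets, so the handshake is skipped.
    return not stream.isatty()
```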
Commit: 14ca37f
Commit: c2079ea
Commit: f0944d0
Commit: 417d550
Commit: cef0f5e
Commit: 7d13021
Commit: 0848734
Commit: 212f841
Commit: 5089e59
- remove logs duplicated across stages
- send the single line traceback to systemd
- fix a minor string format in user output
Commit: a053c19
Commit: 9ca7a97
Commit: 7df7d83
Commit: cb7bb25
Commit: 4b7dbbb