Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve timer #5672

Merged
merged 2 commits into from
Sep 23, 2024
Merged

Improve timer #5672

merged 2 commits into from
Sep 23, 2024

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Sep 6, 2024

Note to reviewers

This looks big, but it is broken into three logically separate changes. I would recommend reviewing a commit at a time.

Motivation

The recent addition of subp execution time was a huge benefit to exposing slow external commands. As we try to push performance more, more data is vital to making the biggest impact that we can. Comparing this branch against the tip of main, this change makes the total log size 1 line longer - but this will vary depending per-platform.

  • builtin timing instrumentation in production allows diagnosing everywhere we can get a log
  • reduce noise by only logging slow events
  • avoid the need for manual instrumentation
  • cloud-init analyze is neat, but it is low granularity, relies log timestamps (so is only an approximation), and it requires extra steps
  • standardize logging to in shared code

Description of changes

  • Improve on the existing timer helper by simplifying its usage and implementation
  • Remove obsolete functionality in the existing timer function
  • Eliminate irrelevant logs by only logging if a threshold is exceeded.
  • Update existing callsites to use context manager
  • Refactor logging helpers out of util.py
  • Replace duplicate timer implementations with timer helper
    • socket.py
    • subp.py
  • Add timer helper to shared callsites
    • DataSource.get_data()
    • each cloud-config module

Todo

  • fix unittests
  • bikeshed the name (I originally implemented the context handler as a function, but now it's a class so it's suddenly un-pythonic)

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@holmanb holmanb force-pushed the holmanb/timer branch 4 times, most recently from 3169a13 to 86daea4 Compare September 7, 2024 00:08
@holmanb holmanb marked this pull request as ready for review September 7, 2024 00:25
@TheRealFalcon TheRealFalcon self-assigned this Sep 9, 2024
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Overall a really good change. I left some comments inline.

cloudinit/log/__init__.py Outdated Show resolved Hide resolved
cloudinit/log/log_util.py Outdated Show resolved Hide resolved
cloudinit/log/log_util.py Outdated Show resolved Hide resolved
cloudinit/log/log_util.py Outdated Show resolved Hide resolved
cloudinit/log/log_util.py Outdated Show resolved Hide resolved
cloudinit/sources/DataSourceEc2.py Outdated Show resolved Hide resolved
cloudinit/ssh_util.py Outdated Show resolved Hide resolved
assert should_auto_attach_value == _should_auto_attach(ua_section)
assert (
Copy link
Member

@TheRealFalcon TheRealFalcon Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm...I think this highlights a gap in functionality where we might wind up losing valuable logging messages that no longer take as much time to execute. Looking at the Pro module now, there's now no log anywhere to indicate that we're attempting to attach to Pro unless it is taking too long, and I'm assuming that's not what we want. I'm not sure I have a great solution other than looking through all of the changed call sites and using a threshold of 0 for those we think we want to log regardless of the time taken.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I reworked the context manager to support a parameter called log_mode, with three options: "always", "threshold", and "skip" (default is "threshold"). I audited the modified callsites and set "always" to those that I think need it. Let me know if there are any others that you think should also be modified.

@holmanb
Copy link
Member Author

holmanb commented Sep 19, 2024

@TheRealFalcon thanks for the review. I think that I've addressed all of your comments. Ready for re-review.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few more comments, but everything is mostly minor.

I really like the new decorator.

cloudinit/cmd/main.py Outdated Show resolved Hide resolved
cloudinit/cmd/main.py Show resolved Hide resolved
cloudinit/config/cc_resizefs.py Show resolved Hide resolved
def __init__(
self,
msg: str,
/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to enforce this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a standard calling convention makes code more recognizable but I don't feel strongly about this. I recall it also made refactoring slightly more difficult that one of the calls used the form msg="", but I can drop it if you'd rather not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to allow kwargs here. I think the main case for positional only arguments (other than c api compatibility) is when the arguments are obvious and names really convey no extra meaning. E.g., min(5, 4) wouldn't benefit by having kwargs.

However, if we make a call like Timed(x, threshold=y, log_mode=z), there's really no context to tell us what we're actually passing as the first argument. Obviously we should have a better variable name than x, but it seems needlessly restricting to disallow the caller to add the context if it'd make the call clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheRealFalcon that seems reasonable, I just removed that restriction

cloudinit/config/modules.py Outdated Show resolved Hide resolved
cloudinit/atomic_helper.py Show resolved Hide resolved
cloudinit/safeyaml.py Outdated Show resolved Hide resolved
cloudinit/sources/__init__.py Outdated Show resolved Hide resolved
cloudinit/performance.py Show resolved Hide resolved
@holmanb holmanb force-pushed the holmanb/timer branch 3 times, most recently from 9f89d20 to f713d5c Compare September 23, 2024 15:27
@holmanb
Copy link
Member Author

holmanb commented Sep 23, 2024

@TheRealFalcon I think that I've addressed all of your comments. Ready for re-review (assuming tests pass).

@TheRealFalcon
Copy link
Member

@holmanb , See my latest comments on the last two unresolved conversations. +1 to everything else.

@holmanb
Copy link
Member Author

holmanb commented Sep 23, 2024

@TheRealFalcon I just pushed fixes for your last two comments

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

The logic in Timed() has defaults which are designed to avoid low-value log
messages. A log is only emitted when the threshold is exceeded. Additionally,
the log_mode parameter allows the caller to prevent Timed() logs, so that the
caller may coalesce the message into another log.

- Make timer helper into a context manager.
- Add new decorator for timing functions.
- Simplify timer helper function, eliminate obsolete logic.
- Eliminate redundant logs and timer calls.
refactors existing code to utilize timer codepaths:

- replace manual timer implementations in subp.py and sockets.py
- replace open() / read() calls with util.load_text_file() where appropriate
@holmanb holmanb merged commit 70d4a5c into canonical:main Sep 23, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants