shell/oom: log more detail when tasks are killed #6289

Merged: 1 commit, Sep 18, 2024

garlick (Member) commented Sep 18, 2024

Problem: when a job exceeds the memory cgroup limit, it may not be immediately obvious what the limit is or what host it occurred on.

Include the hostname with the existing "out of memory" message. Log the value of memory.peak (the peak memory usage).

New output example

29.475s: flux-shell[0]: ERROR: oom: Memory cgroup out of memory: killed 1 task on picl7.
29.563s: flux-shell[0]: ERROR: oom: memory.peak = 1.4428558G
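For readers unfamiliar with the cgroup v2 interface, here is a minimal sketch of how a peak value like the one above could be obtained and formatted. It is not the code from this PR: `read_cgroup_u64()` and `format_size()` are hypothetical helpers, and the `%.8g` format with binary (1024-based) suffixes is an assumption chosen only to resemble the example output. The sketch relies on `memory.peak` containing a single byte count.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one unsigned integer (a byte count) from a
 * cgroup v2 file such as memory.peak.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_cgroup_u64 (const char *path, uint64_t *value)
{
    FILE *fp = fopen (path, "r");
    if (!fp)
        return -1;
    int rc = (fscanf (fp, "%" SCNu64, value) == 1) ? 0 : -1;
    fclose (fp);
    return rc;
}

/* Hypothetical helper: format a byte count with a binary suffix
 * (K, M, G, T), e.g. 1536 -> "1.5K".  The scaling and "%.8g" format
 * are assumptions, not necessarily what flux-shell does.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_size (uint64_t bytes, char *buf, size_t len)
{
    const char *suffix[] = { "", "K", "M", "G", "T" };
    double val = (double)bytes;
    size_t i = 0;

    /* Divide by 1024 until the value fits the largest usable suffix. */
    while (val >= 1024.0 && i < 4) {
        val /= 1024.0;
        i++;
    }
    int n = snprintf (buf, len, "%.8g%s", val, suffix[i]);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass the path to `memory.peak` inside the job's memory cgroup directory (the location depends on how the job's cgroup is set up) and log the formatted string next to the hostname.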

grondo (Contributor) left a comment

LGTM!

garlick (Member, Author) commented Sep 18, 2024

Thanks - I'll set MWP.

@mergify mergify bot merged commit fa81b09 into flux-framework:master Sep 18, 2024
32 of 33 checks passed
codecov bot commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.

Project coverage is 83.33%. Comparing base (0575f47) to head (85b2ac1).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/shell/oom.c | 0.00% | 22 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6289      +/-   ##
==========================================
- Coverage   83.34%   83.33%   -0.02%     
==========================================
  Files         522      522              
  Lines       86036    86057      +21     
==========================================
+ Hits        71710    71717       +7     
- Misses      14326    14340      +14     

| Files with missing lines | Coverage Δ |
|---|---|
| src/shell/oom.c | 39.71% <0.00%> (-6.96%) ⬇️ |

... and 8 files with indirect coverage changes

@garlick garlick deleted the oom_peak branch September 19, 2024 13:25