
Profile mem #678

Merged
kba merged 8 commits into master from profile-mem on Nov 23, 2022

Conversation

Member

@kba kba commented Mar 4, 2021

When running processors with ocrd process, we already measure the wall and CPU time of a processor run. This adds basic tracking of memory usage with https://github.com/pythonprofilers/memory_profiler.

Currently, this just outputs a sparkline of memory usage and max/min memory, e.g.

16:35:54.593 INFO ocrd.process.profile - memory consumption: ▁▁▂▅▅▃▄▄▄▄▄▄▄▄▄▄▅▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆██▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▄▄
max: 300.79 MiB min: 133.09 MiB
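
For illustration only, here is a minimal sketch (not the PR's actual code, and the helper names are made up) of how such a sparkline and min/max line could be derived from memory_profiler samples:

```python
# Hypothetical sketch of deriving the sparkline and min/max from
# memory_profiler samples; not the PR's implementation.
from memory_profiler import memory_usage

BARS = '▁▂▃▄▅▆▇█'

def sparkline(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return ''.join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

def report_memory(func):
    # with timestamps=True each sample is a (MiB, unix-time) pair
    samples = memory_usage(proc=func, interval=.1, timeout=None, timestamps=True)
    mib = [m for m, _ in samples]
    print("memory consumption: %s\nmax: %.2f MiB min: %.2f MiB"
          % (sparkline(mib), max(mib), min(mib)))
```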

Collaborator

@bertsky bertsky left a comment

Excellent!

Not sure how to add that, but memory_profiler is based on psutil, which would also give us .cpu_times()['iowait'] or .disk_io_counters()['read_time'] / write_time / busy_time etc...
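
For reference, a small sketch of the psutil calls mentioned here (psutil returns named tuples rather than dicts; iowait and busy_time are platform-dependent, e.g. Linux only):

```python
# Sketch of the additional psutil metrics mentioned above; availability of
# iowait and busy_time depends on the platform (Linux, partly FreeBSD).
import psutil

cpu = psutil.cpu_times()                         # system-wide CPU times (named tuple)
print('iowait:', getattr(cpu, 'iowait', None))   # seconds spent waiting for I/O

disk = psutil.disk_io_counters()                 # system-wide disk I/O counters (named tuple)
print('read_time:', disk.read_time, 'write_time:', disk.write_time)  # milliseconds
print('busy_time:', getattr(disk, 'busy_time', None))                # milliseconds busy doing I/O
```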

```diff
@@ -69,7 +71,13 @@ def run_processor(
     log.debug("Processor instance %s (%s doing %s)", processor, name, otherrole)
     t0_wall = perf_counter()
     t0_cpu = process_time()
-    processor.process()
+    mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True)
```
Collaborator

Maybe you wanna add multiprocess=True in there for processors that use multiprocessing or similar, or that delegate to shell calls. Perhaps also include_children=True (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).

Member Author

> Maybe you wanna add multiprocess=True in there for processors that use multiprocessing or similar, or that delegate to shell calls.

I will.

> Perhaps also include_children=True (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).

            if include_children:
                mem +=  sum(_get_child_memory(process, meminfo_attr))

https://github.com/pythonprofilers/memory_profiler/blob/master/memory_profiler.py#L135-L136

Collaborator

@bertsky bertsky Mar 4, 2021

> Perhaps also include_children=True (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).
>
>     if include_children:
>         mem += sum(_get_child_memory(process, meminfo_attr))

I saw that, but the question remains: what is in that sum? (We probably don't want to count shared and CoW pages repeatedly. But it may be hard to calculate exactly. IIRC standard ps and top and even time do count repetitions, so they appear too large – don't "add up" – for multiprocessed programs, whereas smem gets it right. This tells me we are actually interested in the PSS, the proportional set size, rather than the RSS.)
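
To make the RSS/PSS distinction concrete, a hedged illustration using psutil (Linux only; reading PSS requires access to /proc/<pid>/smaps, so usually only same-user processes):

```python
# Illustration of RSS vs PSS with psutil; PSS is read from /proc/<pid>/smaps,
# so this needs the appropriate permissions and only works on Linux.
import psutil

proc = psutil.Process()            # the current process
info = proc.memory_full_info()     # extended memory info (slower than memory_info())
print('RSS:', info.rss)            # resident pages; shared pages counted in full
print('PSS:', info.pss)            # shared pages divided among the processes sharing them
```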

Collaborator

Ok, but PSS vs RSS is a more general problem/discussion. Any multiprocessing coder will know about that. And RSS is still valuable information. So let's stick to your current implementation.

@bertsky
Collaborator

bertsky commented Mar 4, 2021

This will conflict with #652 (where that part of run_processor is delegated to the new run_api) – not sure what's the least painful way for merging.

Oh, since I saw this also uses multiprocessing, have you measured any CPU overhead for the memory profiling yet?

@kba
Member Author

kba commented Mar 4, 2021

> This will conflict with #652 (where that part of run_processor is delegated to the new run_api) – not sure what's the least painful way for merging.

Is that just a matter of resolving conflicts, or do you mean there is an API incompatibility now?

> Oh, since I saw this also uses multiprocessing, have you measured any CPU overhead for the memory profiling yet?

I haven't yet. I did notice that when I profile the current process (proc=-1) instead of only the processor.process() call, it either offers just a single frame of measurement (max_iter=1) or hangs until the timeout is reached (timeout=100).

@bertsky
Collaborator

bertsky commented Mar 4, 2021

> This will conflict with #652 (where that part of run_processor is delegated to the new run_api) – not sure what's the least painful way for merging.

> Is that just a matter of resolving conflicts, or do you mean there is an API incompatibility now?

It looks like this is only cosmetic (since the workflow server does not handle the retval either, it just adds exception handling, and does not use multiprocessing itself, at least in the current implementation). But I would have to check/test.

```diff
@@ -69,7 +71,12 @@ def run_processor(
     log.debug("Processor instance %s (%s doing %s)", processor, name, otherrole)
     t0_wall = perf_counter()
     t0_cpu = process_time()
-    processor.process()
+    mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True, multiprocess=True, include_children=True)
```
Collaborator

Suggested change:

```diff
-mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True, multiprocess=True, include_children=True)
+mem_usage = memory_usage(proc=processor.process,
+        interval=.1, timeout=None, timestamps=True,
+        # include sub-processes
+        multiprocess=True, include_children=True,
+        # get proportional set size instead of RSS
+        backend='psutil_pss')
```

Collaborator

If it is really that simple – let's make this configurable (PSS vs RSS measuring). Did you try this?

Member Author

Sure, but first we need a mechanism to toggle this behavior. A simple idea would be to add CLI flags --mem-profile-rss and --mem-profile-pss.

@bertsky prefers making this configurable via the logging configuration - @M3ssman, do you have a good idea how that might work? That way, we could keep the CLI arguments few and the --help output less confusing.

Collaborator

Perhaps for the moment it could be as simple as

mem_prof = os.getenv('OCRD_PROFILE', '')
if mem_prof:
    ...
        backend='psutil_pss' if mem_prof == 'PSS' else 'psutil')
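
A slightly fleshed-out version of that sketch might look like this (hypothetical; `process_with_profiling` and the exact flow are assumptions, not the merged code – `processor` stands in for the instance already bound inside run_processor):

```python
# Hypothetical expansion of the sketch above; not the merged implementation.
import os
from memory_profiler import memory_usage

def process_with_profiling(processor):
    mem_prof = os.getenv('OCRD_PROFILE', '')
    if mem_prof:
        return memory_usage(proc=processor.process,
                            interval=.1, timeout=None, timestamps=True,
                            # include sub-processes
                            multiprocess=True, include_children=True,
                            # PSS instead of RSS if requested
                            backend='psutil_pss' if mem_prof == 'PSS' else 'psutil')
    processor.process()
    return None
```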

Member Author

I like environment variables in general, but when the choice is between environment variables and CLI flags, I prefer the latter, because at least it's consistent. Otherwise you could end up with mixed invocations like OCRD_PROFILE=PSS ocrd-dummy --profile-file my.prof ....

Collaborator

@kba I agree, however I did not argue for an env var plus a CLI arg, but only the former. Perhaps we could also move --profile and --profile-file into env var control entirely.

Or we go the other direction and put everything into a single CLI arg, just with a multi-faceted value: --profile mem=rss,cpu,file=foo.log for example (see the sketch below).

For the broader discussion of how to configure processors without adding too many options/parameters (and what for), see this old thread.
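
A quick sketch of how such a multi-faceted value could be parsed (hypothetical helper, purely to illustrate the proposal, not part of the PR):

```python
# Hypothetical parser for the proposed single --profile argument,
# e.g. "mem=rss,cpu,file=foo.log".
def parse_profile_arg(value):
    opts = {'cpu': False, 'mem': None, 'file': None}
    for part in value.split(','):
        if part == 'cpu':
            opts['cpu'] = True
        elif part.startswith('mem='):
            opts['mem'] = part[len('mem='):]        # 'rss' or 'pss'
        elif part.startswith('file='):
            opts['file'] = part[len('file='):]
    return opts

# parse_profile_arg('mem=rss,cpu,file=foo.log')
# -> {'cpu': True, 'mem': 'rss', 'file': 'foo.log'}
```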

Member Author

@kba kba Nov 23, 2022

OK, now that the wonky multiple-process-call behavior is fixed, I'll implement this via the environment variables OCRD_PROFILE and OCRD_PROFILE_FILE (a code sketch of these rules follows below):

  * OCRD_PROFILE not set or empty: no profiling
  * OCRD_PROFILE_FILE set to a value (== current --profile-file flag):
    * if OCRD_PROFILE is defined but empty: no profiling
    * if OCRD_PROFILE is not defined: set OCRD_PROFILE to CPU
    * if OCRD_PROFILE is defined but does not contain CPU: ~~raise error~~ too strict, OCRD_PROFILE_FILE should imply CPU in OCRD_PROFILE
  * OCRD_PROFILE contains CPU: do CPU profiling (== current --profile flag)
  * OCRD_PROFILE contains RSS: do RSS memory profiling
  * OCRD_PROFILE contains PSS: do proportional memory profiling
  * if OCRD_PROFILE contains both PSS and RSS: ~~raise error~~ too strict, PSS takes precedence

Did I catch everything?
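
Sketched in code, those rules would amount to something like the following (hypothetical helper, not the merged implementation):

```python
# Hypothetical sketch of the rules listed above; not the merged code.
import os

def resolve_profiling():
    profile = os.environ.get('OCRD_PROFILE')            # None, '' or e.g. 'CPU,PSS'
    profile_file = os.environ.get('OCRD_PROFILE_FILE')  # corresponds to --profile-file
    if profile_file:
        if profile is None:
            profile = 'CPU'                              # OCRD_PROFILE_FILE implies CPU profiling
        elif profile and 'CPU' not in profile:
            profile += ',CPU'                            # too strict to raise; just imply CPU
        # OCRD_PROFILE defined but empty still means: no profiling
    profile = profile or ''
    do_cpu = 'CPU' in profile
    do_pss = 'PSS' in profile                            # PSS takes precedence over RSS
    do_rss = 'RSS' in profile and not do_pss
    return do_cpu, do_rss, do_pss, profile_file
```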

Member Author

Implemented in 4fdfbf3

Contributor

Environment variables look good to me.

In the readme, I would also suggest referencing an external link where the difference between RSS and PSS is explained, or at least briefly describing it. This may not be the best suggestion, but please consider it.

Collaborator

yes, sounds reasonable and complete AFAICS – thx!

@bertsky
Collaborator

bertsky commented Sep 1, 2021

I wonder if it would be much effort to incorporate gputil for (avg/peak) GPU memory usage statistics. (Or perhaps one could add it to memory_profiler as an alternative backend?)

@bertsky
Collaborator

bertsky commented Oct 12, 2022

> I wonder if it would be much effort to incorporate gputil for (avg/peak) GPU memory usage statistics. (Or perhaps one could add it to memory_profiler as an alternative backend?)

I'm afraid it's not that simple at all. GPUtil only gives snapshots, with no aggregation/statistics there. And it looks like one would need to use the respective ML framework's facilities for process/session measurements anyway. (See here and there for PyTorch. TF now also has things like get_memory_info / get_memory_usage / reset_memory_stats.)
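
As a hedged illustration of those framework facilities (API names as documented upstream for PyTorch and TensorFlow; a visible GPU is required at runtime):

```python
# Illustration of the framework-level GPU memory statistics mentioned above.
import torch
import tensorflow as tf

# PyTorch: peak memory allocated by the caching allocator in this process
torch.cuda.reset_peak_memory_stats()
# ... run the model ...
peak_bytes = torch.cuda.max_memory_allocated()

# TensorFlow: current and peak memory for a device
info = tf.config.experimental.get_memory_info('GPU:0')   # {'current': ..., 'peak': ...}
tf.config.experimental.reset_memory_stats('GPU:0')
```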

Let's treat that as off-topic for now!

@bertsky
Collaborator

bertsky commented Oct 12, 2022

@kba we could really use that prior to #875. Would you consider fast-tracking this?

@kba kba self-assigned this Oct 21, 2022
@bertsky bertsky mentioned this pull request Nov 22, 2022
@kba
Member Author

kba commented Nov 22, 2022

> @kba we could really use that prior to #875. Would you consider fast-tracking this?

Sure. Behavior configuration will temporarily be inconsistent between CLI flags and environment variables, and we should definitely continue the discussion on configuration options.

ocrd/ocrd/processor/helpers.py (outdated)
kba and others added 3 commits November 23, 2022 11:53
@kba kba requested a review from bertsky November 23, 2022 11:47
@kba kba requested a review from MehmedGIT November 23, 2022 11:55
* `RSS`: Enable RSS memory profiling
* `PSS`: Enable proportionate memory profiling
* `OCRD_PROFILE_FILE`: If set, then the CPU profile is written to this file for later perusal with analysis tools like [snakeviz](https://jiffyclub.github.io/snakeviz/)
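
As an aside on the snakeviz mention: assuming the file is a standard cProfile/pstats dump (which is what snakeviz reads), it can also be inspected with the standard library alone; `ocrd.prof` below is a made-up file name:

```python
# Inspecting a CPU profile dump with the standard library instead of snakeviz
# (assumes a cProfile/pstats-format file; 'ocrd.prof' is a hypothetical name).
import pstats

stats = pstats.Stats('ocrd.prof')
stats.sort_stats('cumulative').print_stats(20)   # top 20 entries by cumulative time
```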

Collaborator

Suggested change:

    Logging is configured via configuration files, see [module logging](./ocrd_utils#ocr-d-module-logging).

@kba kba merged commit 1b5a362 into master Nov 23, 2022
@kba kba deleted the profile-mem branch November 17, 2023 13:11