Workflow server #652

Open
wants to merge 30 commits into master
Conversation

@bertsky (Collaborator) commented Dec 1, 2020

Implementation of the workflow server.

ocrd --help
Commands:
  bashlib    Work with bash library
  log        Logging
  ocrd-tool  Work with ocrd-tool.json JSON_FILE
  validate   All the validation in one CLI
  process    Run processor CLIs in a series of tasks
  workflow   Process a series of tasks
  workspace  Working with workspace
  zip        Bag/Spill/Validate OCRD-ZIP bags
ocrd workflow --help
Usage: ocrd workflow [OPTIONS] COMMAND [ARGS]...

  Process a series of tasks

Options:
  --help  Show this message and exit.

Commands:
  client   Have the workflow server run commands
  process  Run processor CLIs in a series of tasks (alias for ``ocrd process``)
  server   Start server for a series of tasks to run processor CLIs or APIs...
ocrd workflow server --help
Usage: ocrd workflow server [OPTIONS] TASKS...

  Start server for a series of tasks to run processor CLIs or APIs on
  workspaces

  Parse the given tasks and try to instantiate all Pythonic processors among
  them with the given parameters. Open a web server that listens on the
  given host and port for GET requests named ``process`` with the following
  (URL-encoded) arguments:

      mets (string): Path name (relative to the server's CWD,
      or absolute) of the workspace to process

      page_id (string): Comma-separated list of page IDs to process

      overwrite (bool): Remove output pages/images if they already exist

  The server will handle each request by running the tasks on the given
  workspace. Pythonic processors will be run via API (on those same
  instances).  Non-Pythonic processors (or those not directly accessible in
  the current venv) will be run via CLI normally, instantiating each time.
  Also, between each contiguous chain of Pythonic tasks in the overall
  series, no METS de/serialization will be performed.

  Stop the server by sending SIGINT (e.g. via ctrl+c on the terminal), or
  sending a GET request named ``shutdown``.

Options:
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -h, --host TEXT                 host name/IP to listen at
  -p, --port INTEGER RANGE        TCP port to listen at
  --help                          Show this message and exit.
ocrd workflow client --help
Usage: ocrd workflow client [OPTIONS] COMMAND [ARGS]...

  Have the workflow server run commands

Options:
  -h, --host TEXT           host name/IP to listen at
  -p, --port INTEGER RANGE  TCP port to listen at
  --help                    Show this message and exit.

Commands:
  list-tasks  Have the workflow server print the configured task sequence
  process     Have the workflow server process another workspace
  shutdown    Have the workflow server shutdown gracefully
ocrd workflow client process --help
Usage: ocrd workflow client process [OPTIONS]

  Have the workflow server process another workspace

Options:
  -m, --mets TEXT     METS to process
  -g, --page-id TEXT  ID(s) of the pages to process
  --overwrite         Remove output pages/images if they already exist
  --help              Show this message and exit.

Example:

ocrd workflow server -p 5000 'olena-binarize -P impl sauvola-ms-split -I OCR-D-IMG -O OCR-D-IMG-BINPAGE-sauvola' 'anybaseocr-crop -I OCR-D-IMG-BINPAGE-sauvola -O OCR-D-IMG-BINPAGE-sauvola-CROP' 'cis-ocropy-denoise -P noise_maxsize 3.0 -P level-of-operation page -I OCR-D-IMG-BINPAGE-sauvola-CROP -O OCR-D-IMG-BINPAGE-sauvola-CROP-DEN' 'tesserocr-deskew -P operation_level page -I OCR-D-IMG-BINPAGE-sauvola-CROP-DEN -O OCR-D-IMG-BINPAGE-sauvola-CROP-DEN-DESK-tesseract' 'cis-ocropy-deskew -P level-of-operation page -I OCR-D-IMG-BINPAGE-sauvola-CROP-DEN-DESK-tesseract -O OCR-D-IMG-BINPAGE-sauvola-CROP-DEN-DESK-tesseract-DESK-ocropy' 'tesserocr-recognize -P segmentation_level region -P model deu -I OCR-D-IMG-BINPAGE-sauvola-CROP-DEN-DESK-tesseract-DESK-ocropy -O OCR-D-IMG-BINPAGE-sauvola-CROP-DEN-DESK-tesseract-DESK-ocropy-OCR-tesseract-deu'

curl -G -d mets=../kant_aufklaerung_1784/data/mets.xml -d overwrite=True -d page_id=PHYS_0017,PHYS_0020 http://127.0.0.1:5000/process
curl -G -d mets=../blumenbach_anatomie_1805/mets.xml http://127.0.0.1:5000/process
curl -G http://127.0.0.1:5000/shutdown
# equivalently:
ocrd workflow client -p 5000 process -m ../kant_aufklaerung_1784/data/mets.xml --overwrite -g PHYS_0017,PHYS_0020
ocrd workflow client process -m ../blumenbach_anatomie_1805/mets.xml
ocrd workflow client shutdown

Please note that (as already explained here) this only increases efficiency for Python processors in the current venv, so bashlib processors or sub-venv processors will still be run via CLI. You might therefore want to group them differently in ocrd_all, or cascade workflows across sub-venvs.

(EDITED to reflect ocrd workflow client addition)

- add workflow CLI group:
  - add alias `ocrd workflow process` to `ocrd process`
  - add new `ocrd workflow server`, running a web server
    for the given workflow that tries to instantiate
    all Pythonic processors once (to re-use their API
    instead of starting CLI each time)
- add `run_api` analogue to existing `run_cli` and let
  `run_processor` delegate to it in `ocrd.processor.helpers`:
  - `run_processor` only has workspace de/serialization and
    processor instantiation
  - `run_api` has core `process()`, but now also enters and
    leaves the workspace directory, and passes any exceptions
- ocrd.task_sequence: differentiate between `parse_tasks`
  (independent of workspace or fileGrps) and `run_tasks`,
  generalize `run_tasks` to use either `run_cli` or new
  `run_api` (where instances are available, avoiding
  unnecessary METS de/serialisation)
- amend `TaskSequence` by `instance` attribute
  and `instantiate` method:
  - peek into a CLI to check for Pythonic processors
  - try to compile and exec, using monkey-patching
    to disable normal argument passing, execution, and
    exiting; merely importing and fetching the class
    of the processor
  - instantiate processor without workspace or fileGrps
  - avoid unnecessary CLI call to get ocrd-tool.json
@bertsky (Collaborator Author) commented Dec 1, 2020

And here's a recipe for doing workspace parallelization with the new workflow server and GNU parallel:

N=4
for ((i=0;i<N;i++)); do
    ocrd workflow server -p $((5000+i)) \
        "cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN" \
        "anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP" \
        "skimage-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P method li" \
        "skimage-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page" \
        "tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page" \
        "tesserocr-segment-region -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG-REG" \
        "segment-repair -I OCR-D-SEG-REG -O OCR-D-SEG-REPAIR -P plausibilize true" \
        "tesserocr-deskew -I OCR-D-SEG-REPAIR -O OCR-D-SEG-REG-DESKEW" \
        "cis-ocropy-clip -I OCR-D-SEG-REG-DESKEW -O OCR-D-SEG-REG-DESKEW-CLIP" \
        "tesserocr-segment-line -I OCR-D-SEG-REG-DESKEW-CLIP -O OCR-D-SEG-LINE" \
        "cis-ocropy-clip -I OCR-D-SEG-LINE -O OCR-D-SEG-LINE-CLIP -P level-of-operation line" \
        "cis-ocropy-dewarp -I OCR-D-SEG-LINE-CLIP -O OCR-D-SEG-LINE-RESEG-DEWARP" \
        "tesserocr-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P textequiv_level glyph -P overwrite_words true -P model GT4HistOCR_50000000}"
done
find . -name mets.xml | parallel -j$N curl -G -d mets={} http://127.0.0.1:\$((5000+{%}))/process
# equivalently:
find . -name mets.xml | parallel -j$N ocrd workflow client -p \$((5000+{%})) process -m {}

(EDITED to reflect ocrd workflow client addition)

@codecov-io commented Dec 1, 2020

Codecov Report

Merging #652 (6d15084) into master (135acb6) will decrease coverage by 7.76%.
The diff coverage is 35.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #652      +/-   ##
==========================================
- Coverage   84.34%   76.58%   -7.77%     
==========================================
  Files          52       56       +4     
  Lines        3047     3587     +540     
  Branches      608      723     +115     
==========================================
+ Hits         2570     2747     +177     
- Misses        349      694     +345     
- Partials      128      146      +18     
Impacted Files Coverage Δ
ocrd/ocrd/cli/workflow.py 27.27% <27.27%> (ø)
ocrd/ocrd/processor/helpers.py 70.40% <56.00%> (-10.37%) ⬇️
ocrd/ocrd/cli/__init__.py 100.00% <100.00%> (ø)
ocrd/ocrd/cli/process.py 93.33% <100.00%> (+0.47%) ⬆️
ocrd/ocrd/processor/base.py 64.92% <100.00%> (-15.49%) ⬇️
ocrd_utils/ocrd_utils/os.py 70.88% <0.00%> (-24.86%) ⬇️
ocrd_utils/ocrd_utils/constants.py 90.47% <0.00%> (-9.53%) ⬇️
ocrd/ocrd/cli/workspace.py 74.74% <0.00%> (-1.91%) ⬇️
ocrd/ocrd/__init__.py 100.00% <0.00%> (ø)
... and 15 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 135acb6...6d15084. Read the comment docs.

@bertsky bertsky requested a review from kba December 1, 2020 07:35
@bertsky (Collaborator Author) commented Dec 1, 2020

Please note that (as already explained here) this only increases efficiency for Python processors in the current venv, so bashlib processors or sub-venv processors will still be run via CLI. You might therefore want to group them differently in ocrd_all, or cascade workflows across sub-venvs.

So here are examples for both options:

  • grouping sub-venvs in ocrd_all to include ocrd_calamari with all other needed modules in one venv

      cd ocrd_all
      . venv/local/sub-venv/headless-tf1/bin/activate
      make MAKELEVEL=1 OCRD_MODULES="core ocrd_cis ocrd_anybaseocr ocrd_wrap ocrd_tesserocr tesserocr tesseract ocrd_calamari" all
      ocrd workflow server \
          "cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN" \
          "anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP" \
          "skimage-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P method li" \
          "skimage-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page" \
          "tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page" \
          "cis-ocropy-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG-REG -P level-of-operation page" \
          "tesserocr-deskew -I OCR-D-SEG-REG -O OCR-D-SEG-REG-DESKEW" \
          "cis-ocropy-dewarp -I OCR-D-SEG-REG-DESKEW -O OCR-D-SEG-LINE-RESEG-DEWARP" \
          "calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /path/to/models/\*.ckpt.json"
    
  • cascade workflows across sub-venvs (combined with workspace parallelization):

      N=4
      for ((i=0;i<N;i++)); do
          . venv/bin/activate
          ocrd workflow server -p $((5000+i)) \
              "cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN"
          . venv/local/sub-venv/headless-tf1/bin/activate
          ocrd workflow server -p $((6000+i)) \
              "anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP"
          . venv/bin/activate
          ocrd workflow server -p $((7000+i)) \
              "skimage-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P method li" \
              "skimage-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page" \
              "tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page" \
              "cis-ocropy-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG-REG -P level-of-operation page" \
              "tesserocr-deskew -I OCR-D-SEG-REG -O OCR-D-SEG-REG-DESKEW" \
              "cis-ocropy-dewarp -I OCR-D-SEG-REG-DESKEW -O OCR-D-SEG-LINE-RESEG-DEWARP"
          . venv/local/sub-venv/headless-tf1/bin/activate
          ocrd workflow server -p $((8000+i)) \
              "calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /path/to/models/\*.ckpt.json"
          . venv/bin/activate
      done
      find . -name mets.xml | parallel -j$N \
          ocrd workflow client -p \$((5000+{%})) process -m {} "&&" \
          ocrd workflow client -p \$((6000+{%})) process -m {} "&&" \
          ocrd workflow client -p \$((7000+{%})) process -m {} "&&" \
          ocrd workflow client -p \$((8000+{%})) process -m {}
    

…lementations currently expect them in the constructor)
@bertsky (Collaborator Author) commented Jan 25, 2021

b4a8bcb: Gosh, I managed to forget that one essential, distinctive change: triggering the actual instantiation!

(Must have slipped through somewhere on the way from the proof-of-concept standalone to this integrated formulation.)

@bertsky (Collaborator Author) commented Jan 26, 2021

I wonder if Flask (especially its server component, which is branded as a development server and has a debug option) is efficient enough. IIUC we actually need multi-threading in the server if we also want to spawn multiple worker processes (i.e. workspace parallelism). So maybe we should use a gevent or uwsgi server instead. (Of course, memory or GPU resources would need to be allocated to them carefully...)

But a multi-threaded server would require explicit session management in GPU-enabled processors (so startup and processing are in the same context). The latter is necessary anyway, though: Otherwise, when a workflow has multiple Tensorflow processors, they would steal each other's graphs.

Another thing that still bothers me is failover capacity. The workflow server should restart if any of its instances crash.

@bertsky (Collaborator Author) commented Feb 11, 2021

Another performance edge might be gained by transferring the workspace to fast storage during processing, such as a RAM disk: whenever a /process request arrives, clone the workspace to /tmp, run there, and then write back (overwriting the original). In a Dockerized setup (simply defining a service via make server PORT=N and EXPOSE N), you would then probably run with --mount type=tmpfs,destination=/tmp.

On the other hand, that machinery could just as well be handled outside the workflow server, so the workspace would already be on fast storage. But for the Docker option that would be more complicated (data shared with the outside would still reside on slow storage, so a second service would have to take care of moving workspaces to and fro).
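
For illustration, such staging could look roughly like this in Python (a minimal sketch only; run_tasks_on is a hypothetical stand-in for whatever actually executes the configured task sequence):

    # Hypothetical staging wrapper: copy the workspace to fast storage (e.g. a
    # tmpfs mount), process it there, and write the result back over the original.
    import shutil
    import tempfile
    from pathlib import Path

    def process_on_ramdisk(workspace_dir, run_tasks_on, tmp_root='/tmp'):
        """Clone workspace_dir below tmp_root, run the workflow there, copy back."""
        src = Path(workspace_dir).resolve()
        with tempfile.TemporaryDirectory(dir=tmp_root) as tmp:
            staged = Path(tmp) / src.name
            shutil.copytree(src, staged)       # clone onto fast storage
            run_tasks_on(staged / 'mets.xml')  # run the configured task sequence
            shutil.rmtree(src)                 # write back, overwriting the original
            shutil.copytree(staged, src)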

@bertsky bertsky mentioned this pull request Mar 4, 2021
@codecov-commenter commented May 13, 2021

Codecov Report

Merging #652 (cac80d6) into master (db79ff6) will decrease coverage by 3.35%.
The diff coverage is 25.68%.

❗ Current head cac80d6 differs from pull request most recent head d98daa8. Consider uploading reports for the commit d98daa8 to get more accurate results

@@            Coverage Diff             @@
##           master     #652      +/-   ##
==========================================
- Coverage   79.95%   76.60%   -3.36%     
==========================================
  Files          56       58       +2     
  Lines        3488     3697     +209     
  Branches      706      734      +28     
==========================================
+ Hits         2789     2832      +43     
- Misses        565      723     +158     
- Partials      134      142       +8     
Impacted Files Coverage Δ
ocrd/ocrd/cli/server.py 0.00% <0.00%> (ø)
ocrd/ocrd/cli/workflow.py 39.36% <39.36%> (ø)
ocrd/ocrd/processor/helpers.py 70.40% <48.14%> (-10.37%) ⬇️
ocrd/ocrd/cli/__init__.py 100.00% <100.00%> (ø)
ocrd/ocrd/cli/process.py 93.33% <100.00%> (+0.47%) ⬆️
ocrd/ocrd/processor/base.py 67.16% <100.00%> (+0.24%) ⬆️
ocrd/ocrd/resolver.py 94.50% <0.00%> (-2.20%) ⬇️
ocrd/ocrd/workspace.py 71.05% <0.00%> (-0.52%) ⬇️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ecdb840...d98daa8. Read the comment docs.

@bertsky (Collaborator Author) commented Jun 9, 2021

@kba, note ccb369a is necessary for core in general since resmgr wants to be able to identify the startup CWD from the workspace. The other two recent commits add proper handling and passing of errors over the network.

Another thing that still bothers me is failover capacity. The workflow server should restart if any of its instances crash.

That's already covered under normal circumstances: run_api catches everything, logs it, and returns the exception. run_tasks gets that, and re-raises. cli.workflow.process catches everything, logs it, and sends the appropriate response. So the processor instances are still alive. Actual crashes like segfaults or pipe signals would drag down the server as well. So failover would be necessary above that anyway – for example in Docker.

But a multi-threaded server would require explicit session management in GPU-enabled processors (so startup and processing are in the same context). The latter is necessary anyway, though: Otherwise, when a workflow has multiple Tensorflow processors, they would steal each other's graphs.

I just confirmed this by testing: If I start a workflow with multiple TF processors, the last one will steal the others' session. You have to explicitly create sessions, store them into the instance, and reset prior to processing. (This means changes to all existing TF processors we have. See here and here for examples.)
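
For illustration, that pattern could look roughly as follows in TF1 terms (a minimal sketch only; MyTFProcessor and load_model are hypothetical placeholders, not code from any existing module):

    # Each processor instance gets its own graph and session at startup, and makes
    # them current again right before processing, so several preloaded TF
    # processors in one workflow server do not steal each other's sessions.
    import tensorflow as tf

    class MyTFProcessor:
        def __init__(self, model_path):
            self.graph = tf.Graph()
            self.session = tf.Session(graph=self.graph)
            with self.graph.as_default(), self.session.as_default():
                self.model = load_model(model_path)  # hypothetical model loader

        def process(self, images):
            # re-enter this instance's graph/session for every request
            with self.graph.as_default(), self.session.as_default():
                return self.model.predict(images)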

I wonder if Flask (especially its server component, which is branded as a development server and has a debug option) is efficient enough. IIUC we actually need multi-threading in the server if we also want to spawn multiple worker processes (i.e. workspace parallelism). So maybe we should use a gevent or uwsgi server instead. (Of course, memory or GPU resources would need to be allocated to them carefully...)

Since Python's GIL prevents actual thread-level parallelism on shared resources (like processor instances), we'd have to do multi-processing anyway. I think I'll incorporate uwsgi (which does preforking) to achieve workspace parallelism. The server will have to do parse_tasks and instantiate in a dedicated function with Flask's decorator @app.before_first_request IIUC.
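
A rough sketch of that per-worker initialization (assuming uWSGI preforks the workers and each worker lazily loads its own app; instantiate_all_tasks and run_tasks are hypothetical stand-ins):

    # Flask app picked up by each uWSGI worker; the expensive processor
    # instantiation happens once per worker, on its first request.
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    instances = {}  # executable -> preloaded processor instance (per worker)

    @app.before_first_request
    def preload():
        # parse_tasks()/instantiate() would go here in the actual server
        instances.update(instantiate_all_tasks())  # hypothetical helper

    @app.route('/process')
    def process():
        mets = request.args.get('mets')
        page_id = request.args.get('page_id', '')
        run_tasks(mets, page_id, instances)  # hypothetical helper
        return jsonify(status='OK')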

- replace Flask dev server with external uwsgi call
- factor out Flask app code into separate Python module
  which uWSGI can pick up
- make uWSGI run given number of workers via multi-processing
  but not multi-threading, and prefork before loading app
  (to protect GPU and non-thread-safe processors, and because of GIL)
- pass tasks and other settings via CLI options (wrapped in JSON)
- set worker Harakiri (reload after timeout) based on number of
  pages multiplied by given page timeout
- add option for number of processes and page timeout
@bertsky (Collaborator Author) commented Jun 10, 2021

I think I'll incorporate uwsgi (which does preforking) to achieve workspace parallelism. The server will have to do parse_tasks and instantiate in a dedicated function with Flask's decorator @app.before_first_request IIUC.

Done! (Works as expected, even with GPU-enabled processors sharing memory via growth.)

@bertsky (Collaborator Author) commented Jun 11, 2021

Updated help text for ocrd workflow server now reads:

Usage: ocrd workflow server [OPTIONS] TASKS...

  Start server for a series of tasks to run processor CLIs or APIs on
  workspaces

  Parse the given tasks and try to instantiate all Pythonic processors among
  them with the given parameters. Open a web server that listens on the
  given ``host`` and ``port`` and queues requests into ``processes`` worker
  processes for GET requests named ``/process`` with the following (URL-
  encoded) arguments:

      mets (string): Path name (relative to the server's CWD,
      or absolute) of the workspace to process

      page_id (string): Comma-separated list of page IDs to process

      log_level (int): Override all logger levels during processing

      overwrite (bool): Remove output pages/images if they already exist

  The server will handle each request by running the tasks on the given
  workspace. Pythonic processors will be run via API (on those same
  instances).  Non-Pythonic processors (or those not directly accessible in
  the current venv) will be run via CLI normally, instantiating each time.
  Also, between each contiguous chain of Pythonic tasks in the overall
  series, no METS de/serialization will be performed.

  If processing does not finish before ``timeout`` seconds per page, then
  the request will fail and the respective worker be reloaded.

  To see the server's workflow configuration, send a GET request named
  ``/list-tasks``.

  Stop the server by sending SIGINT (e.g. via ctrl+c on the terminal), or
  sending a GET request named ``/shutdown``.

Options:
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -t, --timeout INTEGER           maximum processing time (in sec per page)
                                  before reloading worker (0 to disable)

  -j, --processes INTEGER         number of parallel workers to spawn
  -h, --host TEXT                 host name/IP to listen at
  -p, --port INTEGER RANGE        TCP port to listen at
  --help                          Show this message and exit.
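
For comparison with the curl calls above, the same kind of request can be issued from Python (a sketch using the requests library; host, port and paths are placeholders):

    # Query a running workflow server; all arguments are passed URL-encoded.
    import requests

    resp = requests.get('http://127.0.0.1:5000/process', params={
        'mets': '../kant_aufklaerung_1784/data/mets.xml',  # relative to server CWD
        'page_id': 'PHYS_0017,PHYS_0020',
        'overwrite': True,
    })
    print(resp.status_code, resp.text)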

@bertsky (Collaborator Author) commented Jun 11, 2021

I just confirmed this by testing: If I start a workflow with multiple TF processors, the last one will steal the others' session. You have to explicitly create sessions, store them into the instance, and reset prior to processing. (This means changes to all existing TF processors we have. See here and here for examples.)

Let me recapitulate the whole issue of sharing GPUs (physically, esp. their RAM) and sharing Tensorflow sessions (logically, when run via API in the same preloading workflow server):

Normal computing resources like CPU, RAM and disk are shared naturally by the OS' task and I/O scheduler, the CPU's MMU and the disk driver's scheduler. Oversubscription can quickly become inefficient, but is easily avoidable by harmonizing the number of parallel jobs with the number of physical cores and the available RAM. Still, it can be worth risking transient oversubscription in exchange for a higher average resource utilization.

For the GPU, however, it is not that simple: GPURAM is not normally paged (because that would make it slow), let alone swapped, and hence is an exclusive resource whose exhaustion results in OOM errors; and when processes need to wait for shaders, the benefit of using the GPU over the CPU might vanish in the first place. Therefore, a runtime system / framework like OCR-D needs to take extra care to strictly prevent GPU oversubscription, as even transient oversubscription usually leads to OOM failures. However, it is rather hard to anticipate what GPU resources a certain workflow configuration will need (both on average and at peak).

In the OCR-D makefilization, I chose to deal with GPU-enabled workflow steps bluntly by marking them as such and synchronizing all GPU-enabled processor runs via a semaphore (discounting current runners against the number of physical GPUs). But exclusive GPU locks/semaphores stop working when the processors get preloaded into memory, and often multiple processors could actually share a single GPU – it depends on how they are programmed and what size the models (or even the input images!) are.

For Tensorflow, the normal mode of operation is to allocate all available GPURAM on startup and thus use it exclusively throughout the process' lifetime. But there are more options (sketched below in TF1 terms):

  • ConfigProto().gpu_options.allow_growth: allocates dynamically as needed (slower during the first run; "needed" can still mean more than strictly necessary; no de-allocation; can still yield OOM)
  • ConfigProto().gpu_options.per_process_gpu_memory_fraction: allocates no more than the given fraction of physical memory (much slower during the first run and slightly slower when competing for memory; can still yield OOM)
  • ConfigProto().gpu_options.experimental.use_unified_memory: enables paging of memory to CPU (much slower on the first run and slightly slower when competing for memory; no more OOM)
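
In TF1 terms, these options would be set roughly like this (illustrative sketch only; the three settings are alternatives, not meant to be combined):

    # TF1-style session configuration for sharing a GPU among processes:
    import tensorflow as tf

    config = tf.ConfigProto()
    # grow allocation on demand instead of grabbing all GPURAM at startup
    config.gpu_options.allow_growth = True
    # or: cap this process at a fixed share of physical GPURAM
    config.gpu_options.per_process_gpu_memory_fraction = 0.25
    # or: allow paging GPU memory to CPU RAM (slower, but avoids OOM)
    config.gpu_options.experimental.use_unified_memory = True
    session = tf.Session(config=config)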

So it now depends on how cleverly the processors are programmed, how many of them we want to run (sequentially in the same workflow, or in parallel) and how large the models/images are. I believe OCR-D needs to come up with conventions for advertising the number of GPU consumers and for writing processors that share resources cooperatively (without giving up too much performance). If we have enough runtime configuration facilities, then the user/admin can at least dimension and optimise by experiment.

Of course, additionally having processing servers for the individual processors would also help better control resource allocation: each such server could be statically configured for full utilization and then dynamically distribute the workload from all processors (across pages / workflows / workspaces) in parallel (over shaders/memory) and sequentially (over a queue). Plus, preloading and isolation would not fall onto the workflow server's shoulders anymore.

Thus, one of the largest advantages of using the workflow server is probably not reduced overhead (when you have small workspaces/documents) or latency, but the ability to scale across machines: as a network service, it can easily be deployed across a Docker swarm, for example.

- add `--server` option to CLI decorator
- implement via new `ocrd.server.ProcessingServer`:
  - based on gunicorn (for preforking directly from
    configured CLI in Python, but instantiating the
    processor after forking to avoid any shared GPU
    context)
  - using multiprocessing.Lock and Manager to lock
    (synchronize) workspaces among workers
  - using signal.alarm for worker timeout mechanics
  - using pre- and post-fork hooks for GPU- vs CPU-
    worker mechanics
  - doing Workspace validation within the request
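
In gunicorn terms, the general pattern described in that change summary looks roughly like this (a sketch of the idea only, with hypothetical names, not the actual ocrd.server.ProcessingServer code):

    # A gunicorn "custom application" that preforks workers and instantiates the
    # processor only after forking, so workers never share a GPU context.
    from gunicorn.app.base import BaseApplication

    def post_fork(server, worker):
        # runs in the child process: a safe place to create the processor/GPU session
        worker.processor = create_processor()  # hypothetical factory

    class ProcessingServer(BaseApplication):
        def __init__(self, wsgi_app, **options):
            self.wsgi_app = wsgi_app
            self.options = options
            super().__init__()

        def load_config(self):
            for key, value in self.options.items():
                if key in self.cfg.settings and value is not None:
                    self.cfg.set(key, value)

        def load(self):
            return self.wsgi_app

    # e.g.: ProcessingServer(app, bind='127.0.0.1:5000', workers=4, post_fork=post_fork).run()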
@bertsky (Collaborator Author) commented Jun 16, 2021

In 6263bb1 I have started implementing the processing server pattern. It is completely independent of the workflow server for now, so the latter still needs to be adapted to make proper use of the former. This could help overcome some of the problems with the current workflow server approach:

  • not require the processors to be in the same venv (or even be Pythonic) anymore,
  • not require monkey-patching Python classes anymore,
  • not stick multiple tasks in the same process anymore (risking interference as in the Tensorflow case).

But the potential merits reach even further:

  • run different parts of the workflow with different parallelism (slower tasks could be assigned more cores; GPU tasks could be assigned the exact number of GPU resources available while others get to see more CPU cores)
  • orchestrating workflows incrementally and with signalling (progress meters) and timeouts
  • encapsulating processor servers as Docker services, docker-composing (or even docker-swarming) workflows
    (which means that ocrd/all won't need to be a fat container anymore, and individual processor images can be based on the required CUDA runtime; currently we can only make exactly one CUDA/TF version work at the same time)

Some of the design decisions which will have to be made now:

  • set up processing servers in advance (externally/separately) or managed by workflow server
    • if external: task descriptions should contain the --server parameters
    • if managed: decide which ports to run them on, try setting them up, run a ring-back service and wait for startup notification, time out and fall back to CLI, tear down managed PS PIDs/ports on WS termination
  • what to do with non-Pythonic processors
    • integrate a simple but transparent / automatic socat-based server with means of the shell into bashlib
    • ignore them and always run as CLI

@kba kba mentioned this pull request Jul 23, 2021
@tdoan2010 (Contributor) commented:
Hi @bertsky, after reading through the PR, this is my understanding of it. Please correct me if I'm wrong.

Understanding

Whenever we start a workflow server, we have to define a workflow for it. Since the workflow is defined at instantiation time, one server can only run one workflow. A workflow here is a series of commands that trigger OCR processors with their respective inputs and outputs. So the syntax is like this:

ocrd workflow server -p <port> <workflow>

Advantage

The benefit of this approach is that all necessary processors are already loaded into memory and ready to be used. So, if we want to run one workflow multiple times, this approach saves us the time of loading and unloading models. But if we want to run another workflow, we have to start another server with ocrd workflow server.

Disadvantage

Imagine we have a use case where users have workflow descriptions (in whatever language we can parse) and want to execute them on our infrastructure. This ocrd workflow server is not an appropriate approach, since its advantage lies in the fact that the workflow is fixed (so that it can be loaded once and stay in memory "forever"): 1 workflow - 1 server.

@bertsky (Collaborator Author) commented Mar 2, 2022

@tdoan2010 your understanding of the workflow server is correct. This PR also implements a processing server, but you did not address that part.

To your interpretation:

Disadvantage

Imagine we have a use case where users have workflow descriptions (in whatever language we can parse) and want to execute them on our infrastructure. This ocrd workflow server is not an appropriate approach, since its advantage lies in the fact that the workflow is fixed (so that it can be loaded once and stay in memory "forever"): 1 workflow - 1 server.

The current implementation also parses workflow descriptions (in the syntax of ocrd process). There is no loss of generality here – future syntaxes can of course be integrated as well.

Further, I fail to see the logic of why this is inappropriate for dynamically defined workflows. You can always start up another workflow server at run time if necessary. In fact, such servers could be automatically managed (set up / torn down, swarmed) by a higher-level API service. And you don't lose anything by starting up all processors once and then processing. You only gain speed (from the second workspace onwards).

As argued in the original discussion memo cited at the top, overhead due to initialization is significant in OCR-D, especially with GPU processors, and the only way around this is workflow servers and/or processing servers.

@tdoan2010 tdoan2010 mentioned this pull request May 1, 2022