Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend the network client #1269

Merged
merged 35 commits into from
Aug 23, 2024
Merged

Extend the network client #1269

merged 35 commits into from
Aug 23, 2024

Conversation

MehmedGIT
Copy link
Contributor

@MehmedGIT MehmedGIT commented Aug 6, 2024

A brief beginning of addressing the points in #1265:

  • Fix network-integration-test failing due to a bad env variable
  • Ocrd network client:
    • can submit workflow jobs via the CLI
    • can check processing and workflow job statuses
    • can check the list of deployed processing workers/processor servers
    • can check the ocrd tool json of a deployed processor
    • can check the log file of a processing job
  • Two new env variables to control ocrd network client behavior: OCRD_NETWORK_CLIENT_POLLING_SLEEP and OCRD_NETWORK_CLIENT_POLLING_TIME
  • Integration tests for the ocrd network client
  • Reuse client logic in the processing server integration tests to avoid code duplication

@MehmedGIT
Copy link
Contributor Author

Connections to the client callback URL would be more complicated than in the example @joschrew shared. For the CI/CD pipeline, we need an appropriate host defining and port mapping just like in the other services in the docker-compose file.

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far so good, appreciate also the test.

We should really find a different mechanism than a global variable. I have proposed a way to at least scope such a variable to the server instance. Why do we need that mechanism anyway? IIUC callback_server.handle_request will handle exactly one request with no timeout, which is exactly what we want, is it not?

But I'm also wondering whether having an alternative based on polling for blocking processing might still be worth it. Because I imagine that a lot of firewalls won't allow the callback to a temporary server run on the user's machine. Polling is less efficient obviously but it does not require juggling IP addresses, ports and servers.

Makefile Show resolved Hide resolved
src/ocrd_network/cli/client.py Outdated Show resolved Hide resolved
src/ocrd_network/cli/client.py Outdated Show resolved Hide resolved
src/ocrd_network/cli/client.py Outdated Show resolved Hide resolved
src/ocrd_network/cli/client.py Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
src/ocrd_network/client.py Show resolved Hide resolved
src/ocrd_network/client_utils.py Outdated Show resolved Hide resolved
src/ocrd_utils/config.py Outdated Show resolved Hide resolved
@MehmedGIT MehmedGIT marked this pull request as ready for review August 13, 2024 13:33
@MehmedGIT MehmedGIT requested a review from bertsky August 13, 2024 13:49
@joschrew
Copy link
Contributor

I started to review, or rather to test the client but I couldn't finish so far. I will continue tomorrow. Here is what is not working for me up until now:

  • OCRD_NETWORK_SERVER_ADDR_PROCESSING in not working as a default for --address. I think simply default is missing as argument for the click-decorator
  • ocrd network client processing processor ocrd-cis-ocropy-binarize --address "http://localhost:8000" -I DEFAULT -O BIN-2 -m /data/vd18test/mets.xml gives an error: TypeError: send_processing_job_request() got an unexpected keyword argument 'block'. The parameter block_till_job_end could be renamed to block to resolve this.
  • this is not related to this pr (but I came across this while testing) but in ocrd_network/server_utils.py in line 215 and 218 it must be HTTP_400_BAD_REQUEST instead of HTTP_404_BAD_REQUEST (which does not exist).
  • running a processing job (send_processing_job_request) with the client does not work for me currently. The problem seems to be parameters in the inputs, at least it works if I remove this parameter from job_input. While debugging I think I realized a list instead of a dict is provided. Here I had to stop.

@MehmedGIT
Copy link
Contributor Author

MehmedGIT commented Aug 13, 2024

OCRD_NETWORK_SERVER_ADDR_PROCESSING in not working as a default for --address. I think simply default is missing as argument for the click-decorator

I'm not a big fan of copy-pasting the same default everywhere. Instead, I resolved it with 4de1e83. The --address default could probably be also a constant - the same default we use for the default processing server address.

The parameter block_till_job_end could be renamed to block to resolve this.

Resolved 8e7ba26.

this is not related to this pr (but I came across this while testing) but in ocrd_network/server_utils.py in line 215 and 218 it must be HTTP_400_BAD_REQUEST instead of HTTP_404_BAD_REQUEST (which does not exist).

You have caught something bigger than just that simple error. Resolved with 69808b6 and d1af85b. The HTTP exception was being caught with the general Exception catch which was duplicating the error output, and thus obfuscating it.

running a processing job (send_processing_job_request) with the client does not work for me currently. The problem seems to be parameters in the inputs, at least it works if I remove this parameter from job_input. While debugging I think I realized a list instead of a dict is provided. Here I had to stop.

That was not very pleasant to debug. The ocrd.decorators.parameter_option is returning ['{}'] and not {} by default. Which led to the parameters={'{': '}'} instead of parameters={}.

21:57:40.222 WARNING ocrd_network.processing_server - Job input: processor_name='ocrd-cis-ocropy-binarize' path_to_mets='/home/mm/repos/ocrd_network_tests/ws29/data/mets.xml' workspace_id=None description='OCR-D Network client request' input_file_grps=['DEFAULT'] output_file_grps=['BIN-TEST'] page_id=None parameters={'{': '}'} result_queue_name=None callback_url=None agent_type=<AgentType.PROCESSING_WORKER: 'worker'> job_id='d1e075e3-6cb0-40c7-b854-d2d77322cfbf' depends_on=None
21:57:40.222 ERROR ocrd_network.processing_server - Failed to validate processing job input against the tool json of processor: ocrd-cis-ocropy-binarize
["[] Additional properties are not allowed ('{' was unexpected)"]

Dirty fix: 50f73c5. Not sure how to better handle that.

Also, the -P and -p flags are currently broken with the client. Probably the parsing is incorrectly done. fixed with #1270

Copy link
Contributor

@joschrew joschrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the recent changes the client is working for me.

src/ocrd_network/client.py Outdated Show resolved Hide resolved
@@ -210,12 +212,12 @@ def validate_job_input(logger: Logger, processor_name: str, ocrd_tool: dict, job
raise_http_exception(logger, status.HTTP_404_NOT_FOUND, message)
try:
report = ParameterValidator(ocrd_tool).validate(dict(job_input.parameters))
if not report.is_valid:
message = f"Failed to validate processing job input against the tool json of processor: {processor_name}\n"
raise_http_exception(logger, status.HTTP_400_BAD_REQUEST, message + report.errors)
except Exception as error:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this exception be raised? Validators should not raise errors but return a report. If you don't have a specific use case, it might be better to just not do try/except so that potential errors are actually raised and fixed.

Copy link
Contributor Author

@MehmedGIT MehmedGIT Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to be extra secure against unexpected errors. The workers or the server (network agents as a service) should ideally never crash due to some error. Especially with an HTTP 500 error on the client side and leaving the network agent in an unpredictable state. The exceptions from problematic requests are still logged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, but errors in

report = ParameterValidator(ocrd_tool).validate(dict(job_input.parameters))

would mean that something is broken in our implementation or in jsonschema. So this would likely not be a fluke that happens once, we would need to stop processing and fix the bug at the source. The only variable factor is the job_input.parameters dict and if user-provided input breaks the validator - which it must absolutely never do - we should fix that in the validator.

Copy link
Contributor Author

@MehmedGIT MehmedGIT Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only variable factor is the job_input.parameters dict and if user-provided input breaks the validator - which it must absolutely never do - we should fix that in the validator.

A failing validator would also probably fail most of the processing/workflow jobs. Input from the user breaking down the entire infrastructure is conceptually wrong to me. Even if all the processing jobs fail, the Processing Server should still respond to other requests such as checking logs of old jobs. We would rather need a mechanism to prevent further submission of processing jobs in case X amount of jobs fail in a row - potentially preventing further requests only to the processing endpoint till it is fixed.

There is also currently no graceful shutdown for the processing server. I.e., once the server dies, anything inside the internal processing cache of the server (not the RabbitMQ) will be lost.

@kba
Copy link
Member

kba commented Aug 14, 2024

running a processing job (send_processing_job_request) with the client does not work for me currently. The problem seems to be parameters in the inputs, at least it works if I remove this parameter from job_input. While debugging I think I realized a list instead of a dict is provided. Here I had to stop.

That was not very pleasant to debug. The ocrd.decorators.parameter_option is returning ['{}'] and not {} by default. Which led to the parameters={'{': '}'} instead of parameters={}.

21:57:40.222 WARNING ocrd_network.processing_server - Job input: processor_name='ocrd-cis-ocropy-binarize' path_to_mets='/home/mm/repos/ocrd_network_tests/ws29/data/mets.xml' workspace_id=None description='OCR-D Network client request' input_file_grps=['DEFAULT'] output_file_grps=['BIN-TEST'] page_id=None parameters={'{': '}'} result_queue_name=None callback_url=None agent_type=<AgentType.PROCESSING_WORKER: 'worker'> job_id='d1e075e3-6cb0-40c7-b854-d2d77322cfbf' depends_on=None
21:57:40.222 ERROR ocrd_network.processing_server - Failed to validate processing job input against the tool json of processor: ocrd-cis-ocropy-binarize
["[] Additional properties are not allowed ('{' was unexpected)"]

The actual parsing of the -p and -P arguments was missing. Fixed in #1270.

kba and others added 2 commits August 14, 2024 16:00
ocrd network client: parse parameters and overrides
src/ocrd_network/cli/client.py Outdated Show resolved Hide resolved
@kba
Copy link
Member

kba commented Aug 22, 2024

@MehmedGIT I've created the changelog for this PR, anything essential missing?

@MehmedGIT
Copy link
Contributor Author

IMO, good to be merged.

@kba kba merged commit 6608539 into master Aug 23, 2024
22 checks passed
@kba kba deleted the extend-network-client branch August 23, 2024 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants