From 1a0614cfb921bddc27c61a176ecf541ab5c3955b Mon Sep 17 00:00:00 2001
From: Jeroen Dries
Date: Fri, 18 Nov 2022 08:30:16 +0100
Subject: [PATCH 1/3] document terrascope job options

job options need to be documented, as shown by forum questions:
https://discuss.eodc.eu/t/error-occuring-when-calculating-s2-mosaic-for-some-periods-others-work-fine/492/3
---
 federation/index.md | 52 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/federation/index.md b/federation/index.md
index fc2d257d2..073ebfd97 100644
--- a/federation/index.md
+++ b/federation/index.md
@@ -176,13 +176,61 @@ on the appropriate processing back-ends.
 Subsequent interaction (starting the jobs, polling their status, requesting the result assets, ...) can be done through the "master" `job` object created above, in the same way as with normal batch jobs.
 
-### Validity signer URLs (Batch job results)
+### Validity signed URLs (Batch job results)
 
 Batch job results are accessible to the user via signed URLs stored in the result assets. Within the platform,
 these URLs have a validity (expiry time) of 7 days. Within these 7 days, the results of a batch job can be accessed
 by any person with the URL. Each time a user requests the results from the job endpoint (`GET /jobs/{job_id}/results`),
 a freshly signed URL (valid for 7 days) is created for the result assets.
 
+### Customizing batch job resources on Terrascope
+
+Jobs running on the Terrascope cluster get assigned a default amount of cpu and memory resources. This
+may not always be enough for your job, for instance when using UDFs. Also for very large jobs, you may want
+to tune your resource settings to optimize for cost.
+
+The example below shows how to start a job with all options set to their default values. It is important to highlight
+that default settings are subject to change by the backend whenever needed.
+
+```python
+job_options = {
+    "executor-memory": "2G",
+    "executor-memoryOverhead": "3G",
+    "executor-cores": "2",
+    "task-cpus": "1",
+    "executor-request-cores": "400m",
+    "max-executors": "100",
+    "driver-memory": "8G",
+    "driver-memoryOverhead": "2G",
+    "driver-cores": "5"
+    }
+cube.execute_batch( job_options=job_options)
+```
+
+This is a short overview of the various options:
+
+- executor-memory: memory assigned to your workers, for the JVM that executes most predefined processes
+- executor-memoryOverhead: memory assigned on top of the JVM, for instance to run UDFs
+- executor-cores: number of cpus per worker (executor). The number of parallel tasks per executor is `executor-cores/task-cpus`
+- task-cpus: cpus assigned to a single task. UDFs using libraries like TensorFlow can benefit from further parallelization on the level of individual tasks.
+- executor-request-cores: this setting is only relevant for Kubernetes-based back-ends; it allows overcommitting cpu
+- max-executors: the maximum number of workers assigned to your job. The maximum number of parallel tasks is `max-executors*executor-cores/task-cpus`; see the worked example below. Increasing this can inflate your costs, while not necessarily improving performance!
+- driver-memory: memory assigned to the Spark 'driver' JVM that controls execution of your batch job
+- driver-memoryOverhead: memory assigned to the Spark 'driver' on top of JVM memory, for Python processes.
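+
+As a quick sanity check before launching a large job, you can work out what these settings request in total.
+The sketch below is plain Python, using the default values from the example above; the numbers are upper
+bounds, since the actual allocation is decided by the cluster scheduler.
+
+```python
+# Upper bounds implied by the job options above (a rough sketch, not exact accounting).
+executor_cores = 2       # "executor-cores"
+task_cpus = 1            # "task-cpus"
+max_executors = 100      # "max-executors"
+executor_memory_gb = 2   # "executor-memory"
+overhead_gb = 3          # "executor-memoryOverhead"
+
+parallel_tasks = max_executors * executor_cores // task_cpus
+total_memory_gb = max_executors * (executor_memory_gb + overhead_gb)
+print(f"at most {parallel_tasks} parallel tasks")        # at most 200 parallel tasks
+print(f"up to {total_memory_gb} GB of executor memory")  # up to 500 GB of executor memory
+```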
+
+#### Learning more
+
+The topic of resource optimization is a complex one, and here we just give a short summary. The goal of openEO is to hide most of these
+details from the user, but we realize that advanced users sometimes want to have a bit more insight, so in the spirit of being open, we give some hints.
+
+To learn more about these options, we point to the piece of code that handles this:
+
+https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/openeogeotrellis/backend.py#L1213
+
+Most memory-related options are translated to Apache Spark configuration settings, which are documented here:
+
+https://spark.apache.org/docs/3.3.1/configuration.html#application-properties
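+
+As an illustration only, the snippet below shows roughly which Spark property each job option ends up as.
+This mapping is an assumption based on the Spark documentation linked above; the backend.py code linked
+above remains the authoritative reference.
+
+```python
+# Approximate job option -> Spark property mapping (an assumption; verify against backend.py).
+SPARK_PROPERTY = {
+    "executor-memory": "spark.executor.memory",
+    "executor-memoryOverhead": "spark.executor.memoryOverhead",
+    "executor-cores": "spark.executor.cores",
+    "task-cpus": "spark.task.cpus",
+    "executor-request-cores": "spark.kubernetes.executor.request.cores",  # Kubernetes only
+    "max-executors": "spark.dynamicAllocation.maxExecutors",
+    "driver-memory": "spark.driver.memory",
+    "driver-memoryOverhead": "spark.driver.memoryOverhead",
+    "driver-cores": "spark.driver.cores",
+}
+```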
+
 ### Batch job results on Sentinel Hub
 
-If you are processing data and the underlying back-end is Sentinel Hub, the output extent of your batch job results is currently larger than your input extent because Sentinel Hub processes whole tiles (this may change in the future and the data will be cropped to your input extent).
\ No newline at end of file
+If you are processing data and the underlying back-end is Sentinel Hub, the output extent of your batch job results is currently larger than your input extent because Sentinel Hub processes whole tiles (this may change in the future and the data will be cropped to your input extent).

From ac49a7417ce4b537341e33fc5583a69dbe02420d Mon Sep 17 00:00:00 2001
From: Jeroen Dries
Date: Mon, 21 Nov 2022 19:11:05 +0100
Subject: [PATCH 2/3] Apply suggestions from code review

Co-authored-by: Stefaan Lippens
---
 federation/index.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/federation/index.md b/federation/index.md
index 073ebfd97..c7ad094c9 100644
--- a/federation/index.md
+++ b/federation/index.md
@@ -176,7 +176,7 @@ on the appropriate processing back-ends.
 Subsequent interaction (starting the jobs, polling their status, requesting the result assets, ...) can be done through the "master" `job` object created above, in the same way as with normal batch jobs.
 
-### Validity signed URLs (Batch job results)
+### Validity of signed URLs in batch job results
 
 Batch job results are accessible to the user via signed URLs stored in the result assets. Within the platform,
 these URLs have a validity (expiry time) of 7 days. Within these 7 days, the results of a batch job can be accessed
@@ -185,7 +185,7 @@ a freshly signed URL (valid for 7 days) is created for the result assets.
 
 ### Customizing batch job resources on Terrascope
 
-Jobs running on the Terrascope cluster get assigned a default amount of cpu and memory resources. This
+Jobs running on the Terrascope cluster get assigned a default amount of CPU and memory resources. This
 may not always be enough for your job, for instance when using UDFs. Also for very large jobs, you may want
 to tune your resource settings to optimize for cost.
 
@@ -196,24 +196,24 @@ that default settings are subject to change by the backend whenever needed.
 job_options = {
     "executor-memory": "2G",
     "executor-memoryOverhead": "3G",
-    "executor-cores": "2",
-    "task-cpus": "1",
+    "executor-cores": 2,
+    "task-cpus": 1,
     "executor-request-cores": "400m",
     "max-executors": "100",
     "driver-memory": "8G",
     "driver-memoryOverhead": "2G",
-    "driver-cores": "5"
+    "driver-cores": 5,
     }
-cube.execute_batch( job_options=job_options)
+cube.execute_batch(job_options=job_options)
 ```
 
 This is a short overview of the various options:
 
 - executor-memory: memory assigned to your workers, for the JVM that executes most predefined processes
 - executor-memoryOverhead: memory assigned on top of the JVM, for instance to run UDFs
-- executor-cores: number of cpus per worker (executor). The number of parallel tasks per executor is `executor-cores/task-cpus`
-- task-cpus: cpus assigned to a single task. UDFs using libraries like TensorFlow can benefit from further parallelization on the level of individual tasks.
-- executor-request-cores: this setting is only relevant for Kubernetes-based back-ends; it allows overcommitting cpu
+- executor-cores: number of CPUs per worker (executor). The number of parallel tasks per executor is `executor-cores/task-cpus`
+- task-cpus: CPUs assigned to a single task. UDFs using libraries like TensorFlow can benefit from further parallelization on the level of individual tasks.
+- executor-request-cores: this setting is only relevant for Kubernetes-based back-ends; it allows overcommitting CPU
 - max-executors: the maximum number of workers assigned to your job. The maximum number of parallel tasks is `max-executors*executor-cores/task-cpus`; see the worked example below. Increasing this can inflate your costs, while not necessarily improving performance!
 - driver-memory: memory assigned to the Spark 'driver' JVM that controls execution of your batch job
 - driver-memoryOverhead: memory assigned to the Spark 'driver' on top of JVM memory, for Python processes.

From fe3cb177ca697ca7c6a4b6f65df612cb9a0c6df8 Mon Sep 17 00:00:00 2001
From: Jeroen Dries
Date: Mon, 21 Nov 2022 19:13:39 +0100
Subject: [PATCH 3/3] use permalink

---
 federation/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/federation/index.md b/federation/index.md
index c7ad094c9..9fb56b6e4 100644
--- a/federation/index.md
+++ b/federation/index.md
@@ -225,7 +225,7 @@ details from the user, but we realize that advanced users sometimes want to have
 To learn more about these options, we point to the piece of code that handles this:
 
-https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/openeogeotrellis/backend.py#L1213
+https://github.com/Open-EO/openeo-geopyspark-driver/blob/faf5d5364a82e870e42efd2a8aee9742f305da9f/openeogeotrellis/backend.py#L1213
 
 Most memory-related options are translated to Apache Spark configuration settings, which are documented here:
 
 https://spark.apache.org/docs/3.3.1/configuration.html#application-properties