Feature/new update submission status algorithm #832
Conversation
There are actually two independent parts in this PR.
The first is to add the entity_type to the submission. I fully approve this part! 👏🏻
The second is to change how the status of the submission is updated. I have comments on how it is done: duplicated status information (the _submission_status field and the various boolean fields), and the additional sets that store job IDs per status but look unused... I added some comments on this.
self.__running_jobs: MutableSet[str] = set()
self.__blocked_jobs: MutableSet[str] = set()
self.__pending_jobs: MutableSet[str] = set()
❓ What is the purpose of these additional set fields? As far as I can see, we add/remove job IDs from them in _update_submission_status, but we never read from them!
If it is just to track the statuses of the jobs, why not just use a single Dict[JobId, Status]?
Oh, it's just something to track whether we still have running, blocked, or pending jobs. Since we still need access to the other job statuses, if we used a dict we would have to loop through all job statuses to do this check, whereas if we keep one set per status, we can check for the existence of "running jobs", "blocked jobs", etc. in O(1) time. Does that make sense to you?
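For illustration, here is a minimal sketch of the trade-off being discussed; the class and method names are hypothetical, not the actual taipy-core code:

```python
from typing import Dict, MutableSet


class _StatusTracker:
    """Hypothetical sketch comparing the two approaches discussed above."""

    def __init__(self) -> None:
        # One set per status: "is there any job in this status?" is O(1).
        self._running_jobs: MutableSet[str] = set()
        # Single-dict alternative suggested in the review.
        self._job_statuses: Dict[str, str] = {}

    def set_running(self, job_id: str) -> None:
        self._running_jobs.add(job_id)
        self._job_statuses[job_id] = "RUNNING"

    def has_running_jobs_via_set(self) -> bool:
        return bool(self._running_jobs)  # O(1)

    def has_running_jobs_via_dict(self) -> bool:
        # O(n): must scan every job's status.
        return any(status == "RUNNING" for status in self._job_statuses.values())
```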
Ok, but is it used anywhere in the code? I can't find any usage...
If it is not used, why do we need those sets?
Sorry, but I have concerns about this implementation...
Let me elaborate, so you can tell me I shouldn't actually be worried :-)
Keeping such "state" in the object without any DB persistence makes any Taipy-core service stateful. Example: a pod running in Kubernetes.
I see at least two problems:
- If we need to "scale up" the pod (because we need more throughput/horsepower), then the state remains in one pod while the other has an inconsistent state. The behaviour might become erratic.
- If the pod dies for some reason, then the state/data is lost forever.
So either this state is important, and then we need to somehow persist it, or it's not useful and we should just get rid of it.
Let me know if we need to discuss it in more detail.
Hmmm, for now, from what I can understand, as long as the orchestrator and the dispatcher are in the same application, sharing the same memory pool, this shouldn't be an issue. Previously, we also relied on this fact when initiating the callbacks from a job to update the submission status:
- To run callbacks (https://github.com/Avaiga/taipy-core/blob/develop/src/taipy/core/job/job.py#L31): this function relies on callbacks stored in memory, not on disk/in the DB.
- Assigning callbacks to jobs (https://github.com/Avaiga/taipy-core/blob/develop/src/taipy/core/_orchestrator/_orchestrator.py#L81): we never store these callbacks until after we have finished running the job in https://github.com/Avaiga/taipy-core/blob/develop/src/taipy/core/_orchestrator/_dispatcher/_job_dispatcher.py#L124, where we set everything and consequently also save the job's callbacks to disk/the DB.

But you made a very valid point: in the scenario where we want to scale up, separating the dispatcher into a separate application or even a separate machine, this won't work anymore. I think we should discuss this point further :D and we should also involve @jrobinAV, as I think this topic would also be interesting for this ticket: https://app.zenhub.com/workspaces/taipy-613ef328affde6000ff77fb3/issues/gh/avaiga/taipy-enterprise/268
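To make the "in-memory callbacks" point concrete, here is a rough, simplified sketch (hypothetical names, not the actual taipy-core implementation):

```python
from typing import Callable, List


class Job:
    """Simplified, hypothetical sketch of in-memory job callbacks."""

    def __init__(self, job_id: str) -> None:
        self.id = job_id
        # Plain Python callables held only in process memory: they are
        # never serialized, so they only work while the orchestrator and
        # the dispatcher share the same memory pool.
        self._subscribers: List[Callable[["Job"], None]] = []

    def subscribe(self, callback: Callable[["Job"], None]) -> None:
        self._subscribers.append(callback)

    def _on_status_change(self) -> None:
        # Runs in-process only; if the process dies or the dispatcher
        # moves to another machine, these callbacks are lost.
        for callback in self._subscribers:
            callback(self)
```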
Isn't it that the dispatcher and the orchestrator should always be on the same machine, and we only split off the jobs?
@trgiangdo In theory, if you consider Kubernetes as the deployment platform, you shouldn't make such assumptions. Services running on K8s must natively be stateless and restartable without breaking the app (the so-called "cloud native" approach).
And if a service can only run once (i.e. a single process that won't scale), then it must also be ready to be restarted on other nodes without notice (K8s does that). It means the state must be easily restored or recomputed. In this case, it would just be emptied...
I don't want to block the PR, but design choices imply things that go beyond the scope of the code.
I believe we can close this topic for now.
To summarize:
- We acknowledge that there are several stateful places for which we don't yet have an approach to serialize the state across multiple machines (including job callbacks, the state of the submission entity, the event notification, etc.)
- We will not address these issues right now and will review them again later.
Yes, we can close the topic.
if job.is_abandoned():
    abandoned = True
if canceled:
def _update_submission_status(self, job: Job):
I am concerned about the possibility of calling this method for the same job for 2 (or more) status changes in the wrong order.
As discussed, the orchestrator finishes updating a job's status before handing it over to the dispatcher, so no job status update should create a collision. The only scenario where this might happen is when we try to cancel a job, since that action is triggered directly from the main thread while the job could be running in a subthread. But we have various conditions to check whether we can cancel a job or not: if it's running, we can't; if it's not running, I think this process can still be triggered safely when we update the job's status to canceled.
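A hedged sketch of that cancellation guard (simplified; the method names is_running() and canceled() are assumptions for illustration, and the real checks live in the orchestrator):

```python
def try_cancel(job) -> bool:
    """Hypothetical sketch of the cancellation guard described above."""
    if job.is_running():
        # The job is executing in a subthread; cancelling now could
        # collide with a concurrent status update from the dispatcher.
        return False
    # The job is not running, so the main thread can safely mark it
    # canceled and propagate the change to the submission status.
    job.canceled()
    return True
```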
The code is already merged into the taipy repo, right?
@jrobinAV yep we can! Sorry for that!
Purpose:
This PR is an attempt to improve the performance of the submission status change algorithm.
Changes:
Instead of looping over all jobs of a submission entity whenever one job's status is updated, we now update the submission status based only on the recently updated job status (see the simplified sketch below).
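Conceptually, the change looks something like this sketch (hypothetical names and statuses, not the exact PR code):

```python
from typing import MutableSet


class Submission:
    """Minimal stand-in for the submission entity used in this sketch."""

    def __init__(self, jobs) -> None:
        self.jobs = jobs
        self.status = "SUBMITTED"
        self.running_jobs: MutableSet[str] = set()
        self.pending_jobs: MutableSet[str] = set()


def update_status_old(submission) -> None:
    # Old approach: rescan every job of the submission on each job
    # status change -- O(number of jobs) per update.
    if any(job.is_running() for job in submission.jobs):
        submission.status = "RUNNING"
    elif any(job.is_pending() for job in submission.jobs):
        submission.status = "PENDING"
    else:
        submission.status = "COMPLETED"


def update_status_new(submission, job) -> None:
    # New approach: adjust per-status sets using only the job whose
    # status just changed -- O(1) per update.
    submission.running_jobs.discard(job.id)
    submission.pending_jobs.discard(job.id)
    if job.is_running():
        submission.running_jobs.add(job.id)
    elif job.is_pending():
        submission.pending_jobs.add(job.id)
    if submission.running_jobs:
        submission.status = "RUNNING"
    elif submission.pending_jobs:
        submission.status = "PENDING"
    else:
        submission.status = "COMPLETED"
```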
Remaining tasks:
test_orchestrator.py
Result
Previously, running
pytest tests/enterprise/core/test_end_to_end.py::test_without_authorization_standalone
on my local machine (the test used during our discussion) took 22.55s. After applying these changes, it now takes only 5.39s!
Results from running profiling:
------------ Old version ------------
------------ New version ------------