Feature/new update submission status algorithm #832
❓ What is the purpose of these additional `set` fields? As far as I can see, we add/remove job ids from them in `_update_submission_status`, but we never read from them! If it is just to track the statuses of the jobs, why not use a single `Dict[JobId, Status]`?
Oh, it's just something to hold whether we still have jobs running, blocked, or pending. If we used a dict, we would need to loop through all job statuses to do this check, whereas if we keep one set per status, we can check for the existence of "running jobs", "blocked jobs", etc. in O(1) time. Does that make sense to you?
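A minimal sketch of that idea, purely for illustration (the class and field names here are assumptions, not the actual taipy-core code):

```python
# Sketch only: hypothetical names, not the real taipy-core implementation.
from typing import Set

class SubmissionStatusTracker:
    def __init__(self) -> None:
        # One set per job status of interest.
        self._running_jobs: Set[str] = set()
        self._blocked_jobs: Set[str] = set()
        self._pending_jobs: Set[str] = set()

    def _update_submission_status(self, job_id: str, new_status: str) -> None:
        # Remove the job from every set, then add it to the set matching its new status.
        for status_set in (self._running_jobs, self._blocked_jobs, self._pending_jobs):
            status_set.discard(job_id)
        if new_status == "RUNNING":
            self._running_jobs.add(job_id)
        elif new_status == "BLOCKED":
            self._blocked_jobs.add(job_id)
        elif new_status == "PENDING":
            self._pending_jobs.add(job_id)

    @property
    def has_running_jobs(self) -> bool:
        # O(1) truthiness check on the set, versus scanning a Dict[JobId, Status].
        return bool(self._running_jobs)

tracker = SubmissionStatusTracker()
tracker._update_submission_status("job_1", "RUNNING")
assert tracker.has_running_jobs  # no loop over all job statuses needed
```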
Ok, but is it used anywhere in the code? I can't find any usage...
If it is not used, why do we need those sets?
Sorry, but I have concerns about this implementation...
Let me elaborate, so you can tell me I shouldn't actually be worried :-)
Keeping such "states" in the object without any db persistence makes any Taipy-core service stateful. Example: a pod running in Kubernetes.
I see at least two problems with this.
So either this state is important, and thus we need to somehow persist it, or it's not useful and we should just get rid of it.
Let me know if we need to discuss it in more detail.
Hmmm, for now, from what I can understand, as long as the orchestrator and dispatcher are in the same application, sharing the same memory, this shouldn't be an issue. Previously we also relied on this fact when initiating the callback from the job to update the submission status:
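(The actual snippet referenced above isn't included here; the following is purely a hypothetical illustration of that same-process callback pattern, not the real taipy-core code:)

```python
# Hypothetical illustration of a job-status callback; names do not reflect taipy-core.
from typing import Callable, List

class Job:
    def __init__(self, job_id: str) -> None:
        self.id = job_id
        self.status = "PENDING"
        self._callbacks: List[Callable[["Job"], None]] = []

    def on_status_change(self, callback: Callable[["Job"], None]) -> None:
        self._callbacks.append(callback)

    def set_status(self, status: str) -> None:
        self.status = status
        # Works only because orchestrator and dispatcher share the same process memory.
        for callback in self._callbacks:
            callback(self)

class Orchestrator:
    def _update_submission_status(self, job: "Job") -> None:
        print(f"submission status recomputed after job {job.id} -> {job.status}")

orchestrator = Orchestrator()
job = Job("job_1")
job.on_status_change(orchestrator._update_submission_status)
job.set_status("RUNNING")  # e.g. triggered from the dispatcher thread
```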
But you made a very valid point: in the scenario where we want to scale up, separating the dispatcher into a separate application or even a separate machine, this won't work anymore. I think we should discuss this point further :D and we should also involve @jrobinAV, as I think this topic would also be interesting for this ticket: https://app.zenhub.com/workspaces/taipy-613ef328affde6000ff77fb3/issues/gh/avaiga/taipy-enterprise/268
Isn't it the case that the dispatcher and orchestrator should always be on the same machine, and we only split out the jobs?
@trgiangdo In theory, if you consider Kubernetes as the deployment platform, you shouldn't make such assumptions. Services running on K8s must natively be stateless and restartable without breaking the app (the so-called "cloud native" approach).
And if it can only run once (i.e., a single process that won't scale), then it must also be ready to be restarted on other nodes without notice (K8s does that). It means the state must be easily restored or recomputed. In this case, it would just be emptied...
I don't want to block the PR, but design choices imply things that go beyond the scope of the code.
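(As a rough sketch of what "recomputed" could mean here, assuming hypothetical names and that job statuses are already persisted somewhere: rebuild the in-memory per-status sets from storage at startup, so a restarted pod does not depend on the previous process's memory.)

```python
# Hypothetical sketch: rebuild per-status sets from persisted job data at startup.
from typing import Dict, Set

def rebuild_status_sets(persisted_jobs: Dict[str, str]) -> Dict[str, Set[str]]:
    """persisted_jobs maps job_id -> status, as reloaded from the repository."""
    sets: Dict[str, Set[str]] = {"RUNNING": set(), "BLOCKED": set(), "PENDING": set()}
    for job_id, status in persisted_jobs.items():
        if status in sets:
            sets[status].add(job_id)
    return sets

# After a restart, reload jobs from storage and recompute the sets.
status_sets = rebuild_status_sets({"job_1": "RUNNING", "job_2": "BLOCKED"})
assert "job_1" in status_sets["RUNNING"]
```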
I believe we can close this topic for now.
To summarize:
Yes, we can close the topic.
I am concerned about the possibility of calling this method for the same job with two (or more) status changes in the wrong order.
As discussed, the status of a job will be fully updated by the orchestrator before it is handed over to the dispatcher, so no update to the job status should create a collision. The only scenario where this might happen is when we try to cancel a job, as that action is triggered directly from the main thread while the job could be running in a subthread. But we have various conditions to check whether we can cancel a job or not: if it's running, we can't; if it's not running, I think this process can still be triggered safely when we update the status of the job to `canceled`.
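(Purely as a sketch of the guard described above, with hypothetical names rather than the actual taipy-core cancel logic:)

```python
# Sketch of the cancel guard: a running job cannot be canceled, so the main thread
# never races the dispatcher thread on that job's status. Hypothetical names only.
import threading

class Job:
    def __init__(self, job_id: str) -> None:
        self.id = job_id
        self.status = "PENDING"
        self.lock = threading.Lock()

    def is_running(self) -> bool:
        return self.status == "RUNNING"

def cancel_job(job: Job) -> bool:
    with job.lock:
        if job.is_running():
            return False  # running jobs are owned by the dispatcher; don't touch them
        job.status = "CANCELED"
        return True

job = Job("job_1")
assert cancel_job(job) is True  # not running, so it can be safely canceled
```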