[Bug] correctly set task execution phase for terminal array node #5136

Open. Wants to merge 4 commits into base: master.
30 changes: 25 additions & 5 deletions flytepropeller/pkg/controller/nodes/array/handler.go
@@ -81,6 +81,11 @@

eventRecorder := newArrayEventRecorder(nCtx.EventsRecorder())
messageCollector := errorcollector.NewErrorMessageCollector()

+ taskPhase := idlcore.TaskExecution_ABORTED
+ if arrayNodeState.Phase == v1alpha1.ArrayNodePhaseFailing {
+ taskPhase = idlcore.TaskExecution_FAILED
+ }
switch arrayNodeState.Phase {
case v1alpha1.ArrayNodePhaseExecuting, v1alpha1.ArrayNodePhaseFailing:
for i, nodePhaseUint64 := range arrayNodeState.SubNodePhases.GetItems() {
@@ -122,13 +127,12 @@
}

// update state for subNodes
- if err := eventRecorder.finalize(ctx, nCtx, idlcore.TaskExecution_ABORTED, 0, a.eventConfig); err != nil {
+ if err := eventRecorder.finalize(ctx, nCtx, taskPhase, 0, a.eventConfig); err != nil {
// a task event with abort phase is already emitted when handling ArrayNodePhaseFailing
- if eventsErr.IsAlreadyExists(err) {
- return nil
+ if !eventsErr.IsAlreadyExists(err) {
+ logger.Errorf(ctx, "ArrayNode event recording failed: [%s]", err.Error())
+ return err

Codecov / codecov/patch: check warning on line 134 in flytepropeller/pkg/controller/nodes/array/handler.go. Added lines #L132-L134 were not covered by tests.
Collaborator commented on lines +132 to +134:
Consider more graceful error handling

Consider handling the error case more gracefully by logging the error and continuing execution rather than returning early. The current implementation may cause unnecessary aborts.

Suggested change (AI-generated; check the fix before applying):
- if !eventsErr.IsAlreadyExists(err) {
- logger.Errorf(ctx, "ArrayNode event recording failed: [%s]", err.Error())
- return err
+ if eventsErr.IsAlreadyExists(err) {
+ return nil
+ }

Code Review Run #e4c107


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

}
- logger.Errorf(ctx, "ArrayNode event recording failed: [%s]", err.Error())
- return err
}

return nil
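
The phase selection in the two hunks above can be sketched in isolation. This is a minimal illustration, not the handler's code: `arrayNodePhase`, `taskPhase`, and `abortTaskPhase` are hypothetical stand-ins for `v1alpha1.ArrayNodePhase`, the `idlcore.TaskExecution_*` phases, and the inline `taskPhase` assignment. It assumes only what the diff shows: an abort that arrives while the array node is already failing reports FAILED, and every other abort reports ABORTED.

```go
package main

import "fmt"

// arrayNodePhase is a hypothetical stand-in for v1alpha1.ArrayNodePhase.
type arrayNodePhase int

const (
	phaseExecuting arrayNodePhase = iota
	phaseFailing
	phaseSucceeding
)

// taskPhase is a hypothetical stand-in for the idlcore task execution phase.
type taskPhase string

const (
	taskAborted taskPhase = "ABORTED"
	taskFailed  taskPhase = "FAILED"
)

// abortTaskPhase picks the task phase to report when the array node is aborted.
// A node that was already failing is reported as FAILED so the task execution
// does not end up marked as a plain abort; everything else is ABORTED.
func abortTaskPhase(p arrayNodePhase) taskPhase {
	if p == phaseFailing {
		return taskFailed
	}
	return taskAborted
}

func main() {
	for _, p := range []arrayNodePhase{phaseExecuting, phaseFailing, phaseSucceeding} {
		fmt.Printf("array node phase %d -> task phase %s\n", p, abortTaskPhase(p))
	}
}
```

Computing the phase once, before the `switch`, keeps the later `finalize` call free of branching, which appears to be the intent of the change.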
@@ -466,6 +470,14 @@
return handler.UnknownTransition, err
}

+ // ensure task_execution set to failed - this should already be sent by the abort handler
+ if err := eventRecorder.finalize(ctx, nCtx, idlcore.TaskExecution_FAILED, 0, a.eventConfig); err != nil {
Collaborator commented on lines +473 to +474:
Consider validating eventRecorder before use

Consider checking if eventRecorder is not nil before calling finalize(). The current code assumes eventRecorder is always initialized.

Suggested change (AI-generated; check the fix before applying):
- // ensure task_execution set to failed - this should already be sent by the abort handler
- if err := eventRecorder.finalize(ctx, nCtx, idlcore.TaskExecution_FAILED, 0, a.eventConfig); err != nil {
+ // ensure task_execution set to failed - this should already be sent by the abort handler
+ if eventRecorder == nil {
+ logger.Errorf(ctx, "ArrayNode eventRecorder is nil")
+ return handler.UnknownTransition, fmt.Errorf("eventRecorder is nil")
+ }
+ if err := eventRecorder.finalize(ctx, nCtx, idlcore.TaskExecution_FAILED, 0, a.eventConfig); err != nil {

Code Review Run #e4c107


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

+ if !eventsErr.IsAlreadyExists(err) {
+ logger.Errorf(ctx, "ArrayNode event recording failed: [%s]", err.Error())
+ return handler.UnknownTransition, err
+ }

Codecov / codecov/patch: check warning on line 478 in flytepropeller/pkg/controller/nodes/array/handler.go. Added lines #L475-L478 were not covered by tests.
+ }

// fail with reported error if one exists
if arrayNodeState.Error != nil {
return handler.DoTransition(handler.TransitionTypeEphemeral, handler.PhaseInfoFailureErr(arrayNodeState.Error, nil)), nil
@@ -609,6 +621,14 @@
return handler.UnknownTransition, err
}

+ // ensure task_execution set to succeeded
+ if err := eventRecorder.finalize(ctx, nCtx, idlcore.TaskExecution_SUCCEEDED, 0, a.eventConfig); err != nil {
+ if !eventsErr.IsAlreadyExists(err) {
+ logger.Errorf(ctx, "ArrayNode event recording failed: [%s]", err.Error())
+ return handler.UnknownTransition, err
+ }

Codecov / codecov/patch: check warning on line 629 in flytepropeller/pkg/controller/nodes/array/handler.go. Added lines #L626-L629 were not covered by tests.
+ }

return handler.DoTransition(handler.TransitionTypeEphemeral, handler.PhaseInfoSuccess(
&handler.ExecutionInfo{
OutputInfo: &handler.OutputInfo{