Rollout dashboard fixes. #21

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

DFINITYManu
Collaborator

Fix 1

When the rollout dashboard asks for a rollout's task instances, it retrieves them in pages to stay within Airflow's baked-in 100-item limit. For some reason, the results come back unordered, so when pages 2, 3 and 4 of task instances are fetched, some task instances are missing and others are duplicated. apache/airflow#41283 explains the issue in detail. The impact on us is that subnets sometimes have no associated state, even though Airflow has clearly progressed on that subnet.
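
To illustrate why unordered results break offset-based paging, here is a minimal, self-contained simulation. It does not talk to Airflow; `FlakyOrderServer` and `fetch_paged` are hypothetical names, and the alternating sort order is only an assumption used to make the effect reproducible.

```python
class FlakyOrderServer:
    """Toy stand-in for an API whose result order changes between requests."""

    def __init__(self, items):
        self._items = list(items)
        self._calls = 0

    def get(self, offset, limit):
        # Alternate between ascending and descending order on each call to
        # mimic a query that has no stable ordering.
        ordered = self._items if self._calls % 2 == 0 else self._items[::-1]
        self._calls += 1
        return ordered[offset:offset + limit]


def fetch_paged(server, total, page_size):
    """Collect all items page by page using offset and limit."""
    collected = []
    for offset in range(0, total, page_size):
        collected.extend(server.get(offset, page_size))
    return collected


items = list(range(10))

# Two pages of five: the second page is cut from a differently ordered list.
paged = fetch_paged(FlakyOrderServer(items), total=len(items), page_size=5)
duplicated = sorted({x for x in paged if paged.count(x) > 1})
missing = sorted(set(items) - set(paged))

# One request with a limit larger than the total sees every item exactly once.
single = FlakyOrderServer(items).get(0, 500)
```

In this toy run the paged fetch returns the right number of items, but half of them are duplicates and the other half never show up, while the single large request returns each item once regardless of order.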

To mitigate this, we exceed the limit and ask for 500 task instances. This seems to work fine, much to my surprise! With that, the issue of rollout subnets sometimes having no associated state is fixed.
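
As a rough sketch of what such a request could look like, the snippet below only builds the URL and performs no network call. It assumes Airflow's stable REST API task-instance listing endpoint with its `limit` and `offset` query parameters; `task_instances_url` and the example host are hypothetical and not taken from the dashboard's code.

```python
from urllib.parse import quote, urlencode


def task_instances_url(base_url, dag_id, dag_run_id, limit=500):
    """Build a URL that asks for up to `limit` task instances at once.

    Assumption: the endpoint path and the `limit`/`offset` parameters follow
    Airflow's stable REST API; this helper itself is illustrative only.
    """
    path = "/api/v1/dags/{}/dagRuns/{}/taskInstances".format(
        quote(dag_id, safe=""), quote(dag_run_id, safe="")
    )
    query = urlencode({"limit": limit, "offset": 0})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Hypothetical host, DAG id and run id, for illustration only.
url = task_instances_url("http://airflow.example", "rollout_dag", "run_1")
```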

Fix 2

The URL is always present in the rollout data structure, so stop pretending it is not, and remove the unnecessary conditional.

Fix 3

Subnet state icons gain a hyperlink to the pertinent running, failed, or last completed Airflow task instance (specifically, to the log tab). This way, one can directly access the log by clicking on the subnet state icon.

This improves the usability of the dashboard and speeds up diagnosing an issue that may be occurring during a rollout.
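
The selection of the "pertinent" task instance could be sketched roughly as follows. This is not the dashboard's actual code: the priority order (running, then failed, then most recently finished), the dictionary field names, and the log-tab URL pattern are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def pick_task_instance(task_instances):
    """Return the task instance a subnet icon should link to, or None.

    Assumed priority: a running task wins, then a failed one, then the
    successful task with the latest end date.
    """
    for state in ("running", "failed"):
        matches = [t for t in task_instances if t["state"] == state]
        if matches:
            return matches[0]
    done = [t for t in task_instances if t["state"] == "success"]
    # ISO 8601 timestamps in the same format sort correctly as strings.
    return max(done, key=lambda t: t["end_date"]) if done else None


def log_url(base_url, dag_id, dag_run_id, task_id):
    """Build a link to a task's log tab (URL pattern is an assumption)."""
    query = urlencode({"dag_run_id": dag_run_id, "task_id": task_id, "tab": "logs"})
    return f"{base_url}/dags/{dag_id}/grid?{query}"


# Hypothetical task instances for one subnet.
tasks = [
    {"task_id": "plan", "state": "success", "end_date": "2024-08-06T13:00:00"},
    {"task_id": "upgrade", "state": "running", "end_date": None},
]
chosen = pick_task_instance(tasks)
link = log_url("http://airflow.example", "rollout", "run1", chosen["task_id"])
```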

@DFINITYManu DFINITYManu requested a review from a team as a code owner August 6, 2024 13:40
@DFINITYManu
Collaborator Author

Thanks @LittleChimera -- stand by for rollout PR please.

@DFINITYManu DFINITYManu merged commit 7f00826 into main Aug 6, 2024
6 checks passed
@DFINITYManu DFINITYManu deleted the fixes branch August 6, 2024 13:49