Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Allow flyteadmin to start even if OIDC is unavailable (Improve flyteadmin startup resiliency) #5702

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ddl-rliu
Copy link
Contributor

@ddl-rliu ddl-rliu commented Aug 28, 2024

Tracking issue

#5701

Why are the changes needed?

Today, the flyteadmin pod is blocked from starting up until the OIDC provider is healthy and available (the pod gets stuck in Error state). In some Kubernetes configurations, this erroring-pod could cause deployment-wide issues. The current behavior could be made more resilient.

(Note that this applies to configurations using useAuth=true)

What changes were proposed in this pull request?

A better approach in these configurations is to allow flyte to start up, even if the OIDC provider is unavailable. Then, try to re-initialize the OIDC provider later in the deployment lifespan. This is a more resilient approach, and it can be made configurable.

Adds an onlyStartIfOIDCIsAvailable config which controls this behavior.

How was this patch tested?

A writeup is here which shows the "good" flow when onlyStartIfOIDCIsAvailable is enabled and OIDC is unhealthy for a period: https://gist.github.com/ddl-rliu/4c09862404f46a5adbc451025160e0eb

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@ddl-rliu ddl-rliu changed the title x [Core feature] Allow flyteadmin to start even if OIDC is unavailable (Improve flyteadmin startup resiliency) Aug 28, 2024
…dmin startup resiliency)

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
@ddl-rliu ddl-rliu force-pushed the rliu-insecure-oidc-issuer branch from e3a64b0 to 080a4cf Compare August 28, 2024 19:56
@@ -72,6 +72,7 @@ var (
"openid",
"profile",
},
OnlyStartIfOIDCIsAvailable: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configurable onlyStartIfOIDCIsAvailable is true by default. This matches the behavior today, where if OIDC is unavailable due to e.g. DNS issues, flyteadmin will not start.

For reference, another project which takes a similar approach to OIDC issues/service startup resiliency is https://github.com/juanfont/headscale/blob/main/hscontrol/app.go#L157-L166

}
}

return &authCtx, nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewAuthenticationContext now returns a Context pointer, this way we can add a reusable oidcProvider to the context, after transient OIDC errors are resolved.

Copy link

codecov bot commented Aug 28, 2024

Codecov Report

Attention: Patch coverage is 2.38095% with 41 lines in your changes missing coverage. Please review.

Project coverage is 36.17%. Comparing base (f075b34) to head (080a4cf).
Report is 70 commits behind head on master.

Files with missing lines Patch % Lines
flyteadmin/auth/auth_context.go 2.38% 41 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5702       +/-   ##
===========================================
- Coverage   60.92%   36.17%   -24.76%     
===========================================
  Files         796     1302      +506     
  Lines       51689   109627    +57938     
===========================================
+ Hits        31494    39660     +8166     
- Misses      17288    65822    +48534     
- Partials     2907     4145     +1238     
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (-17.95%) ⬇️
unittests-flyteadmin 55.29% <2.38%> (-3.44%) ⬇️
unittests-flytecopilot 12.17% <ø> (-5.62%) ⬇️
unittests-flytectl 62.18% <ø> (-5.24%) ⬇️
unittests-flyteidl 7.12% <ø> (-71.92%) ⬇️
unittests-flyteplugins 53.34% <ø> (-8.51%) ⬇️
unittests-flytepropeller 41.71% <ø> (-15.54%) ⬇️
unittests-flytestdlib 55.35% <ø> (-10.25%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -185,11 +198,20 @@ func NewAuthenticationContext(ctx context.Context, sm core.SecretManager, oauth2
authCtx.authServiceImpl = authMetadataService
authCtx.identityServiceIml = identityService

return authCtx, nil
err = authCtx.InitOIDC()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

authContext logic which depend on OIDC (i.e. oidcProvider, oidcProvider.Endpoint) is extracted out into an initOIDC helper.

@eapolinario
Copy link
Contributor

@Sovietaced brings up a good point regarding this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Cold PRs
Development

Successfully merging this pull request may close these issues.

2 participants