Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: rework aged authentication config interface #366

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

comtalyst
Copy link
Collaborator

@comtalyst comtalyst commented May 22, 2024

Fixes #

Description
Part of #224

How was this change tested?

  • Unit tests
  • E2E OUTDATED, TODO
  • E2E for NAP (using managed identity)

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

- Deprecate `ARM_USE_CREDENTIAL_FROM_ENVIRONMENT` and `ARM_USE_MANAGED_IDENTITY_EXTENSION` in favor of track2-based authentication: https://github.com/Azure/azure-sdk-for-go/blob/main/sdk/azidentity/README.md#defaultazurecredential
- Rename `ARM_USER_ASSIGNED_IDENTITY_ID` to `ARM_KUBELET_IDENTITY_CLIENT_ID` for accuracy to its purpose/use

@comtalyst comtalyst requested a review from tallaxes May 22, 2024 19:01
@comtalyst comtalyst force-pushed the comtalyst/auth-interface-rework branch from a56a078 to 852da05 Compare May 22, 2024 19:02
Copy link
Collaborator Author

@comtalyst comtalyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@coveralls
Copy link

coveralls commented May 22, 2024

Pull Request Test Coverage Report for Build 10869172682

Details

  • 41 of 48 (85.42%) changed or added relevant lines in 7 files are covered.
  • 6 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.2%) to 97.899%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/providers/instance/azure_client.go 0 2 0.0%
pkg/providers/instance/skuclient/skuclient.go 0 5 0.0%
Files with Coverage Reduction New Missed Lines %
pkg/auth/util.go 2 0.0%
pkg/auth/config.go 4 75.86%
Totals Coverage Status
Change from base Build 10782405541: 0.2%
Covered Lines: 36254
Relevant Lines: 37032

💛 - Coveralls

pkg/auth/config.go Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@comtalyst comtalyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@comtalyst comtalyst force-pushed the comtalyst/auth-interface-rework branch from 214bbde to c3615b8 Compare June 7, 2024 00:14
Copy link
Collaborator Author

@comtalyst comtalyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

Copy link
Collaborator

@tallaxes tallaxes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before commenting on the appropriate configuration surface, and corresponding implementation, I would like to see a clear description of which authentication methods we want to support - and if we have a good reason to constrain them. Is there anything that prevents us from just using DefaultAzureCredential, which attempts a variety of authentication mechanisms (including the ones we need to support), using its existing/documented configuration surface? Would be good to avoid custom switches (like ARM_AUTH_METHOD) if the "standard"/OOB works well.

pkg/auth/cred.go Outdated
Comment on lines 33 to 35
if cfg.ArmAuthMethod == authMethodWorkloadIdentity {
klog.V(2).Infoln("cred: using workload identity for new credential")
return azidentity.NewDefaultAzureCredential(nil)
Copy link
Collaborator

@tallaxes tallaxes Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"using workload identity" - this is not quite what NewDefaultAzureCredential does, it will attempt a variety of mechanisms. If we are Ok with this - which may well be the case - let's just say it, and we don't need to special case Managed Identity below. If we want to constrain this to Workload Identity - needs to use NewWorkloadIdentityCredential?

Copy link
Collaborator Author

@comtalyst comtalyst Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a discussion related to this with @charliedmcb in another PR. I think our understanding match yours, with the new knowledge here is that NewDefaultAzureCredential() might cover managed identity as well.

So far, I think that we should scope it down to workload identity for simplicity. This means we will support only two auth methods for now: workload identity and system assigned MSI.
Before this PR (and in the current iteration you are seeing), with this NewDefaultAzureCredential(), it looks like we are supporting a wider scope, which I don't think we need to. There are "shortcuts" where ARM_AUTH_METHOD is workload identity, but they ended up using something else that's compatible with this implementation.

So, I think I will make changes to NewWorkloadIdentityCredential(), if possible.
Maybe we could consider formally using/supporting NewDefaultAzureCredential() later on if this module grows/we don't want to contain much knowledge on authentication.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by below; could you provide more example?

Would be good to avoid custom switches (like ARM_AUTH_METHOD) if the "standard"/OOB works well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that azidentity already provides (via environment variables) an existing configuration surface, and (via DefaultAzureCredential) a reasonable sequence of auth methods to try, including the two that we need to support. We should have good reasons to a) constrain the auth methods, and b) introduce additional/custom configuration surface, such as the proposed ARM_AUTH_METHOD setting. (Which we may need to remove later if decide to use NewDefaultAzureCredential? Removal is always hard ...) There may be valid reasons for both (improved security posture, the built-in sequence not meeting our order requirements, established pattern elsewhere, etc.) - but would be good to have clarity on which ones - if any - apply here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more option to consider - if we decide there is a good reason to support only workload identity and MSI - is to just try them in order (similar to how DefaultAzureCredential does it), eliminating the need (and capability) for the explicit choice via ARM_AUTH_METHOD.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be for the simplest secure approach that supports the requirements, which seems to be DefaultAzureCredential. (Any changes there will likely require additional configuration, and are very unlikely to break existing auth scenarios. In NAP we control the configuration tightly enough for this not to be a concern, for self-hosted breaking changes, if any, would/should surface via release notes.) Second best would be to try a more controlled sequence of auth methods we need to support - unless you see a problem with this, and we have to specify the method explicitly via configuration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed DefaultAzureCredential already encapsulates what we need. I don't see them breaking it either as its wildly used in many projects for wrapping many types of auth based on some quick github code searching.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any changes there will likely require additional configuration, and are very unlikely to break existing auth scenarios

I don't see them breaking it either

I doubt it will break too, plus the tests will catch it anyway in case we neglect release notes.
But my point is it might have new auth methods we are not aware of, which means more liability as I mentioned. This will not happen in the more controlled approach (i.e., NewWorkloadIdentityCredential with ARM_AUTH_METHOD)

In NAP we control the configuration tightly enough for this not to be a concern

Yep all these concerns are for self-hosted. In NAP we have static integration interface.

Copy link
Collaborator Author

@comtalyst comtalyst Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe I am overthinking again. The extra auth method might not add that much additional cost to us compared what DefaultAzureCredential will save us.
We could try going with implicit auth method and use DefaultAzureCredential and see how it goes.

The only problem, if it is at all, is autorest_auth.go is also using it. But I just haven't take a look at that deeply yet. If we are moving in this direction then I will see the possibility.

Copy link
Collaborator Author

@comtalyst comtalyst Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circling back to this.

  • The concern of unwanted auth method is valid, from an example in Cluster Autoscaler. "Requirement" to continue the support is an engineering burden.
  • Although, this DefaultAzureCredential is something standardized. If we are going to rely on this standardization (azure-sdk-for-go) for a next while, the simplicity we get might worth it, and the deprecation (if necessary) might not be too difficult.

TL;DR let's just use DefaultAzureCredential.

@tallaxes tallaxes added area/security Issues or PRs related to security area/global-config Issues or PRs related global configuration labels Jun 12, 2024
@comtalyst comtalyst force-pushed the comtalyst/auth-interface-rework branch from c3615b8 to 584d2bf Compare September 15, 2024 07:31
@@ -169,7 +170,7 @@ func getCABundle(restConfig *rest.Config) (*string, error) {
}

func getVnetGUID(cfg *auth.Config, subnetID string) (string, error) {
creds, err := auth.NewCredential(cfg)
creds, err := azidentity.NewDefaultAzureCredential(nil)
Copy link
Collaborator Author

@comtalyst comtalyst Sep 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to keep the abstraction, but just return azidentity.NewDefaultAzureCredential(nil). But there are only 2 uses (another in azure_client.go) at the moment, so I don't think it is significant. Could refactor later.

return
}
authorizer := azidext.NewTokenCredentialAdapter(cred, []string{azidext.DefaultManagementScope})
Copy link
Collaborator Author

@comtalyst comtalyst Sep 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to keep the abstraction, that returns authorizer. But this is the only use of it at the moment, so I don't think it is significant. Could refactor later.

Copy link
Collaborator Author

@comtalyst comtalyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

Copy link
Collaborator Author

@comtalyst comtalyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

cfg.TenantID = strings.TrimSpace(os.Getenv("ARM_TENANT_ID"))
cfg.SubscriptionID = strings.TrimSpace(os.Getenv("ARM_SUBSCRIPTION_ID"))
cfg.VMType = strings.ToLower(os.Getenv("ARM_VM_TYPE"))
cfg.ClusterName = strings.TrimSpace(os.Getenv("AZURE_CLUSTER_NAME"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bootstrap variables, mostly

authorizer, err := auth.NewAuthorizer(sc.cfg, sc.env)
// Create a new authorizer for the sku client
// TODO (charliedmcb): need to get track 2 support for the skewer API
cred, err := azidentity.NewDefaultAzureCredential(nil)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you test this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the code of auth.NewAuthorizer(sc.cfg, sc.env), with changes to use azidentity.NewDefaultAzureCredential(nil).
E2E for each auth method should be sufficient.

Copy link
Collaborator

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run e2e tests please before merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/global-config Issues or PRs related global configuration area/security Issues or PRs related to security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants