AWS sts:AssumeRole stopped working with role/OrganizationAccountAccessRole in 1.30.x #16849

Open
vitaliyf opened this issue Sep 20, 2024 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@vitaliyf
Contributor

vitaliyf commented Sep 20, 2024

/kind bug

1. What kops version are you running? The command kops version, will display
this information.

Testing upgrade from Client version: 1.29.2 (git-v1.29.2) to Client version: 1.30.1 (git-v1.30.1)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

v1.29.9

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops_v1.30.1 update cluster - no other changes to manifest or environment, only executing newer kops binary.

5. What happened after the commands executed?

$ export AWS_PROFILE=company-name-dev3
$ kops_v1.30.1 update cluster

SDK 2024/09/20 14:31:06 DEBUG request failed with unretryable error https response error StatusCode: 403, RequestID: 623bd87e-11e1-4b06-9f16-10f60ba2f030, api error AccessDenied: User: arn:aws:sts::[redacted]006:assumed-role/OrganizationAccountAccessRole/aws-go-sdk-1726842666098977639 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::[redacted]006:role/OrganizationAccountAccessRole
Error: error determining default DNS zone: error querying zones: error listing hosted zones: operation error Route 53: ListHostedZones, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 623bd87e-11e1-4b06-9f16-10f60ba2f030, api error AccessDenied: User: arn:aws:sts::[redacted]006:assumed-role/OrganizationAccountAccessRole/aws-go-sdk-1726842666098977639 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::[redacted]006:role/OrganizationAccountAccessRole
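
One detail in the error above may help triage: the caller is already an assumed-role session for OrganizationAccountAccessRole, and the denied call is sts:AssumeRole on that same role in the same account. A minimal sketch for pulling the two ARNs out of such a message and comparing the role names follows. The helper names (caller_arn, target_arn, role_name) are hypothetical, and the parsing assumes the message format shown in this log and role ARNs without a path.

```shell
# Hypothetical helpers for reading an STS AccessDenied message.
# Assumes the "User: <arn> is not authorized ... on resource: <arn>" wording
# seen in the log above.

# Print the ARN of the principal that made the denied call.
caller_arn() {
  printf '%s\n' "$1" | sed -n 's/.*User: \([^ ]*\) is not authorized.*/\1/p'
}

# Print the ARN of the role the caller tried to assume.
target_arn() {
  printf '%s\n' "$1" | sed -n 's/.*on resource: \([^ ]*\).*/\1/p'
}

# Print the role name from an assumed-role session ARN or a role ARN
# (assumes the role has no path component).
role_name() {
  case "$1" in
    *:assumed-role/*) r="${1#*:assumed-role/}"; printf '%s\n' "${r%%/*}" ;;
    *:role/*) printf '%s\n' "${1#*:role/}" ;;
  esac
}
```

If role_name prints the same value for both ARNs, the session obtained from the profile is trying to assume its own role a second time, which the role's trust policy would not normally allow.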

6. What did you expect to happen?

With kops-1.29.2 the output shows proposed changes that need to be applied with --yes

AWS CLI is able to successfully get Route53 zones from the same shell:

$ aws route53 list-hosted-zones
{
    "HostedZones": [
        {
            "Id": "/hostedzone/Z0[redacted]",
            "Name": "k8s.dev3.us-west-2.example.com.",
            "CallerReference": "8e483d8f-0d3c-4bcc-9c68-ecb4dea807ae",
            "Config": {
                "PrivateZone": false
            },
            "ResourceRecordSetCount": 8
        }
    ]
}

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

https://gist.github.com/vitaliyf/cfddd9ad771ee613ee850bb9e2d3fe14

9. Anything else we need to know?

$ cat ~/.aws/config
[default]
region = us-west-2

[profile company-name]
aws_account_id = company-name
region = us-west-2
output = json
color = ff0000

[profile company-name-dev1]
role_arn = arn:aws:iam::[redacted]385:role/OrganizationAccountAccessRole
source_profile = company-name

[profile company-name-dev2]
role_arn = arn:aws:iam::[redacted]813:role/OrganizationAccountAccessRole
source_profile = company-name
color = 00ff00

[profile company-name-dev3]
role_arn = arn:aws:iam::[redacted]006:role/OrganizationAccountAccessRole
source_profile = company-name
color = 0000ff

This cluster has been continuously upgraded one kops/Kubernetes version at a time for at least a couple of years, so testing and executing such upgrades in place is routine for us.

I looked around and suspect this is related to the aws-sdk-go-v2 upgrade.

For example, that project has this issue: aws/aws-sdk-go-v2#2686. Coincidentally or not, that ticket is referenced by cert-manager/cert-manager#7236, where they are also dealing with a "Missing Region" error, just like #16645 from kops-1.30.0.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 20, 2024
@vitaliyf
Contributor Author

vitaliyf commented Sep 20, 2024

Workaround: use awsudo or one of the other workarounds from https://kops.sigs.k8s.io/mfa/#the-workaround-2

$ awsudo company-name-dev3 kops_v1.30.1 update cluster

...
  	                    	+ NODEUP_URL_AMD64=https://artifacts.k8s.io/binaries/kops/1.30.1/linux/amd64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.30.1/nodeup-linux-amd64
  	                    	- NODEUP_URL_AMD64=https://artifacts.k8s.io/binaries/kops/1.29.2/linux/amd64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.29.2/nodeup-linux-amd64
...more as-expected output..

Must specify --yes to apply changes
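
For anyone without awsudo, the same idea can be sketched with the AWS CLI alone: let the CLI resolve the profile (including role_arn and source_profile), export the resulting temporary credentials, and run kops with those keys so kops never assumes the role itself. This is an untested sketch, not a confirmed fix for this issue. It assumes AWS CLI v2 with the `aws configure export-credentials` subcommand available, and the helper name with_profile_creds is hypothetical.

```shell
# Hypothetical helper: run a command with temporary credentials exported
# from an AWS CLI profile. The body runs in a subshell, so the exported
# keys and the unset AWS_PROFILE do not leak into the calling shell.
with_profile_creds() (
  profile="$1"
  shift
  # Ask the AWS CLI (assumed: v2 with export-credentials support) to resolve
  # the profile and print "export AWS_...=..." lines.
  creds="$(aws configure export-credentials --profile "$profile" --format env)" || exit 1
  eval "$creds"
  # Drop the profile selection so the SDK uses the environment keys directly
  # instead of resolving role_arn/source_profile again.
  unset AWS_PROFILE AWS_DEFAULT_PROFILE
  "$@"
)
```

Usage would mirror the awsudo call above: `with_profile_creds company-name-dev3 kops_v1.30.1 update cluster`.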

@hakman
Member

hakman commented Sep 22, 2024

FYI @rifelpet
