
Setting Client Options RetryMaxAttempts doesn't change number of retries #2251

Closed
kgeckhart opened this issue Aug 21, 2023 · 3 comments · Fixed by #2390
Labels
bug (This issue is a bug.) · p3 (This is a minor priority issue) · workaround-available

Comments

@kgeckhart

Describe the bug

Creating a new client with the RetryMaxAttempts option set, like

cloudwatch.NewFromConfig(*regionalConfig, func(options *cloudwatch.Options) {
    options.RetryMaxAttempts = 5
})

uses the default of 3 attempts instead of the configured value of 5.

Expected Behavior

A retryable failed request should be retried up to the configured maximum of 5 attempts.

Current Behavior

We see logs indicating the request was only attempted the default 3 times instead of the expected 5:

operation error CloudWatch: ListMetrics, exceeded maximum number of attempts, 3, https response error StatusCode: 400

Reproduction Steps

I don't know of a great way to force a retryable error for an easy reproduction snippet, but creating a client for just about any API with RetryMaxAttempts set is likely to hit this issue.

cloudwatch.NewFromConfig(*regionalConfig, func(options *cloudwatch.Options) {
    options.RetryMaxAttempts = 5
})

Possible Solution

Part of the problem appears to be this function definition in the codegen template:

func $finalizeResolveName:L(o *Options, client Client) {
	if v := o.RetryMaxAttempts; v == 0 || v == client.options.RetryMaxAttempts {
		return
	}
	o.Retryer = $withMaxAttempts:T(o.Retryer, o.RetryMaxAttempts)
}

specifically the or condition on line 380:
if v := o.RetryMaxAttempts; v == 0 || v == client.options.RetryMaxAttempts {

It looks like the or condition is always true because the client is created using the retry settings from the original options. As far as I can tell, when the client is created from the options, nothing explicitly configures the retryer to use RetryMaxAttempts, and because this condition triggers the early return, the setting is ignored altogether.
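
For reference, this is roughly how that template renders in a generated service client. The function and helper names here are assumed from the placeholders above (finalizeOperationRetryMaxAttempts and retry.AddWithMaxAttempts) and may differ between SDK versions:

// Sketch of the rendered per-operation resolver; names are assumed from the
// codegen placeholders above and may vary by SDK version.
func finalizeOperationRetryMaxAttempts(o *Options, client Client) {
	// Early return when no override is set, or when the per-operation value
	// matches what the client was constructed with (the or condition above).
	if v := o.RetryMaxAttempts; v == 0 || v == client.options.RetryMaxAttempts {
		return
	}
	o.Retryer = retry.AddWithMaxAttempts(o.Retryer, o.RetryMaxAttempts)
}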

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

	github.com/aws/aws-sdk-go v1.44.313
	github.com/aws/aws-sdk-go-v2 v1.20.0
	github.com/aws/aws-sdk-go-v2/config v1.18.31
	github.com/aws/aws-sdk-go-v2/credentials v1.13.30
	github.com/aws/aws-sdk-go-v2/service/amp v1.17.0
	github.com/aws/aws-sdk-go-v2/service/apigateway v1.17.0
	github.com/aws/aws-sdk-go-v2/service/apigatewayv2 v1.14.0
	github.com/aws/aws-sdk-go-v2/service/autoscaling v1.30.0
	github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.27.0
	github.com/aws/aws-sdk-go-v2/service/databasemigrationservice v1.28.0
	github.com/aws/aws-sdk-go-v2/service/ec2 v1.109.0
	github.com/aws/aws-sdk-go-v2/service/resourcegroupstaggingapi v1.15.0
	github.com/aws/aws-sdk-go-v2/service/shield v1.19.0
	github.com/aws/aws-sdk-go-v2/service/storagegateway v1.19.0
	github.com/aws/aws-sdk-go-v2/service/sts v1.21.0

Compiler and Version used

1.20.4

Operating System and version

macOS 13.5

@RanVaknin (Contributor) commented Aug 22, 2023

Hi @kgeckhart

Thanks for the feedback and the in-depth analysis.

I have confirmed this is an issue with the following driver code:

package main

import (
	"bytes"
	"context"
	"fmt"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/cloudwatch"
	"io"
	"log"
	"net/http"
)

// MockTransport always returns HTTP 500 so every attempt fails with a
// retryable error and the client exhausts its retries.
type MockTransport struct{}

func (t *MockTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	response := &http.Response{
		StatusCode: 500,
		Body:       io.NopCloser(bytes.NewBufferString("Internal Server Error")),
		Header:     make(http.Header),
	}
	return response, nil
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion("us-east-1"))
	if err != nil {
		log.Fatalf("unable to load SDK config, %v", err)
	}

	cfg.HTTPClient = &http.Client{
		Transport: &MockTransport{},
	}

	client := cloudwatch.NewFromConfig(cfg, func(options *cloudwatch.Options) {
		options.RetryMaxAttempts = 5
	})

	out, err := client.ListMetrics(context.TODO(), &cloudwatch.ListMetricsInput{})
	if err != nil {
		panic(err)
	}

	fmt.Println(len(out.Metrics))
}

Running this results in RetryMaxAttempts not being set correctly:

$ go run main.go
panic: operation error CloudWatch: ListMetrics, exceeded maximum number of attempts, 3, https response error StatusCode: 500, RequestID: , api error UnknownError: UnknownError

Setting the RetryMaxAttempts via the functional options on the API operation itself does work:

	out, err := client.ListMetrics(context.TODO(), &cloudwatch.ListMetricsInput{}, func(options *cloudwatch.Options) {
		options.RetryMaxAttempts = 5
	})

Results in the correct behavior:

$ go run main.go
panic: operation error CloudWatch: ListMetrics, exceeded maximum number of attempts, 5, https response error StatusCode: 500, RequestID: , api error UnknownError: UnknownError

Proposed solution:

Perhaps we need to introduce a new parameter in Options, something along the lines of APIRetryMaxAttempts, with its own logic. If we go this route, it probably needs to take precedence over the client-level RetryMaxAttempts config.
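
A rough sketch of that precedence, where APIRetryMaxAttempts is a hypothetical field name and not part of the SDK today:

// Hypothetical precedence resolution for the proposal above;
// APIRetryMaxAttempts is an assumed field, not an existing SDK option.
func resolveMaxAttempts(o Options) int {
	if o.APIRetryMaxAttempts != 0 {
		// A per-operation override wins over the client-level setting.
		return o.APIRetryMaxAttempts
	}
	return o.RetryMaxAttempts
}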

@lucix-aws (Contributor)

This is happening because Options is bootstrapped like so:

resolveRetryer(&options)

// ... resolve other things       

for _, fn := range optFns { // optFns = ...func(*Options) param in New() and NewFromConfig()
    fn(&options)
}   
    
client := &Client{
    options: options,
}

It's within resolveRetryer that we apply RetryMaxAttempts from the passed-in options:

if v := o.RetryMaxAttempts; v != 0 {
    standardOptions = append(standardOptions, func(so *retry.StandardOptions) {
        so.MaxAttempts = v
    })
}

i.e. we do so BEFORE we apply caller mutations to Options. Any mutations to RetryMaxAttempts applied at client init are therefore dropped on the floor.

It works via aws.Config because that value makes it into Options before resolveRetryer, and it works via per-op Options mutation because there's a per-op resolver that correctly applies it today if changed (that's the code referenced in your description, and the same code that the now-closed #2253 attempted to change).
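
So as a workaround until the fix lands, setting the value on aws.Config at load time should be honored, for example (assuming config.WithRetryMaxAttempts is available in your config module version):

// Workaround sketch: configure max attempts on aws.Config so the value
// reaches Options before resolveRetryer runs. Assumes config.WithRetryMaxAttempts
// exists in the config module version in use.
cfg, err := config.LoadDefaultConfig(context.TODO(),
	config.WithRegion("us-east-1"),
	config.WithRetryMaxAttempts(5),
)
if err != nil {
	log.Fatalf("unable to load SDK config, %v", err)
}
client := cloudwatch.NewFromConfig(cfg)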

The solution here will be to defer the applying of options on the retryer to after we run caller Options mutation. We have codegen mechanisms for this in place today.
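
In other words, the fix direction is roughly the following (a hand sketch of the client constructor, not the actual generated code):

// Sketch: run the caller's Options mutations first, then resolve the
// retryer's max attempts from whatever value ended up in Options.
for _, fn := range optFns {
	fn(&options)
}
if v := options.RetryMaxAttempts; v != 0 {
	options.Retryer = retry.AddWithMaxAttempts(options.Retryer, v)
}
client := &Client{
	options: options,
}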

As an aside/clarification, the behavior of the generated code on line 380 highlighted above is intentional and correct. That check is correctly preventing the retryer from being re-created if the operation-specific Options mutations have not modified the value of RetryMaxAttempts. v == client.options.RetryMaxAttempts WILL be false if a per-op Options mutation has actually changed that value. Removing it would coincidentally solve the reported issue - but it wouldn't be fixing the underlying one.

