Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 lifecycle rules are deserialized incorrectly when scoped to the entire bucket #2874

Open
object-Object opened this issue Aug 2, 2023 · 7 comments
Assignees
Labels
bug This issue is a bug. service-api This issue is due to a problem in a service API, not the SDK implementation.

Comments

@object-Object
Copy link

object-Object commented Aug 2, 2023

Describe the bug

GetBucketLifecycleConfiguration incorrectly deserializes lifecycle rules which are scoped to the entire bucket. This means that calling PutBucketLifecycleConfiguration with the rules from Get fails (this should be a no-op).

A global rule is represented with an empty Filter tag (eg. <Filter/>, {} in JSON). This has a different semantic meaning than a missing Filter tag (eg. null in JSON), which means the rule is instead using the legacy Prefix tag. Go deserializes both of these to nil.

Expected Behavior

Passing the output of GetBucketLifecycleConfiguration to the input of PutBucketLifecycleConfiguration should work.

Current Behavior

Received the following error:

MalformedXML: The XML you provided was not well-formed or did not validate against our published schema

GetBucketLifecycleConfiguration response XML:

<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Rule><ID>rule</ID><Filter/><Status>Enabled</Status><NoncurrentVersionExpiration><NoncurrentDays>30</NoncurrentDays></NoncurrentVersionExpiration></Rule></LifecycleConfiguration>

Deserialized Go object:

[{Status:Enabled AbortIncompleteMultipartUpload:<nil> Expiration:<nil> Filter:<nil> ID:0xc000056290 NoncurrentVersionExpiration:0xc000026190 NoncurrentVersionTransitions:[] Prefix:<nil> Transitions:[] noSmithyDocumentSerde:{}}]

PutBucketInventoryConfiguration request XML (note the missing <Filter/> tag):

<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Rule><ID>rule</ID><NoncurrentVersionExpiration><NoncurrentDays>30</NoncurrentDays></NoncurrentVersionExpiration><Status>Enabled</Status></Rule></LifecycleConfiguration>

Reproduction Steps

First, manually add a lifecycle rule to an S3 bucket. Select Choose a rule scope > Apply to all objects in the bucket, and any action type.

Then, run this code, replacing BUCKET-NAME with the appropriate value.

package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func main() {
	ctx := context.Background()
	cfg, _ := config.LoadDefaultConfig(ctx, config.WithClientLogMode(aws.LogRequestWithBody|aws.LogResponseWithBody))
	client := s3.NewFromConfig(cfg)

	// get the lifecycle rules from the bucket
	bucketName := "BUCKET-NAME"
	response, _ := client.GetBucketLifecycleConfiguration(ctx, &s3.GetBucketLifecycleConfigurationInput{
		Bucket: aws.String(bucketName),
	})

	// put the rules to the bucket without modification
	_, err := client.PutBucketLifecycleConfiguration(ctx, &s3.PutBucketLifecycleConfigurationInput{
		Bucket: aws.String(bucketName),
		LifecycleConfiguration: &types.BucketLifecycleConfiguration{
			Rules: response.Rules,
		},
	})
	fmt.Println(err)
}

Possible Solution

Ideally there should be another LifecycleRuleFilterMember type for an empty filter. As a workaround, an empty LifecycleRuleFilterMemberPrefix instance seems to be accepted by the API:

response.Rules[0].Filter = &types.LifecycleRuleFilterMemberPrefix{}

This serializes to the following XML:

<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Rule><Filter><Prefix></Prefix></Filter><ID>rule</ID><NoncurrentVersionExpiration><NoncurrentDays>30</NoncurrentDays></NoncurrentVersionExpiration><Status>Enabled</Status></Rule></LifecycleConfiguration>

Additional Information/Context

Here's the equivalent Python code with Boto3. This works as expected.

import boto3

client = boto3.client("s3")

bucket_name = "BUCKET-NAME"
response = client.get_bucket_lifecycle_configuration(Bucket=bucket_name)

client.put_bucket_lifecycle_configuration(
    Bucket=bucket_name,
    LifecycleConfiguration={"Rules": response["Rules"]},
)

Possibly related: #2162, #1941, #1722

AWS Go SDK V2 Module Versions Used

github.com/aws/aws-sdk-go-v2 v1.20.0
github.com/aws/aws-sdk-go-v2/config v1.18.32
github.com/aws/aws-sdk-go-v2/service/s3 v1.38.1
github.com/aws/aws-sdk-go-v2/service/sfn v1.19.1
github.com/aws/aws-sdk-go-v2/service/sqs v1.24.1
github.com/aws/smithy-go v1.14.0

Compiler and Version used

go version go1.20.5 windows/amd64

Operating System and version

Windows Server 2016 Datacenter 10.0.14393 Build 14393 (Amazon Workspaces)

@object-Object object-Object added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 2, 2023
@RanVaknin RanVaknin self-assigned this Aug 2, 2023
@RanVaknin
Copy link
Contributor

Hi @object-Object ,

Thanks for opening the issue and for your in-depth reproduction steps.
I remember answering questions about this in the past, and even now this is so confusing to me.

I believe this might be an issue with the S3 service itself sending back XML response that do not adhere to their own schema.
According to their own documentation:

Filter

The Filter is used to identify objects that a Lifecycle Rule applies to. A Filter must have exactly one of Prefix, Tag, or And specified. Filter is required if the LifecycleRule does not contain a Prefix element.

Type: LifecycleRuleFilter data type

Required: No

So assuming your repro instructions entail putting a lifecycle rule through the console, making a get call to that rule will result in an empty <Filter/> tag being returned with the rule, which is contradictory to the docs:

<LifecycleConfiguration
	xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Rule>
		<ID>rule</ID>
		<Filter/> <!--Problem is here-->
		<Status>Enabled</Status>
		<NoncurrentVersionExpiration>
			<NoncurrentDays>30</NoncurrentDays>
		</NoncurrentVersionExpiration>
	</Rule>
</LifecycleConfiguration>

This seems to me like an XML that doesn't conform to the service's schema. Even if I wanted to put in that same lifecycle rule through the SDK, its not possible because when sending the request without the Filter param the request gets rejected: "api error MalformedXML: The XML you provided was not well-formed or did not validate against our published schema"

and there are no corresponding types that fit an empty filter tag:
image

Based on the docs, the correct way to put a lifecycle rule is to specify a Filter argument with one of those filters nested: AND or TAG or PREFIX :

	_, err = client.PutBucketLifecycleConfiguration(context.TODO(), &s3.PutBucketLifecycleConfigurationInput{
		Bucket: &BUCKET_NAME,
		LifecycleConfiguration: &types.BucketLifecycleConfiguration{
			Rules: []types.LifecycleRule{
				{
					ID:     aws.String("foo"),
					Status: types.ExpirationStatusEnabled,
					Filter: &types.LifecycleRuleFilterMemberPrefix{},
					NoncurrentVersionExpiration: &types.NoncurrentVersionExpiration{
						NoncurrentDays: 30,
					},
				},
			},
		},
	})

This will result in the correct structure being sent (which later can be PUT back as a no-op like you mentioned:

<LifecycleConfiguration
	xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Rule>
		<ID>foo</ID>
		<Filter>
			<Prefix></Prefix>
		</Filter>
		<Status>Enabled</Status>
		<NoncurrentVersionExpiration>
			<NoncurrentDays>30</NoncurrentDays>
		</NoncurrentVersionExpiration>
	</Rule>
</LifecycleConfiguration>

This is just my initial observation, while I work with the team to verify this speculation, my workaround to you is to create lifecycle rules through the SDK and not through the console as it appears the service correctly validates requests sent by the SDK.

Happy to hear your thoughts on the matter.

Thanks again,
Ran~

@RanVaknin RanVaknin added needs-review This issue or pull request needs review from a core team member. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 3, 2023
@object-Object
Copy link
Author

object-Object commented Aug 3, 2023

Hi @RanVaknin, thanks for the quick response.

Even if I wanted to put in that same lifecycle rule through the SDK, its not possible because when sending the request without the Filter param the request gets rejected

I think this is the core of the issue. The Go SDK seems to be missing a type for an empty Filter. (These examples will use Python, as I think it's not possible to send the valid request from Go.)

This request is equivalent to sending the request without the Filter param in Go. This returns the same MalformedXML error as above:

client.put_bucket_lifecycle_configuration(
    Bucket=bucket_name,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "fail",
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)

This sends the following XML to S3 (via boto3.set_stream_logger("botocore.endpoint")):

<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Rule>
        <ID>fail</ID>
        <Status>Enabled</Status>
        <NoncurrentVersionExpiration>
            <NoncurrentDays>30</NoncurrentDays>
        </NoncurrentVersionExpiration>
    </Rule>
</LifecycleConfiguration>

On the other hand, this request successfully sets the bucket's lifecycle configuration:

client.put_bucket_lifecycle_configuration(
    Bucket=bucket_name,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "pass",
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                "Filter": {}, # this line is new
            }
        ]
    },
)

This sends the following XML to S3:

<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Rule>
        <ID>pass</ID>
        <Status>Enabled</Status>
        <NoncurrentVersionExpiration>
            <NoncurrentDays>30</NoncurrentDays>
        </NoncurrentVersionExpiration>
        <Filter /> <!-- this line is new -->
    </Rule>
</LifecycleConfiguration>

This doesn't follow the schema as documented, but it's both accepted here and returned from GetBucketLifecycleConfiguration, so I do think this is a bug with the Go SDK.

Hopefully this helps clarify the issue. Thanks again.


I believe this might be an issue with the S3 service itself sending back XML response that do not adhere to their own schema.

As a side note: it seems that schema is out of date anyway, since it doesn't include the ObjectSizeGreaterThan or ObjectSizeLessThan filters.

@RanVaknin
Copy link
Contributor

Hi @object-Object ,

Thanks for the follow up.

I tested this with the JS SDK v3, and ran into a similar issue:

async function run(){
    try {
        const res = await client.send(new PutBucketLifecycleConfigurationCommand({
            Bucket: bucket_name,
            LifecycleConfiguration: {
                Rules: [
                    {
                        ID: "pass",
                        Status: "Enabled",
                        NoncurrentVersionExpiration: { NoncurrentDays: 30 },
                        Filter: {},
                    }
                ]
            }
        }))
        console.log(res)
    } catch (error) {
        console.log(error)
    }
    
}
// TypeError: Cannot read properties of undefined (reading '0')

After looking at the embedded Smithy API model that the newer SDKs use to generate the clients I can see that LifecycleFilterRule is defined as a union type.
In Smithy, a union type is a data structure that can hold exactly one of its member values at a time. This is in line with the S3 documentation that states that the Filter must have exactly one of Prefix, Tag, And (and recently added ObjectSizeGreaterThan, or ObjectSizeLessThan) specfieid.

"com.amazonaws.s3#LifecycleRuleFilter": {
            "type": "union",
            "members": {
                "Prefix": {
                    "target": "com.amazonaws.s3#Prefix",
                    "traits": {
                        "smithy.api#documentation": "<p>Prefix identifying one or more objects to which the rule applies.</p>\n         <important>\n            <p>Replacement must be made for object keys containing special characters (such as carriage returns) when using \n         XML requests. For more information, see <a href=\"https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-xml-related-constraints\">\n            XML related object key constraints</a>.</p>\n         </important>"
                    }
                },
                "Tag": {
                    "target": "com.amazonaws.s3#Tag",
                    "traits": {
                        "smithy.api#documentation": "<p>This tag must exist in the object's tag set in order for the rule to apply.</p>"
                    }
                },
                "ObjectSizeGreaterThan": {
                    "target": "com.amazonaws.s3#ObjectSizeGreaterThanBytes",
                    "traits": {
                        "smithy.api#documentation": "<p>Minimum object size to which the rule applies.</p>"
                    }
                },
                "ObjectSizeLessThan": {
                    "target": "com.amazonaws.s3#ObjectSizeLessThanBytes",
                    "traits": {
                        "smithy.api#documentation": "<p>Maximum object size to which the rule applies.</p>"
                    }
                },
                "And": {
                    "target": "com.amazonaws.s3#LifecycleRuleAndOperator"
                }
            },
            "traits": {
                "smithy.api#documentation": "<p>The <code>Filter</code> is used to identify objects that a Lifecycle Rule applies to. A\n            <code>Filter</code> must have exactly one of <code>Prefix</code>, <code>Tag</code>, or\n            <code>And</code> specified.</p>"
            }
        },

If you didn't know, all the SDKs are auto-generated from the API models of every service. This means that the service team is the one defining their API model, and the our SDK is getting generated based on those requirements from the model.
Older SDKs like boto, Java, Ruby etc. are not Smithy based. Newer SDKs like Go v2 and JS v3 (and unreleased ones like Kotlin, Swift) are Smithy based which means that they hold information that is more closely tied to the internal model defined by the service team.

I'm not versed enough in the Python SDK to give you a conclusive answer, but to me, again this seems like either an oversight with how the service defined their model, or with how they enforce that model service side.

I'll discuss this with our wider team and see what we can come up with. This will likely get resolved by the service team itself.

Thanks for your patience.
Ran~

@object-Object
Copy link
Author

Thanks for the update. That all makes sense to me and I agree that it seems like a service-side issue now.

For some extra context if it helps, this is where boto's equivalent of that Smithy model is located. It's defined as a structure instead of a union, which I suppose is also technically inaccurate: https://github.com/boto/botocore/blob/680ba3002045d45452d58884767b1da730711707/botocore/data/s3/2006-03-01/service-2.json#L5940-L5962

Also, I found an old boto3 issue where someone reported this inaccuracy in the S3 docs, though it was never resolved: boto/boto3#2389

@object-Object
Copy link
Author

One more bit of extra context - it seems this behaviour is actually documented, in the S3 user guide on the page about lifecycle configuration elements:

You can specify an empty filter, in which case the rule applies to all objects in the bucket.

<LifecycleConfiguration>
    <Rule>
        <Filter>
        </Filter>
        <Status>Enabled</Status>
        transition/expiration actions.
    </Rule>
</LifecycleConfiguration>

@RanVaknin RanVaknin added service-api This issue is due to a problem in a service API, not the SDK implementation. and removed needs-review This issue or pull request needs review from a core team member. labels Aug 14, 2023
@RanVaknin RanVaknin transferred this issue from aws/aws-sdk-go-v2 Aug 14, 2023
@dsine-de
Copy link

dsine-de commented Feb 6, 2024

With the JavaScript SDK it won't work with Filter: {} or leaving the property out completely (malformed XML error), but Filter: {Prefix: ''} seems to work. This way it shows as "Unfiltered" in the AWS Console UI.

@kellertk
Copy link
Contributor

P126360061

@zshzbh zshzbh transferred this issue from aws/aws-sdk Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. service-api This issue is due to a problem in a service API, not the SDK implementation.
Projects
None yet
Development

No branches or pull requests

4 participants