Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

AWS S3 copy and move tasks and S3Bucket methods #316

Merged
merged 10 commits into from
Oct 10, 2023

Conversation

dominictarro
Copy link
Contributor

@dominictarro dominictarro commented Sep 27, 2023

Added

  • prefect_aws.s3.s3_copy
  • prefect_aws.s3.s3_move
  • prefect_aws.s3.S3Bucket.copy_object
  • prefect_aws.s3.S3Bucket.move_object

Closes #315

Example

from prefect_aws.s3 import s3_copy, s3_move

@flow
async def test_flow():
    await s3_copy(
        source_path="object",
        source_bucket_name="my-bucket",
        target_path="my-bucket/object",
        target_bucket_name="my-backup-bucket",
        aws_credentials=aws_credentials,
        aws_client_parameters=client_parameters,
    )
    await s3_move(
        source_path="object",
        source_bucket_name="my-bucket",
        target_path="archive/object",
        aws_credentials=aws_credentials,
        aws_client_parameters=client_parameters,
    )
from prefect_aws import S3Bucket

my_bucket = S3Bucket("my-bucket")
my_bucket.copy_object("critical/object", "my-bucket/object", target_bucket_name="my-backup-bucket")

my_bucket.move_object("object", "archive/object")

Screenshots

Checklist

  • References any related issue by including "Closes #" or "Closes ".
    • If no issue exists and your change is not a small fix, please create an issue first.
  • Includes tests or only affects documentation.
  • Passes pre-commit checks.
    • Run pre-commit install && pre-commit run --all locally for formatting and linting.
  • Includes screenshots of documentation updates.
    • Run mkdocs serve view documentation locally.
  • Summarizes PR's changes in CHANGELOG.md

@dominictarro dominictarro requested a review from a team as a code owner September 27, 2023 19:06
Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Let me know if you have any questions about any of my comments!

Comment on lines 1192 to 1194
"""Uses S3's internal
[CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)
to copy objects within or between buckets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have a stream_from method on this class, but this method looks like it is different in subtle ways. Could you update the docstring to explicitly callout when you would use the function vs. stream_from?

Copy link
Contributor Author

@dominictarro dominictarro Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so the stream_from is good when there isn't a single set of credentials that can read from the source and write to the destination. The example I gave in #276 was with a public bucket like NOAA's GHCN-D bucket, which must be read via an unsigned request. Its shortcoming is that the object has to be downloaded to the Python process, then uploaded again. Sometimes that is necessary, but sometimes we can just use S3's internal copy operation, which completes large copies in seconds.

Will add context to the docstrings of both methods.

source_bucket_name: str,
aws_credentials: AwsCredentials,
target_bucket_name: Optional[str] = None,
aws_client_parameters: AwsClientParameters = AwsClientParameters(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set client parameters on a AwsCredntials object, so I think we can remove this parameter from the task.

source_bucket_name: str,
aws_credentials: AwsCredentials,
target_bucket_name: Optional[str] = None,
aws_client_parameters: AwsClientParameters = AwsClientParameters(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set client parameters on a AwsCredntials object, so I think we can remove this parameter from the task.

Comment on lines 229 to 231
s3_client = aws_credentials.get_boto3_session().client(
"s3", **aws_client_parameters.get_params_override()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend using get_client to ensure that the client parameters on aws_credentials are respected.

Suggested change
s3_client = aws_credentials.get_boto3_session().client(
"s3", **aws_client_parameters.get_params_override()
)
s3_client = aws_credentials.get_client("s3")

Comment on lines 279 to 281
s3_client = aws_credentials.get_boto3_session().client(
"s3", **aws_client_parameters.get_params_override()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend using get_client to ensure that the client parameters on aws_credentials are respected.

Suggested change
s3_client = aws_credentials.get_boto3_session().client(
"s3", **aws_client_parameters.get_params_override()
)
s3_client = aws_credentials.get_client("s3")

tests/test_s3.py Outdated
self,
s3_bucket_with_object: S3Bucket,
s3_bucket_2_empty: S3Bucket,
client_parameters,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the client_parameter fixture isn't used in this test.

tests/test_s3.py Outdated
def test_move_object_within_bucket(
self,
s3_bucket_with_object: S3Bucket,
client_parameters,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the client_parameter fixture isn't used in this test. Can it be removed?

tests/test_s3.py Outdated
def test_move_object_to_nonexistent_bucket_fails(
self,
s3_bucket_with_object: S3Bucket,
client_parameters,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the client_parameter fixture isn't used in this test. Can it be removed?

tests/test_s3.py Outdated
def test_move_object_onto_itself_fails(
self,
s3_bucket_with_object: S3Bucket,
client_parameters,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the client_parameter fixture isn't used in this test. Can it be removed?

tests/test_s3.py Outdated
self,
s3_bucket_with_object: S3Bucket,
s3_bucket_2_empty: S3Bucket,
client_parameters,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the client_parameter fixture isn't used in this test. Can it be removed?

@dominictarro
Copy link
Contributor Author

@desertaxle

  • Added documentation to suggest when the native S3 operations should be used versus S3Bucket.stream_from.
  • client credentials are no longer passed directly to s3_copy nor s3_move
  • s3_copy and s3_move use AwsCredentials.get_s3_client
  • client_parameters direct fixture removed from test functions

Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the requested update @dominictarro! I left some comments on some fixtures that aren't being used, but once those are removed this PR is good to go!

s3_bucket_with_object.move_object("object", "object")
assert s3_bucket_with_object.read_path("object") == b"TEST"

@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)

with pytest.raises(ClientError):
assert s3_bucket_with_object.read_path("object") == b"TEST"

@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)

s3_bucket_with_object.copy_object("object", "object_copy_4", s3_bucket_2_empty)
assert s3_bucket_2_empty.read_path("object_copy_4") == b"TEST"

@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)

@@ -811,3 +975,71 @@ def test_upload_from_folder(
break
else:
raise AssertionError("Files did upload")

@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)

)
assert s3_bucket_with_object.read_path("object") == b"TEST"

@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients[-1:], indirect=True)

assert read(bucket, "object") == b"TEST"


@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)

read(bucket, "subfolder/object_copy")


@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)

assert read(bucket, "subfolder/new_object") == b"TEST"


@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)

@@ -205,6 +224,151 @@ async def test_flow():
assert output == b"NEW OBJECT"


@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture is unused and can be removed.

Suggested change
@pytest.mark.parametrize("client_parameters", aws_clients, indirect=True)

@dominictarro
Copy link
Contributor Author

dominictarro commented Oct 9, 2023

The fixture is required by the s3_mock fixture. Removing it throws an error like so

/Users/dominictarro/Documents/GitHub/prefect-aws/tests/test_s3.py::test_s3_copy failed with error: Test failed with exception
request = <SubRequest 'client_parameters' for <Function test_s3_copy>>
    @pytest.fixture
    def client_parameters(request):
>       client_parameters = request.param
E       AttributeError: 'SubRequest' object has no attribute 'param'
tests/test_s3.py:41: AttributeError

@desertaxle
Copy link
Member

Ah, that's right. I forgot about the logic that populates client_parameters. This looks good to me then!

@desertaxle desertaxle merged commit 9ba5424 into PrefectHQ:main Oct 10, 2023
6 checks passed
@dominictarro dominictarro deleted the copy_object branch December 10, 2023 22:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS S3 copy and move tasks and S3Bucket methods
2 participants