Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

provide ability to cache boto client instances directly and on S3Bucket #369

Merged
merged 33 commits into from
Jan 19, 2024

Conversation

zzstoatzz
Copy link
Contributor

@zzstoatzz zzstoatzz commented Jan 18, 2024

closes #332

This PR introduces an enhanced default caching strategy for boto client instances within our AwsCredentials and MinIOCredentials classes.

  • We now cache boto client instances by default using Python's lru_cache mechanism, keyed on the credentials context (ctx) and client type.
  • The cache key is derived from a comprehensive hash of the credentials object, including fields like AWS access keys, secret access keys, session tokens, profile names, and region names, along with parameters specific to each client type.
  • The __hash__ method of AwsCredentials and MinIOCredentials is based on the combination of these fields, ensuring that any changes to the credentials object results in a different hash and thus a new cache entry.
  • Thread safety via a global lock (_LOCK), which synchronizes access to the get_client function, preventing race conditions in multi-threaded environments.

thanks to @mattiamatrix for doing the bulk of this work

@zzstoatzz zzstoatzz changed the title Slight tweaks provide ability to cache boto client instances directly and on S3Bucket Jan 18, 2024
@zzstoatzz zzstoatzz marked this pull request as ready for review January 18, 2024 23:56
@zzstoatzz zzstoatzz requested a review from a team as a code owner January 18, 2024 23:56
prefect_aws/s3.py Outdated Show resolved Hide resolved
Copy link
Contributor

@chrisguidry chrisguidry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 428 to 435
cache_client: bool = Field(
default=False,
description=(
"If True, the S3 client will be cached. This is useful for "
"performance, but could cause issues if the S3 client is used "
"in multi-threaded environments."
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this toggle makes more sense on the AwsCredentials class since it's closer to where the client is instantiated and allows other consumers of AwsCredentials to use the toggle.

Also, do you know what issues client caching can cause in multi-threaded environments and if we can mitigate those issues? I'd prefer to not expose this toggle if we don't have to.



@pytest.mark.parametrize("client_type", [member.value for member in ClientType])
def test_get_client_cached(client_type):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that ensures that changing some configuration on an AwsCredentials instance after instantiation and fetching a client causes a cache miss?

Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@zzstoatzz zzstoatzz merged commit f697760 into main Jan 19, 2024
6 checks passed
@zzstoatzz zzstoatzz deleted the slight-tweaks branch January 19, 2024 18:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

put_directory creates a new s3_client for each file uploaded
4 participants