[8.x](backport #41694) [aws] feat: aws-s3 input registry cleanup for untracked s3 objects #41936
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Proposed commit message
This PR partially addresses #39116 by introducing a registry cleanup strategy for aws-s3 input.
The cleanup implemented here removes registry entries if the s3 object is no longer available (aka tracked) when listing inside the polling lookup. The cleanup removes objects that are not tracked from both the local state and internal store (backed by the registry) to reduce the memory usage.
Note that, this only benefits when s3 objects get removed (ex:- using lifecycle policy) and are no longer available. There should be a follow-up for instances where such removal is not done at the bucket. For example, this could be done by,
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
Screenshots
Given below are pprof analyss comparisons for ~4K objects in registry and once they were clean up by removing S3 objects (emptying the bucket)
Related issues
Footnotes
https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html
This is an automatic backport of pull request [aws] feat: aws-s3 input registry cleanup for untracked s3 objects #41694 done by Mergify. ↩