-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[aws] feat: aws-s3 input registry cleanup for untracked s3 objects #41694
[aws] feat: aws-s3 input registry cleanup for untracked s3 objects #41694
Conversation
Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services) |
c129de6
to
f394447
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @Kavindu-Dodan ! One quick question first, should we also introduce clean_removed
parameter just to be matching the filestream input?
For s3 input, if |
42ae2ec
to
fc03402
Compare
fc03402
to
9b29500
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
Looking at the change, if I've correctly understood, we're removing the IDs from the registry once the files have been read. |
The cleanup happens for objects we no longer see in the configured bucket and they won't be processed so there won't be any elastic acknowledgment. Any known bucket entry will be kept in the registry and will be tracked for their success or failure. See this logic [1] which is unchanged. [1] - beats/x-pack/filebeat/input/awss3/s3_input.go Lines 185 to 193 in 9b29500
|
1d2dab9
to
0f0eee1
Compare
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go
b17324a
to
3c84e63
Compare
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345)
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345)
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345)
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345) Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345) Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345) Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Hello @Kavindu-Dodan - Update: I see the concept of |
@lucabelluccini correct, there's no setting and clean-up occur by default when the object is no longer visible in the listing. |
Proposed commit message
This PR partially addresses #39116 by introducing a registry cleanup strategy for aws-s3 input.
The cleanup implemented here removes registry entries if the s3 object is no longer available (aka tracked) when listing inside the polling lookup. The cleanup removes objects that are not tracked from both the local state and internal store (backed by the registry) to reduce the memory usage.
Note that, this only benefits when s3 objects get removed (ex:- using lifecycle policy) and are no longer available. There should be a follow-up for instances where such removal is not done at the bucket. For example, this could be done by,
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
Screenshots
Given below are pprof analyss comparisons for ~4K objects in registry and once they were clean up by removing S3 objects (emptying the bucket)
Related issues
Footnotes
https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html ↩