[8.x](backport #41694) [aws] feat: aws-s3 input registry cleanup for untracked s3 objects #41936

Merged
merged 1 commit into from
Dec 6, 2024

mergify[bot]
Contributor

@mergify mergify bot commented Dec 6, 2024

Proposed commit message

This PR partially addresses #39116 by introducing a registry cleanup strategy for the aws-s3 input.

The cleanup implemented here removes registry entries whose S3 objects are no longer available (that is, no longer tracked) when the bucket is listed during polling. Untracked objects are removed from both the local state and the internal store (backed by the registry) to reduce memory usage.
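
The cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `store` interface, the `memStore` type, and the `cleanUp` function are hypothetical names, and the real input keeps richer state than a set of IDs.

```go
package main

import "fmt"

// store is a hypothetical stand-in for the registry-backed internal store.
type store interface {
	Remove(key string) error
}

// memStore is an in-memory store used only for this illustration.
type memStore map[string]struct{}

func (m memStore) Remove(key string) error {
	delete(m, key)
	return nil
}

// cleanUp drops every entry in states whose ID was not seen in the latest
// bucket listing. Each entry is removed from the store first and then from
// the local state. It returns the number of entries removed.
func cleanUp(states, listed map[string]struct{}, st store) (int, error) {
	removed := 0
	for id := range states {
		if _, ok := listed[id]; ok {
			continue // object still exists in the bucket, keep tracking it
		}
		if err := st.Remove(id); err != nil {
			return removed, err
		}
		delete(states, id) // deleting during range is safe in Go
		removed++
	}
	return removed, nil
}

func main() {
	states := map[string]struct{}{"a": {}, "b": {}, "c": {}}
	st := memStore{"a": {}, "b": {}, "c": {}}
	listed := map[string]struct{}{"b": {}} // only "b" is still in the bucket

	removed, err := cleanUp(states, listed, st)
	fmt.Println(removed, len(states), len(st), err)
}
```

Removing from the store before the local state means a failed store removal leaves the entry tracked locally, so a later polling cycle can retry it. That ordering is an assumption of this sketch.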

Note that this only helps when S3 objects are removed (e.g. by a lifecycle policy) and are no longer available. A follow-up is needed for cases where objects are not removed from the bucket. For example, this could be done by:

  • Defining a time-based sliding window (e.g. only consider S3 objects from the past 3 days)
  • Only accepting objects of a known storage class (e.g. S3 Standard) so that users can archive objects (see footnote 1)

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Build or run filebeat with aws-s3 input
  • Add s3 objects and observe memory growth with pprof (regex filter for awss3)
  • Remove s3 objects and observe memory reduction

Screenshots

Given below are pprof analysis comparisons for ~4K objects in the registry, and after they were cleaned up by removing the S3 objects (emptying the bucket)

  • With ~4K objects processed

image

  • When S3 objects were removed

image

Related issues

Footnotes

  1. https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html


    This is an automatic backport of pull request [aws] feat: aws-s3 input registry cleanup for untracked s3 objects #41694 done by Mergify.

Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
(cherry picked from commit 583d345)
@mergify mergify bot requested a review from a team as a code owner December 6, 2024 16:39
@mergify mergify bot added the backport label Dec 6, 2024
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 6, 2024
@Kavindu-Dodan Kavindu-Dodan enabled auto-merge (squash) December 6, 2024 16:51
@Kavindu-Dodan Kavindu-Dodan added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Dec 6, 2024
@elasticmachine
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 6, 2024
@Kavindu-Dodan Kavindu-Dodan merged commit a6074e7 into 8.x Dec 6, 2024
22 checks passed
@Kavindu-Dodan Kavindu-Dodan deleted the mergify/bp/8.x/pr-41694 branch December 6, 2024 18:21
Labels
backport Team:obs-ds-hosted-services Label for the Observability Hosted Services team
2 participants