Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch output batching and gzip compression support #967

Merged
merged 3 commits into from
Aug 29, 2024

Conversation

aleksmaus
Copy link
Contributor

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area outputs

What this PR does / why we need it:

Adding Elasticsearch output data batching and support for gzip compression, both are disabled by default.

This PR includes two previous PRs since it changing the same files and benefits from these previous changes:
#962
#966

Addition of batching and gzip compression improves the throughput dramatically, here is an example of throughput numbers in my environment, with Elastic cloud deployment:
Screenshot 2024-08-15 at 8 13 13 PM

Added this blob into the example confgiuration with explanation

  # enablecompression: # if true enables gzip compression for http requests (default: false)
  # batching: # batching configuration, improves throuput dramatically using _bulk Elasticsearch API
  #   enabled: true # if true enables batching
  #   batchsize: 5242880 # batch request size in bytes (default: 5 MB)
  #   flushinterval: 1s # batch fush interval (default: 1s)

The batch size and the flush interval are the knobs that allow to tune the throughput for the user specific usage.
For example if the user has very few alerts from one single instance of Falco he might not need batching.
The smaller batch size would work better for the cases where the incoming alerts size rarely hit the batch size limit, causing the data delay for up to 1 second by default (which also can be configured).
The larger batch size would show the better throughput with Elasticsearch in the cases where the number of incoming events is large.

Special notes for your reviewer:

The easiest way to review this addition is to look at the last commit only. The previous two commits are from the PRs that I mentioned above, because I worked on the same code and this feature would benefit from these previous changes.
If/when these previous PRs get approaved/merged I can update this PR to include only the changes for batching and compression.

I tried to not break and preserve the exiting logic, such as for example keeping the mapping failure handling the same.
I left the comment there about the reasons I didn't do the same for the mapping errors.
Everything should work as before if these new features are not enabled through the config.
I had to change the http client a bit in order to handle Elasticsearch batch/bulk responses that return 200 http status overall with the individual statuses in the bulk response items.
I'm open about changing defaults values, like changing the batching or concurrent requests defaults or enabling compression by default.
Let me know if you want to discuss the changes in person over zoom or email, might be easier to explain.

Thank you!

@poiana poiana requested review from fjogeleit and Issif August 16, 2024 00:34
@Issif Issif self-assigned this Aug 17, 2024
@Issif Issif added this to the 2.30 milestone Aug 17, 2024
@aleksmaus
Copy link
Contributor Author

This PR has conflicts right now, because the first out of three related PRs got merged yesterday. I'll resolve the conflicts after the second related PR gets reviewed and merged: #966. This should make this PR smaller, since currently it includes both related PRs commits.

Signed-off-by: Aleksandr Maus <aleksandr.maus@elastic.co>
@aleksmaus aleksmaus force-pushed the feature/elasticsearch_batching branch from 8727998 to 49b8054 Compare August 22, 2024 13:14
@aleksmaus
Copy link
Contributor Author

aleksmaus commented Aug 22, 2024

Trimmed down this PR now, removed two previous commits for the related PRs that got merged already. Now this only includes the batching for Elasticsearch output and gzip compression.

The default settings are:

	DefaultBatchSize     = 5 * 1024 * 1024 // 5 MB
	DefaultFlushInterval = time.Second

which seems to be optimal in my testing for the constant flow of events.
The trade off here is if you only get very few alerts/events within 1 sec time interval, while the payload size never hits the 5MB max batch size, then the events could be delays for up to 1 sec, for the next batch flush interval.

Let me know if if you want to set different defaults, and maybe we could document how the users can use these two knobs to tune the falcosidekick for the different kinds of use cases: few events per minute vs thousands of events in a sec.

Copy link
Member

@Issif Issif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Aleksandr Maus <aleksandr.maus@elastic.co>
@aleksmaus
Copy link
Contributor Author

Added a bit of documentation. Let me know if you want to change/add anything else there.

@aleksmaus
Copy link
Contributor Author

Morning @Issif! Any updates on this one?

Copy link
Member

@Issif Issif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything seems good to me, just a cosmetic change in the config_example.yaml, to let people uncomment and get a valid setting

config_example.yaml Outdated Show resolved Hide resolved
Co-authored-by: Thomas Labarussias <issif+github@gadz.org>
Signed-off-by: Aleksandr Maus <aleksmaus@gmail.com>
@aleksmaus
Copy link
Contributor Author

Thank you! Merged your config_example.yaml suggested changes.

@aleksmaus aleksmaus requested a review from Issif August 29, 2024 14:38
@poiana poiana added the lgtm label Aug 29, 2024
@poiana
Copy link

poiana commented Aug 29, 2024

LGTM label has been added.

Git tree hash: 791dd9871684cce4c97ac8fdf7625799138d1fdd

@poiana
Copy link

poiana commented Aug 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aleksmaus, Issif

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 067de26 into falcosecurity:master Aug 29, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants