Sharded output from fluent-bit #8481

seveas · 2024-02-13T09:20:50Z

Introduction

I have created a sharded output plugin for fluent-bit. The initial goal was to load-balance output across multiple Kafka outputs. Instead of creating a new Kafka output, or making something Kafka-specific, I decided to implement this generically so that any output can be sharded. This requires a very small change to the fluent-bit core; everything else can be implemented as a plugin.

I would appreciate feedback on the design and implementation of this, with the goal of merging at least the small core change into fluent-bit itself. I would like to merge the sharded output as well eventually, after finishing the features we want from it, but it could also be maintained in a fork if the functionality is not wanted upstream.

Configuration

To shard output among different outputs, first you define your output shards as normal outputs:

[OUTPUT]
	name  file
	path shard1.log
	alias shard.1

[OUTPUT]
	name  file
	path shard2.log
	alias shard.2

Then you define a sharded output:

[OUTPUT]
	name sharded
	match *
	prefix shard.

The aliases for the shards must start with the prefix as configured for the sharded output. The shards should not match anything the sharded output also matches, as this will cause double output. In the example above, the sharded output matches everything, so the shards should match nothing.

How it works

At startup (in cb_pre_run), the sharded output plugin iterates over all defined outputs, and if the alias matches its prefix, it adds the output to its internal list of shards. At the end of this, it sets the "current" shard to the first shard in the list.
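The startup scan described above can be sketched roughly as follows. This is a minimal illustration rather than the plugin's actual code: `struct output`, `struct shard_set`, `MAX_SHARDS`, `alias_has_prefix` and `collect_shards` are hypothetical names, and a plain array stands in for fluent-bit's own list of output instances.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SHARDS 16   /* hypothetical fixed limit, for the sketch only */

/* Hypothetical stand-in for an output instance; only the alias matters here. */
struct output {
    const char *alias;   /* NULL if no alias is configured */
};

/* Hypothetical state kept by the sharded output. */
struct shard_set {
    const struct output *shards[MAX_SHARDS];
    size_t count;
    size_t current;      /* index of the shard to hand out next */
};

/* Return 1 if alias starts with prefix, 0 otherwise. */
static int alias_has_prefix(const char *alias, const char *prefix)
{
    if (alias == NULL) {
        return 0;
    }
    return strncmp(alias, prefix, strlen(prefix)) == 0;
}

/* Scan all outputs and keep those whose alias starts with prefix.
 * Returns the number of shards found. */
static size_t collect_shards(struct shard_set *set,
                             const struct output *outputs, size_t n_outputs,
                             const char *prefix)
{
    size_t i;

    assert(set != NULL && prefix != NULL);
    set->count = 0;
    for (i = 0; i < n_outputs && set->count < MAX_SHARDS; i++) {
        if (alias_has_prefix(outputs[i].alias, prefix)) {
            set->shards[set->count++] = &outputs[i];
        }
    }
    set->current = 0;    /* start with the first shard in the list */
    return set->count;
}
```

With the configuration shown earlier, calling `collect_shards` with the prefix `shard.` would pick up the outputs aliased `shard.1` and `shard.2` and skip everything else, including outputs without an alias.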

When the fluent-bit state machine creates a new task, it finds the matching outputs. I made a small change to the state machine: if the output selector selects a plugin marked with the FLB_PLUGIN_OUTPUT_INDIRECT flag, it calls that output's cb_get_output function to get the real output. I did not name this flag FLB_PLUGIN_OUTPUT_SHARDED because, in theory, other plugins may also want to determine outputs programmatically. The sharded output returns its currently selected shard and then chooses a new one. Currently that choice is a simple round robin, but see below.
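To make the indirection concrete, here is a small standalone sketch of the idea. It does not use fluent-bit's real structures: `OUTPUT_INDIRECT` (with a made-up value) stands in for FLB_PLUGIN_OUTPUT_INDIRECT, the `get_output` member is modeled on cb_get_output, and `struct output`, `sharded_get_output` and `resolve_output` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value; stands in for FLB_PLUGIN_OUTPUT_INDIRECT. */
#define OUTPUT_INDIRECT 0x1

struct output;

/* Callback type modeled on the cb_get_output idea from the post. */
typedef struct output *(*get_output_fn)(struct output *self);

/* Hypothetical, simplified output instance. */
struct output {
    const char *alias;
    int flags;
    get_output_fn get_output;   /* only set for indirect outputs */
    struct output **shards;     /* only used by the sharded output */
    size_t shard_count;
    size_t current;             /* index of the shard to return next */
};

/* Round robin: return the current shard, then advance to the next one. */
static struct output *sharded_get_output(struct output *self)
{
    struct output *picked;

    assert(self->shard_count > 0);
    picked = self->shards[self->current];
    self->current = (self->current + 1) % self->shard_count;
    return picked;
}

/* What the task-creation path would do with each matched output:
 * indirect outputs are swapped for the output they point at,
 * all other outputs are used as-is. */
static struct output *resolve_output(struct output *matched)
{
    if ((matched->flags & OUTPUT_INDIRECT) && matched->get_output != NULL) {
        return matched->get_output(matched);
    }
    return matched;
}
```

With two shards, successive calls to `resolve_output` on the sharded output alternate between the first and second shard, while a regular output resolves to itself.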

The rest of the fluent-bit state machine does not notice the switcheroo that just took place, and the chunk of output gets flushed to the selected shard as normal.

Future shard selection algorithm improvements

The final algorithm I want to implement will have the following attributes:

It can use N of M configured shards, where N does not have to equal M
It can randomize the selection as well as pick shards in a round-robin way
It can skip shards that are currently not healthy (e.g. remote service down) and pick another shard (either from the N active ones, or a not-yet-active one)
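One possible shape for such a selector is sketched below. It is only an illustration of the attributes listed above, not a committed design: it covers the round-robin case (random selection is left out), it treats the first N shards as the active set and the remaining M - N as spares, and it assumes a per-shard `healthy` flag whose source is still an open question. `struct selector`, `select_shard` and `MAX_SHARDS` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHARDS 16   /* hypothetical fixed limit, for the sketch only */

struct selector {
    size_t total;              /* M: number of configured shards */
    size_t active;             /* N: shards used in normal operation, N <= M */
    size_t next;               /* round-robin position within the active set */
    int healthy[MAX_SHARDS];   /* 1 = healthy, 0 = unhealthy (assumed input) */
};

/* Pick a shard index, or return -1 if no shard is healthy. */
static int select_shard(struct selector *sel)
{
    size_t step, i;

    assert(sel->active > 0 && sel->active <= sel->total);
    assert(sel->total <= MAX_SHARDS);

    /* First try the active set, starting at the round-robin position
     * and skipping shards that are marked unhealthy. */
    for (step = 0; step < sel->active; step++) {
        i = (sel->next + step) % sel->active;
        if (sel->healthy[i]) {
            sel->next = (i + 1) % sel->active;
            return (int) i;
        }
    }

    /* No healthy active shard: fall back to a not-yet-active spare. */
    for (i = sel->active; i < sel->total; i++) {
        if (sel->healthy[i]) {
            return (int) i;
        }
    }
    return -1;
}
```

In this sketch a spare is only used once every active shard is unhealthy. Promoting a spare into the active set as soon as one active shard fails would be an equally valid reading of the requirement.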

Feedback mechanism

For that last point, the plugin will need to be able to find out if specific output instances are healthy. In the case of an output using the kafka plugin, it could check that plugin's blocked flag, but there doesn't seem to be a more generic way to get such information. The flb_output_instance struct does have a flush_list attribute that could be used to detect backpressure, but that does not work in multi-threaded mode. Any advice on how to tackle this would be welcome.

seveas · 2024-02-22T17:34:46Z

Feedback from the community meeting:

Make this YAML-config only, with the shard outputs defined underneath the sharded output
Make sure the shards cannot receive data except from the sharded output
