Skip to content

Latest commit

 

History

History
 
 

partitioning

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 

Partitioning

PolarStreams stores data in partitions across a cluster, with each partition representing a subset of topic records. Partitioning, performed through a hashing function over the partition key, enables horizontal scalability and determines broker-to-consumer fan-out.

Each PolarStreams broker is assigned a token based on the ordinal index within the cluster, which will be used to determine the data that naturally belongs to that broker. A token range is subdivided into "consumer ranges". The amount of total partitions in a cluster is determined by the number of brokers, times the consumer ranges.

For example: a 3-broker cluster with 4 consumer ranges has 12 partitions and can have up to 12 consumers per group receiving data in parallel.

The design of partitions determined by tokens * consumer_ranges is well suited for scaling clusters: As the number of brokers increases, tokens also increase, supporting a larger number of consumers receiving data in parallel.

Going back to the previous example, when the cluster scales the number of brokers (e.g. via a HPA) from 3 to 6 due to high usage, the number of possible consumers automatically increases from 12 to 24.

Consumer Ranges

The consumer ranges determines the amount of partitions per broker. It's designed to answer the question of how many consumers will be able to read data in parallel per broker (broker-to-consumer fan-out).

Currently, it can be changed using POLAR_CONSUMER_RANGES environment variable (defaults to 4).

Hopefully in the future this setting could be defined at topic level.