A very simple lag exporter for Kafka topic partitions, where "lag" is defined as the difference between the last produced offset and the last consumed offset.
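The definition above can be illustrated with a minimal sketch. The function name, the `(topic, partition)` tuple keys and the sample numbers are assumptions made for this example; they are not taken from the exporter's source.

```python
from typing import Dict, Tuple

# (topic, partition) pair; a hypothetical stand-in for Kafka's TopicPartition.
TopicPartition = Tuple[str, int]


def compute_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return the lag per partition: last produced minus last consumed."""
    return {
        tp: end_offsets[tp] - committed
        for tp, committed in committed_offsets.items()
        if tp in end_offsets  # skip partitions with no known end offset
    }


lag = compute_lag(
    end_offsets={("the-topic", 0): 120, ("the-topic", 1): 50},
    committed_offsets={("the-topic", 0): 100, ("the-topic", 1): 50},
)
```

A partition whose consumer group has caught up with the producer has a lag of 0.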
This implementation is intended to be run in a K8s-like environment, where pods are restarted on failure. Hence, non-retriable exceptions during runtime will make the application terminate.
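As a rough sketch of this fail-fast behaviour (not the exporter's actual code), a polling loop might swallow retriable errors and let everything else propagate so that the process exits and the pod is restarted. `RetriableError`, `run_poll_loop` and the `max_polls` and `sleep` parameters are hypothetical names introduced only for this illustration.

```python
import time
from typing import Callable, Optional


class RetriableError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_poll_loop(
    poll_once: Callable[[], None],
    poll_interval_s: float,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call poll_once repeatedly and return the number of attempts made.

    Retriable failures are skipped until the next interval. Any other
    exception propagates, so the process terminates and the orchestrator
    can restart it.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            poll_once()
        except RetriableError:
            # Transient problem: skip this round and try again later.
            pass
        polls += 1
        sleep(poll_interval_s)
    return polls
```

With `max_polls=None` the loop only ends when a non-retriable exception escapes; the `max_polls` and `sleep` parameters exist purely to make the sketch easy to exercise.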
The lag will be exported as a metric for Prometheus, like this:
k3a_consumergroup_group_lag{cluster_name="the-cluster", group="the-group", partition="0", topic="the-topic"} 0
Similarly, the current offset of each consumer group will be exported:
k3a_consumergroup_group_offset{cluster_name="the-cluster", group="the-group", partition="0", topic="the-topic"} 0
There's also a metric reporting the time spent polling the cluster:
k3a_lag_exporter_poll_time_ms{cluster_name="the-cluster"} 8
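To make the shape of these samples concrete, here is a small sketch that renders a line in the same layout as the examples above. The helper names are hypothetical, and the assumption that the metric name is the configured namespace joined to the metric with an underscore is inferred from the samples rather than confirmed by the exporter's source.

```python
from typing import Dict


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(namespace: str, metric: str, labels: Dict[str, str], value: int) -> str:
    """Render one sample line, with labels in the order they are given."""
    label_text = ", ".join(
        f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
    )
    return f"{namespace}_{metric}{{{label_text}}} {value}"


line = format_sample(
    "k3a",
    "consumergroup_group_lag",
    {
        "cluster_name": "the-cluster",
        "group": "the-group",
        "partition": "0",
        "topic": "the-topic",
    },
    0,
)
```

In practice a Prometheus client library would produce this output; the sketch only shows how the namespace, labels and value fit together.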
Example configuration:
k3a-lag-exporter {
  poll-interval = 30 seconds
  reporters.prometheus.port = 8000
  reporters.prometheus.metric-namespace = k3a
  clusters = [
    {
      name = "the-kafka-cluster"
      topic-allow-list = [
        "topic1"
        "topic2"
      ]
      topic-deny-list = [
        ".*secret-topic.*"
      ]
      group-allow-list = [
        "public-group.*"
      ]
      group-deny-list = [
        "internal-group.*"
      ]
      bootstrap-servers = "kafka.example.com:9092"
      consumer-properties = {
        security.protocol = "SASL_SSL"
        sasl.mechanism = "PLAIN"
        sasl.jaas.config = "org.apache.kafka.common.security.plain.PlainLoginModule required username='"${USER}"' password='"${PASSWORD}"';"
      }
      admin-properties = {
        security.protocol = SSL
        ssl.keystore.type = "PKCS12"
        ssl.keystore.location = "./foo.jks"
        ssl.keystore.password = "password"
      }
    }
  ]
}
Configuration is handled by Typesafe Config. For default values, please see the fallback configuration file.
Element | Description |
---|---|
k3a-lag-exporter | The main configuration object. |
poll-interval | How often to poll the Kafka cluster. |
kafka-client-timeout | Timeout when querying Kafka API. |
reporters.prometheus.port | Port of built-in Prometheus web server. |
reporters.prometheus.metric-namespace | The prefix for Prometheus metrics. |
clusters | List of clusters to monitor. Currently, only a single cluster is supported. |
name | The cluster name. Will be used as a label in the exported metrics. |
topic-allow-list | Optional list of topics to include. See below. |
topic-deny-list | Optional list of topics to exclude. See below. |
group-allow-list | Optional list of consumer groups to include. See below. |
group-deny-list | Optional list of consumer groups to exclude. See below. |
bootstrap-servers | Kafka server(s) to connect to. |
consumer-properties | Properties allowing connection to the Kafka cluster as a consumer. Must have DESCRIBE permissions for the cluster, groups and topics. |
admin-properties | Properties allowing connection to the Kafka cluster as an admin. Must have DESCRIBE permissions for the cluster, groups and topics. |
The allow- and deny-lists contain regular expressions that are implicitly anchored to the beginning and the end of the string.
Filtering happens on the allow-list first, then the deny-list.
- If an allow-list is given, only topics/consumer groups that match an entry in the list are passed on.
- Otherwise, if no allow-list is given, every existing topic/consumer group is passed on to the deny-list.
- If a deny-list is given, any matching topic/consumer group will be removed from the list from the previous steps.
- Otherwise, if no deny-list is given, every topic/consumer group from the first two steps will be kept.
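The filtering steps above can be sketched as follows. This is an illustration only: the function name is hypothetical, an empty allow-list is assumed to behave like a missing one, and Python's `re` syntax is not identical to the regular expression dialect the exporter itself uses. `fullmatch` stands in for the implicit anchoring at both ends.

```python
import re
from typing import Iterable, List, Optional


def apply_filters(
    names: Iterable[str],
    allow_list: Optional[List[str]] = None,
    deny_list: Optional[List[str]] = None,
) -> List[str]:
    """Apply the allow-list first, then the deny-list, to topic or group names."""
    allow = [re.compile(pattern) for pattern in allow_list or []]
    deny = [re.compile(pattern) for pattern in deny_list or []]
    kept = []
    for name in names:
        # Step 1: with an allow-list, the whole name must match some entry.
        if allow and not any(p.fullmatch(name) for p in allow):
            continue
        # Step 2: with a deny-list, any full match removes the name.
        if any(p.fullmatch(name) for p in deny):
            continue
        kept.append(name)
    return kept


topics = ["topic1", "topic2", "topic10", "my-secret-topic-1", "other"]
allowed = apply_filters(topics, ["topic1", "topic2"], [".*secret-topic.*"])
```

Because matching is anchored, the pattern `topic1` does not match `topic10`; a pattern such as `topic1.*` would be needed for that.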
The image expects to have a configuration file available as /app/k3a-lag-exporter.conf. You may thus run it like this, giving the full path to your local configuration file:
docker run -v $(pwd)/my.conf:/app/k3a-lag-exporter.conf ghcr.io/statnett/k3a-lag-exporter:latest