Releases: cloudflare/ebpf_exporter
v2.4.2 / 2024-05-30
- Removed debug sleep on shutdown (#398)
- Detach programs with a trace on exit (#389)
- Enabled truly static linkage with no nss (#385)
- Switched to dynamic linking for tests (#414)
- Added decoder error counter (#386)
- Added a release archive job to CI (#383)
- Added a markdown link linter (#413)
- Fixed USDT example on ubuntu 24.04 (#416)
- Re-enabled unix-socket-backlog example checking in CI (#417)
- Switched to pkg-config to find libbpf dependencies (#415)
- Bumped dependencies to latest (#384, #387, #391, #394, #395, #396, #399, #400, #401, #402, #403, #404, #405, #406, #407, #409, #410, #418, #419, #420, #421, #422, #424, #429, #430, #431)
v2.4.1 / 2024-04-18
- Disabled cache for tracing labels, which was causing a memory leak (#363)
- Enabled pprof support (#364)
- Added errno decoder (#378)
- Added padding to label decoder to skip struct holes (#376)
- Added ext4dist example (#365)
- Added xfsdist example (#368)
- Fixed biolatency example (#373, #374, #375)
- Rewrote sock-trace and simplified example with socket cookies (#381)
- Split cachestat into pre and post kernel 5.16 (#372)
- Fixed tracing screenshot path in the README (#379)
- Added a helper to extract tracing propagation args (#358)
- Added probing for /usr/share/hwdata/pci.ids for RedHat/Fedora/CentOS (#380)
- Added sd_notify support when running under systemd (#382)
- Bumped dependencies to latest (#359, #360, #361, #366, #367)
v2.4.0 / 2024-02-27
This is a big release that comes with a major new feature: Distributed Tracing via OpenTelemetry (#297).
You can find the full documentation in ./tracing.
As a quick demo, you can run it locally with the provided Docker image:
- Run Jaeger all-in-one to provide an OpenTelemetry sink and UI:
docker run --rm -it --net host jaegertracing/all-in-one:1.54.0
- Open Jaeger UI: http://localhost:16686/.
- Build tracing demos from the root of the repo:
make tracing-demos
- Run ebpf_exporter with a sock-trace example from the root of the repo:
docker run --rm -it --privileged --net host -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 -v $(pwd)/tracing:/tracing ghcr.io/cloudflare/ebpf_exporter:v2.4.0 --config.dir=examples --config.names=sock-trace
- Run the demo:
./tracing/demos/sock/demo
- Refresh the Jaeger UI, select demo as the service, and click "Find Traces".
- Observe a trace that includes both spans produced by the userspace demo component and kernel spans produced by ebpf_exporter.
We have more examples bundled; please see the docs.
Tracing support required us to take a few dependencies that needed a newer Go version, so we bumped the build requirement from go1.18 to go1.20.
Other changes:
v2.3.3 / 2024-02-06
- Partition decoder cache by name (#346)
v2.3.2 / 2024-02-05
v2.3.1 / 2024-02-05
v2.3.0 / 2023-12-26
Highlights:
- Added support for fanotify for a faster and more reliable cgroup monitoring (#244, #263, #264, #265, #266, #279, #288)
- Added builds with built-in libbpf (now preferred) and system provided libbpf (#286)
- Started publishing Docker images on GitHub (#271, #290, #291, #292, #293, #294, #295)
New examples:
- Added icmp-ip example with inet_ip decoder (#251)
- Added pci_vendor, pci_device, pci_class, pci_subclass decoders with examples (#255, #274)
- Added kstack decoder with an example (#313)
- Added unix-socket-backlog example (#284)
- Added softirq-latency example (#300, #304)
- Added softirq-latency-net-rx example that's an array based version of softirq-latency (#310)
- Added cfs-throttling example (#311)
- Added tcp-retransmit example (#318, #335)
Changes to examples:
- Added jsonschema for examples and cleaned up unused keys (#314)
- Added exp2zero histogram type for cases when 0 is a significant outcome and added tcp-syn-backlog-exp2zero example (#280)
- Fixed uint decoder for very large numbers (#296)
- Removed copy-pasted division by 50 in tcp-syn-backlog-exp2zero example (#301)
- Added increment_exp2zero_histogram helper macro for examples (#302)
- Added {increment_map,increment_{exp2,exp2zero}_histogram}_nosync helper macros (#303, #305)
- Fixed tcp-syn-backlog example with linear histogram (#306)
- Added example rebuild if any of the headers change (#307)
- Simplified header includes in examples (#312)
- Fixed shrinklat example failure due to wrongly sized key (#319)
- Suppressed BTF warning in the shrinklat example due to type mismatch (#327)
- Fixed biolatency kernel version check after an upstream LTS backport (#309)
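The difference between the exp2 and exp2zero histogram types mentioned in the list above can be pictured with a small userspace sketch. It assumes a layout that is not taken from the exporter's source: an exp2 histogram files a value under key floor(log2(v)), while an exp2zero histogram reserves key 0 for the value zero and shifts every other key up by one. All function names here are hypothetical.

```c
#include <stdint.h>

/* floor(log2(v)) for v > 0; returns 0 for v == 0 (hypothetical helper). */
static unsigned int log2_floor(uint64_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Plain exp2 bucketing: 0 and 1 both land under key 0. */
static unsigned int exp2_key(uint64_t v)
{
    return log2_floor(v);
}

/*
 * exp2zero bucketing (assumed layout): key 0 is reserved for v == 0,
 * and key n >= 1 covers values in [2^(n-1), 2^n).
 */
static unsigned int exp2zero_key(uint64_t v)
{
    return v == 0 ? 0 : log2_floor(v) + 1;
}
```

With this layout, an empty backlog (value 0) gets its own bucket instead of being merged with a backlog of 1, which is the case the exp2zero type is meant for.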
Build changes:
- Bumped Go to 1.20 and dependencies to latest (#249, #258, #281, #282, #283, #289, #317, #333, #334)
- Added build-dynamic and build-static make goals (#241)
- Expanded linting from golangci-lint and fixed uncovered issues (#256, #259, #269, #270)
- Added configuration loading checks for existing configs to CI (#322, #326)
- Added export of built examples in CI jobs to attach them to releases (#308, #325)
- Suppressed errors when building outside of a git repo (#242)
- Added checks for libbpf version on startup to prevent runtime errors (#247)
- Clarified libbpf instructions (#262)
- Started running tests with -race if available (#267)
- Added checks that produced binaries work in CI (#268)
- Switched from dbhi/qus/action to more official docker/setup-qemu-action for CI builds (#272)
- Split Docker image into multiple variants: ebpf_exporter and ebpf_exporter_with_examples (#273)
- Optimized libbpf dependencies in CI (#275)
- Added clang-format output diff to CI failures (#328)
Other changes:
- Styling and typo fixes (#252, #276, #329)
- Added map value size validation to startup config checks (#257, #321, #322)
- Added linguist ignores for vmlinux.h files that were screwing language stats (#248)
- Added .dockerignore for libbpf and built examples (#298)
- Removed unused perf_event from config definitions (#315)
- Added support for external BTF information (#320, #323)
- Added a uprobe benchmark (it's slow!) (#331)
v2.2.0 / 2023-07-25
The best release yet! Syscalls, per-cpu maps, running with no elevated capabilities at runtime — it has it all.
- Added capability dropping and documented necessary capabilities (#231)
- Added support for systemd socket activation (#237)
- Added tracepoints and empty probes benchmark (#236)
- Added support for reading percpu maps (#226)
- Added support for XDP attachment with an example (#215, thanks @huseyinsaatci)
- Added syscall decoder with an example (#214, thanks @huseyinsaatci)
- Added udp receive packet drops example (#213, #229)
- Added kfree_skb example (#233, #234)
- Simplified oomkill example (#230)
- Replaced tracepoints with tp_btf in examples to remove the need for tracefs (#227)
- Reduced libbpf logging unless --debug is enabled (#216)
- Allowed suppressing timestamps in logs with --log.no-timestamps (#239)
- Added clang-format config to enforce formatting on C code (#222)
- Formatted examples uniformly (#228)
- Added default build goals to Makefiles (#225)
- Updated ubuntu in CI from 20.04 to 22.04 (#223)
- Updated vmlinux.h from 5.15.0-25 to 6.3.0-7 and generation instructions (#224)
- Updated dependencies to latest (#197, #202, #203, #204, #205, #206, #207, #210, #211, #212, #218, #238)
v2.1.0 / 2023-03-23
- Enabled pre-aggregation for label sets to allow duplicate labels (#180)
- Added tcp-window-clamp example (#172)
- Enabled passing CFLAGS to examples (#172)
- Added a note about supported distros to README (#174)
- Updated module path to v2 to make Go happy (#177)
- Cleaned up HistogramBucketType (#178)
- Added a link to libbpf program types and SEC names (#181)
- Switched to consistent indentation in examples (#194)
- Updated used bpf instruction set via -mcpu to v3 (#182)
- Updated Go to 1.20 (#193, #195)
- Updated golangci-lint to latest (#192)
- Updated dependencies to latest (#169, #173, #175, #176, #183, #187, #188, #190, #191, #195)
v2.0.0 / 2022-10-31
ebpf_exporter v2 is here!
This release comes with a bunch of breaking changes (all for the better!), so be sure to read the release notes below.
First and foremost, we migrated from BCC to libbpf. BCC has served us well over the years, but it has a major drawback that it compiles eBPF programs at runtime, which requires a compiler, kernel headers and has a chance of failing due to kernel discrepancies between hosts and kernel versions. It was hard to do static linking with bcc, so we ended up providing a binary linked against an older libc, for which you had to provide your own libbcc (which could also break due to unstable ABI).
With libbpf all these problems go away:
- Programs (now called configs) are compiled in advance, and for each config you have an eBPF ELF object and a yaml config describing how to extract metrics out of it.
- Thanks to libbpf and CO-RE you can COmpile once and Run Everywhere, worrying less about runtime failures.
- It's easy to statically compile in libbpf, so we now provide a statically compiled binary that you can use anywhere with no dependencies. We also have a Dockerfile in the repo (not yet published on Docker Hub) if you're inclined to use that, and it's easier to run than ever.
Big thanks to @wenlxie for doing the bulk of the work on porting to libbpf in #130. Another big thanks to @aquasecurity for their work on libbpfgo, which made it a lot easier for us to switch.
In the BCC repo itself there's an effort to migrate programs from BCC to libbpf. Those programs can serve as inspiration for what ebpf_exporter can provide for you as a metric.
Now to config changes. Previously you needed to make one big yaml config with all your metric descriptions and metrics intermingled. Now each logical program is called a config (a .yaml file) and each config has a dedicated eBPF ELF object (a .bpf.o file compiled from a .bpf.c file). When you start ebpf_exporter, you need to give it the path to the directory with your configs and tell it which configs to load. This allowed us to greatly flatten and simplify the configs, and it allows you to have simpler tooling for configuring what ebpf_exporter should enable.
Having eBPF C code in separate files also allows you to use your regular tooling to build eBPF ELF objects. In the examples directory you'll find a collection of our example configs along with a Makefile to build eBPF code. The expectation is that you would replicate something similar for your internal configs, and you have all the needed bits and pieces provided for you to copy and adapt. We provide vmlinux.h for both x86_64 (aka amd64) and aarch64 (aka arm64).
Having separate .bpf.o files allows you to compile not just C code, but anything that would provide a valid eBPF ELF object. We tried with Rust, but unsuccessfully. Please feel free to send a PR if you have better luck with it. We still expect that the majority of people will use plain old C, since that's what libbpf mainly supports and has a lot of examples for.
Since programs for configs need to be compiled in advance, we compile them as a part of a CI job, which allows us to spot mistakes early.
You no longer need to describe how to attach your eBPF programs in the config; it all happens in code. Take the timers code as an example:
SEC("tracepoint/timer/timer_start")
int do_count(struct trace_event_raw_timer_start* ctx)
We use the libbpf-provided SEC macro to tell what to attach to, which in this case is the timer:timer_start tracepoint. You can use any SEC that libbpf provides (there are many) and it should work out of the box, including uprobe, usdt and fentry (the latter currently requires a kernel patch on aarch64).
We piggyback on libbpf for most of the stuff with SEC, with the only exception being perf_event. For that we have a custom handler allowing you to set type, config, and frequency of the event you want to trace. Below is type=HARDWARE, config=PERF_COUNT_HW_CACHE_MISSES at 1Hz from the llcstat example:
SEC("perf_event/type=0,config=3,frequency=1")
int on_cache_miss(struct bpf_perf_event_data *ctx)
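To make the shape of that section name easier to see, here is a minimal userspace sketch that pulls the three values out of a string like the one above. It is an illustration only: the exporter's own handler is not shown in these notes, and the struct and function names below are hypothetical. In linux/perf_event.h, type 0 is PERF_TYPE_HARDWARE and config 3 is PERF_COUNT_HW_CACHE_MISSES, which matches the description of the llcstat example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for the three values encoded in the section name. */
struct perf_event_spec {
    uint32_t type;      /* e.g. 0 == PERF_TYPE_HARDWARE */
    uint64_t config;    /* e.g. 3 == PERF_COUNT_HW_CACHE_MISSES */
    uint64_t frequency; /* samples per second */
};

/* Returns 0 on success, -1 if the name does not have the expected shape. */
static int parse_perf_event_sec(const char *sec, struct perf_event_spec *out)
{
    unsigned int type;
    unsigned long long config, frequency;
    int consumed = 0;

    /* %n records how many characters were consumed; it is not counted
     * in sscanf's return value. */
    if (sscanf(sec, "perf_event/type=%u,config=%llu,frequency=%llu%n",
               &type, &config, &frequency, &consumed) != 3)
        return -1;

    /* Reject trailing characters after the frequency. */
    if (sec[consumed] != '\0')
        return -1;

    out->type = type;
    out->config = config;
    out->frequency = frequency;
    return 0;
}
```

Passing "perf_event/type=0,config=3,frequency=1" fills the struct with 0, 3 and 1, while a regular libbpf section name such as "tracepoint/timer/timer_start" is rejected.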
With uprobe support we also provide a way for you to run some code when your program is attached:
SEC("uprobe//proc/self/exe:post_attach_mark")
int do_init()
There's a post_attach_mark() function in ebpf_exporter that runs immediately after all configs are attached. In the bpf-jit example we use it to initialize a metric that would otherwise require a probe to run, which might be a while.
We now allow loose program attachment. If previously all programs had to be attached successfully for ebpf_exporter to run, now we allow failures and export a metric whether each program was attached or not. This way you can use alerting to detect when this happens, while not sacrificing unrelated configs. This is handy if your programs attach to something that might be missing from some kernels, like a static function that is sometimes not visible. We used it in our cachestat example.
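The idea of loose attachment can be sketched in a few lines of plain C. This is not the exporter's code (which is written in Go). The struct, the stub attach functions and attach_all_loose are all hypothetical, and exist only to show the "record the failure and keep going" pattern described above.

```c
#include <stddef.h>

/* Hypothetical attach callback: returns 0 on success, non-zero on failure. */
typedef int (*attach_fn)(void);

struct program {
    const char *name;
    attach_fn attach;
    int attached; /* 1 if attached, 0 if not; could be exported as a gauge */
};

/* Stub that pretends the attach target exists on this kernel. */
static int attach_ok(void)
{
    return 0;
}

/* Stub that pretends the target (e.g. a static function) is not visible. */
static int attach_missing(void)
{
    return -1;
}

/*
 * Try to attach every program. A failure is recorded per program instead
 * of aborting, so unrelated programs keep working. Returns the number of
 * programs that failed to attach.
 */
static size_t attach_all_loose(struct program *progs, size_t n)
{
    size_t failures = 0;

    for (size_t i = 0; i < n; i++) {
        progs[i].attached = (progs[i].attach() == 0);
        if (!progs[i].attached)
            failures++;
    }
    return failures;
}
```

An alert would then fire on any program whose attached value is 0, rather than on the exporter failing to start.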
Speaking of metrics, if you have kernel.bpf_stats_enabled sysctl enabled, we now also report how many times each of your eBPF programs ran and how long it spent running, which might be handy if you want to get an idea of how long things take.
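As a rough illustration of what those two numbers give you, the sketch below derives the average time per run between two samples of a run counter and a cumulative run time. struct prog_stats is a hypothetical stand-in that mirrors the run_cnt and run_time_ns fields the kernel exposes in struct bpf_prog_info; the metric names the exporter uses are not shown here.

```c
#include <stdint.h>

/* Hypothetical snapshot of per-program stats (both values are cumulative). */
struct prog_stats {
    uint64_t run_cnt;     /* number of times the program ran */
    uint64_t run_time_ns; /* total time spent running, in nanoseconds */
};

/*
 * Average nanoseconds per run between two snapshots.
 * Returns 0.0 if the program did not run in between.
 */
static double avg_run_ns(const struct prog_stats *prev,
                         const struct prog_stats *cur)
{
    uint64_t runs = cur->run_cnt - prev->run_cnt;

    if (runs == 0)
        return 0.0;
    return (double)(cur->run_time_ns - prev->run_time_ns) / (double)runs;
}
```

For example, going from 100 runs and 50000 ns to 300 runs and 150000 ns means 200 runs took 100000 ns, or 500 ns per run on average.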
In code and for the debug endpoint we renamed "tables" to "maps" to match eBPF terminology. If you were using /tables for debugging, you should switch to /maps. Previously configs needed to specify which table metrics came from; now it's automatically inferred from the metric name itself.
We have updated our benchmark, which now includes fentry, so you can see how much faster it is than good old kprobe and how much overhead you should expect in general (it's not much).
All of these changes are reflected in the README, so if you start from scratch, you shouldn't worry. If you are currently using ebpf_exporter v1, it will take some work to upgrade. The good news is that the metrics you export do not need to change. Internally at Cloudflare we upgraded without any issues.
You may have noticed that previously ebpf_exporter took some time to start up due to the need to compile programs. Since this is no longer the case, you should expect much faster startup times now. For complex configs like biolatency you should also expect lower memory usage (we observed ~250MiB -> ~30MiB drop during the upgrade).
If you need some reading to get up to speed with libbpf and CO-RE, here are three great blog posts from libbpf maintainer @anakryiko:
- https://nakryiko.com/posts/bpf-portability-and-co-re/
- https://nakryiko.com/posts/bcc-to-libbpf-howto-guide/
- https://nakryiko.com/posts/bpf-core-reference-guide/
We hope you'll enjoy these changes. As usual, please let us know if you run into any issues.