feat: eBPF ingress/egress TC program for cilium external LB (#2710)
* tc egress + ingress bpf program for external lb dualstack svcs
* changes work with ip -6 neigh add for LL
* adding README and updated printk
* use helper func to compare IPs
* fix checksum
* prep makefile changes for future image installs
* remove generated files, update paths, addressing comments
* remove old path
* update dockerfile for bpf-tc
* implement zap logging
* update dockerfile
* create qdisc before cilium so initcontainer can start bpf-tc to attach filters
* addressing comments and change use debug macro for prints
* remove checksum flag
* logs to outfile
* reduce image size, run nft delete in main.go, delete filters if they exist before adding on restart
* rename to ipv6-hp-bpf
* reorder load_bytes
* delete filter by name
Showing 13 changed files with 583 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# ipv6-hp-bpf | ||
|
||
`ipv6-hp-bpf` is a project that leverages eBPF (Extended Berkeley Packet Filter) technology for traffic control in the Linux kernel. This is a proof of concept (POC) to fix external load balancer services in Cilium dualstack clusters. | ||
|
||
## Description | ||
|
||
The goal of this bpf program is to fix the issue described [here](https://github.com/cilium/cilium/issues/31326). It includes both egress and ingress TC programs. These programs are meant to replace the nftables rules, which do not work on Cilium clusters. | ||
The egress bpf code converts the destination IPv6 address of the packet from global unicast to link local, and the ingress code converts the source IPv6 address from link local to global unicast. | ||
|
||
## Usage | ||
|
||
Follow the steps below to compile the program and install it onto your node: | ||
|
||
1. Use the make command to build the binary (or compile manually as described under Manual Compilation): | ||
```bash | ||
make ipv6-hp-bpf-binary | ||
``` | ||
|
||
2. Copy the new binary to your node(s). | ||
|
||
3. Remove the nftables rules for IPv6 with the following commands: | ||
```bash | ||
nft delete chain ip6 azureSLBProbe postrouting | ||
nft delete chain ip6 azureSLBProbe prerouting | ||
nft -n list table ip6 azureSLBProbe | ||
``` | ||
|
||
4. Start the program with: | ||
```bash | ||
./ipv6-hp-bpf | ||
``` | ||
5. Debugging logs can be seen on the node under `/sys/kernel/debug/tracing/trace_pipe`. | ||
|
||
## Manual Compilation | ||
For testing purposes, you can compile the bpf program without Go and attach it to the interface yourself. This is how you would do it for egress: | ||
```bash | ||
clang -O2 -g -target bpf -c egress.c -o egress.o | ||
``` | ||
|
||
This will generate the egress.o file, which you can copy over to your cluster's node. | ||
To copy it to the node, you first need to create a node-shell instance: | ||
```bash | ||
kubectl cp egress.o nsenter-xxxxx:<path-in-node> | ||
``` | ||
|
||
Since this is for cilium clusters, cilium already creates a qdisc on eth0 of type clsact (which allows both ingress and egress filters to be attached). If cilium is not installed, you would have to create the qdisc on your own by doing the following: | ||
```bash | ||
tc qdisc add dev eth0 clsact | ||
``` | ||
|
||
## Attach the filter | ||
```bash | ||
tc filter add dev eth0 egress prio 1 bpf da obj egress.o sec classifier | ||
``` | ||
|
||
## Verify the filter is attached | ||
```bash | ||
tc filter show dev eth0 egress | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
package main | ||
|
||
import ( | ||
"bytes" | ||
"net" | ||
"os/exec" | ||
|
||
"github.com/Azure/azure-container-networking/bpf-prog/ipv6-hp-bpf/pkg/egress" | ||
"github.com/Azure/azure-container-networking/bpf-prog/ipv6-hp-bpf/pkg/ingress" | ||
"github.com/vishvananda/netlink" | ||
|
||
"github.com/cilium/ebpf/rlimit" | ||
"go.uber.org/zap" | ||
) | ||
|
||
var logger *zap.Logger | ||
|
||
func main() { | ||
// Set up logger | ||
config := zap.NewProductionConfig() | ||
config.OutputPaths = []string{"stdout", "/var/log/azure-ipv6-hp-bpf.log"} | ||
l, err := config.Build() | ||
if err != nil { | ||
panic("building zap logger: " + err.Error()) | ||
} | ||
logger = l | ||
|
||
// Remove resource limits for kernels <5.11. | ||
if err := rlimit.RemoveMemlock(); err != nil { | ||
logger.Error("Removing memlock", zap.Error(err)) | ||
return | ||
} | ||
|
||
// Check 'nft -n list tables ip6' to see if table exists | ||
cmd := exec.Command("nft", "-n", "list", "tables", "ip6") | ||
output, err := cmd.CombinedOutput() | ||
if err != nil { | ||
logger.Error("error running 'nft -n list tables ip6'", zap.Error(err), zap.String("output", string(output))) | ||
return | ||
} | ||
|
||
// if azureSLBProbe table exists, delete it | ||
if bytes.Contains(output, []byte("azureSLBProbe")) { | ||
cmd := exec.Command("nft", "delete", "table", "ip6", "azureSLBProbe") | ||
err = cmd.Run() | ||
if err != nil { | ||
logger.Error("failed to run 'nft delete table ip6 azureSLBProbe'", zap.Error(err)) | ||
return | ||
} | ||
} | ||
|
||
ifname := "eth0" | ||
iface, err := net.InterfaceByName(ifname) | ||
if err != nil { | ||
logger.Error("Getting interface", zap.String("interface", ifname), zap.Error(err)) | ||
// iface is nil on error, so stop here instead of dereferencing it below. | ||
return | ||
} | ||
logger.Info("Interface has index", zap.String("interface", ifname), zap.Int("index", iface.Index)) | ||
|
||
// Create (or replace) the clsact qdisc on the interface so that ingress and egress filters can be attached. | ||
fq := &netlink.GenericQdisc{ | ||
QdiscAttrs: netlink.QdiscAttrs{ | ||
LinkIndex: iface.Index, | ||
Handle: netlink.MakeHandle(0xffff, 0), | ||
Parent: netlink.HANDLE_CLSACT, | ||
}, | ||
QdiscType: "clsact", | ||
} | ||
if err := netlink.QdiscReplace(fq); err != nil { | ||
logger.Error("failed setting clsact qdisc", zap.Error(err)) | ||
return | ||
} | ||
|
||
// Load the compiled eBPF objects into the kernel. | ||
// Set up ingress and egress filters to attach to eth0 clsact qdisc | ||
var objsEgress egress.EgressObjects | ||
if err := egress.LoadEgressObjects(&objsEgress, nil); err != nil { | ||
logger.Error("Failed to load eBPF egress objects", zap.Error(err)) | ||
return | ||
} | ||
defer objsEgress.Close() | ||
if err := egress.SetupEgressFilter(iface.Index, &objsEgress, logger); err != nil { | ||
logger.Error("Setting up egress filter", zap.Error(err)) | ||
} else { | ||
logger.Info("Successfully set egress filter on", zap.String("interface", ifname)) | ||
} | ||
|
||
var objsIngress ingress.IngressObjects | ||
if err := ingress.LoadIngressObjects(&objsIngress, nil); err != nil { | ||
logger.Error("Loading eBPF ingress objects", zap.Error(err)) | ||
return | ||
} | ||
defer objsIngress.Close() | ||
if err := ingress.SetupIngressFilter(iface.Index, &objsIngress, logger); err != nil { | ||
logger.Error("Setting up ingress filter", zap.Error(err)) | ||
} else { | ||
logger.Info("Successfully set ingress filter on", zap.String("interface", ifname)) | ||
} | ||
} |
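The table check in `main.go` uses `bytes.Contains`, which would also match a table whose name merely contains `azureSLBProbe`. A stricter check could parse the output line by line. The sketch below assumes that `nft list tables` prints one `table <family> <name>` entry per line; the helper name `hasTable` is hypothetical and not part of this commit.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// hasTable reports whether the output of `nft -n list tables ip6` contains a
// table with exactly the given family and name. It assumes each table is
// listed on its own line as "table <family> <name>".
func hasTable(output []byte, family, name string) bool {
	sc := bufio.NewScanner(bytes.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 3 && fields[0] == "table" && fields[1] == family && fields[2] == name {
			return true
		}
	}
	return false
}

func main() {
	// Sample output written by hand for illustration, not captured from a node.
	sample := []byte("table ip6 azureSLBProbe\ntable ip6 other\n")
	fmt.Println(hasTable(sample, "ip6", "azureSLBProbe"))
}
```

Whether the extra strictness matters depends on which other tables can exist on the node; for a POC the substring match may be sufficient.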
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
module github.com/Azure/azure-container-networking/bpf-prog/ipv6-hp-bpf | ||
|
||
go 1.21.6 | ||
|
||
require ( | ||
github.com/cilium/ebpf v0.15.0 | ||
github.com/vishvananda/netlink v1.1.0 | ||
go.uber.org/zap v1.27.0 | ||
) | ||
|
||
require ( | ||
github.com/vishvananda/netns v0.0.0-20191106174202-0a2b9b5464df // indirect | ||
go.uber.org/multierr v1.10.0 // indirect | ||
golang.org/x/exp v0.0.0-20230224173230-c95f2b4c22f2 // indirect | ||
golang.org/x/sys v0.15.0 // indirect | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
github.com/cilium/ebpf v0.15.0 h1:7NxJhNiBT3NG8pZJ3c+yfrVdHY8ScgKD27sScgjLMMk= | ||
github.com/cilium/ebpf v0.15.0/go.mod h1:DHp1WyrLeiBh19Cf/tfiSMhqheEiK8fXFZ4No0P1Hso= | ||
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= | ||
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= | ||
github.com/go-quicktest/qt v1.101.0 h1:O1K29Txy5P2OK0dGo59b7b0LR6wKfIhttaAhHUyn7eI= | ||
github.com/go-quicktest/qt v1.101.0/go.mod h1:14Bz/f7NwaXPtdYEgzsx46kqSxVwTbzVZsDC26tQJow= | ||
github.com/google/go-cmp v0.5.9 h1:O2Tfq5qg4qc4AmwVlvv0oLiVAGB7enBSJ2x2DqQFi38= | ||
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= | ||
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= | ||
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= | ||
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= | ||
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= | ||
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= | ||
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= | ||
github.com/rogpeppe/go-internal v1.11.0 h1:cWPaGQEPrBb5/AsnsZesgZZ9yb1OQ+GOISoDNXVBh4M= | ||
github.com/rogpeppe/go-internal v1.11.0/go.mod h1:ddIwULY96R17DhadqLgMfk9H9tvdUzkipdSkR5nkCZA= | ||
github.com/stretchr/testify v1.8.1 h1:w7B6lhMri9wdJUVmEZPGGhZzrYTPvgJArz7wNPgYKsk= | ||
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4= | ||
github.com/vishvananda/netlink v1.1.0 h1:1iyaYNBLmP6L0220aDnYQpo1QEV4t4hJ+xEEhhJH8j0= | ||
github.com/vishvananda/netlink v1.1.0/go.mod h1:cTgwzPIzzgDAYoQrMm0EdrjRUBkTqKYppBueQtXaqoE= | ||
github.com/vishvananda/netns v0.0.0-20191106174202-0a2b9b5464df h1:OviZH7qLw/7ZovXvuNyL3XQl8UFofeikI1NW1Gypu7k= | ||
github.com/vishvananda/netns v0.0.0-20191106174202-0a2b9b5464df/go.mod h1:JP3t17pCcGlemwknint6hfoeCVQrEMVwxRLRjXpq+BU= | ||
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto= | ||
go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE= | ||
go.uber.org/multierr v1.10.0 h1:S0h4aNzvfcFsC3dRF1jLoaov7oRaKqRGC/pUEJ2yvPQ= | ||
go.uber.org/multierr v1.10.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN80Y= | ||
go.uber.org/zap v1.27.0 h1:aJMhYGrd5QSmlpLMr2MftRKl7t8J8PTZPA732ud/XR8= | ||
go.uber.org/zap v1.27.0/go.mod h1:GB2qFLM7cTU87MWRP2mPIjqfIDnGu+VIO4V/SdhGo2E= | ||
golang.org/x/exp v0.0.0-20230224173230-c95f2b4c22f2 h1:Jvc7gsqn21cJHCmAWx0LiimpP18LZmUxkT5Mp7EZ1mI= | ||
golang.org/x/exp v0.0.0-20230224173230-c95f2b4c22f2/go.mod h1:CxIveKay+FTh1D0yPZemJVgC/95VzuuOLq5Qi4xnoYc= | ||
golang.org/x/sys v0.0.0-20190606203320-7fc4e5ec1444/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= | ||
golang.org/x/sys v0.15.0 h1:h48lPFYpsTvQJZF4EKyI4aLHaev3CxivZmv7yZig9pc= | ||
golang.org/x/sys v0.15.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= | ||
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= | ||
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
#include <netinet/in.h> | ||
#include <stdbool.h> | ||
|
||
#define L4_HDR_OFF (ETH_HLEN + sizeof(struct ipv6hdr)) | ||
#define BPF_F_PSEUDO_HDR (1ULL << 4) | ||
|
||
// Returns true when the two IPv6 addresses are byte-for-byte identical. | ||
static __always_inline bool compare_ipv6_addr(const struct in6_addr *addr1, const struct in6_addr *addr2) | ||
{ | ||
#pragma unroll | ||
for (int i = 0; i < sizeof(struct in6_addr); i++) | ||
{ | ||
if (addr1->s6_addr[i] != addr2->s6_addr[i]) | ||
{ | ||
return false; | ||
} | ||
} | ||
return true; | ||
} |
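For readers who want to check the offsets in this header outside the kernel, here is a rough Go mirror of the two pieces it defines. It assumes an untagged Ethernet frame (14-byte header, matching `ETH_HLEN`) followed directly by the fixed 40-byte IPv6 header with no extension headers, so `L4_HDR_OFF` works out to 54. The names `equalIPv6` and `dstAddr` are illustrative and do not appear in the commit.

```go
package main

import "fmt"

const (
	ethHdrLen  = 14                     // ETH_HLEN for an untagged frame
	ipv6HdrLen = 40                     // size of the fixed IPv6 header
	l4HdrOff   = ethHdrLen + ipv6HdrLen // mirrors L4_HDR_OFF
	srcOff     = ethHdrLen + 8          // source address offset in the frame
	dstOff     = ethHdrLen + 24         // destination address offset in the frame
)

// equalIPv6 compares two 16-byte addresses byte by byte, like
// compare_ipv6_addr in helper.h.
func equalIPv6(a, b [16]byte) bool {
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// dstAddr extracts the IPv6 destination address from a raw frame.
// ok is false when the frame is too short to hold both headers.
func dstAddr(frame []byte) (addr [16]byte, ok bool) {
	if len(frame) < l4HdrOff {
		return addr, false
	}
	copy(addr[:], frame[dstOff:dstOff+16])
	return addr, true
}

func main() {
	want := [16]byte{0xfe, 0x80, 15: 0x01} // fe80::1
	frame := make([]byte, l4HdrOff)
	copy(frame[dstOff:], want[:])

	got, ok := dstAddr(frame)
	fmt.Println(ok, equalIPv6(got, want), l4HdrOff)
}
```

In Go, `[16]byte` values can also be compared directly with `==`; the explicit loop is kept only to parallel the C helper.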
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
FROM mcr.microsoft.com/oss/go/microsoft/golang:1.21 AS builder | ||
ARG VERSION | ||
ARG DEBUG | ||
ARG OS | ||
WORKDIR /bpf-prog/ipv6-hp-bpf | ||
COPY ./bpf-prog/ipv6-hp-bpf . | ||
COPY ./bpf-prog/ipv6-hp-bpf/cmd/ipv6-hp-bpf/*.go /bpf-prog/ipv6-hp-bpf/ | ||
COPY ./bpf-prog/ipv6-hp-bpf/include/helper.h /bpf-prog/ipv6-hp-bpf/include/helper.h | ||
RUN apt-get update && apt-get install -y llvm clang linux-libc-dev linux-headers-generic libbpf-dev libc6-dev gcc-multilib nftables iproute2 | ||
RUN for dir in /usr/include/x86_64-linux-gnu/*; do ln -s "$dir" /usr/include/$(basename "$dir"); done | ||
ENV C_INCLUDE_PATH=/usr/include/bpf | ||
RUN if [ "$DEBUG" = "true" ]; then printf '\n#define DEBUG\n' >> /bpf-prog/ipv6-hp-bpf/include/helper.h; fi | ||
RUN GOOS=$OS CGO_ENABLED=0 go generate ./... | ||
RUN GOOS=$OS CGO_ENABLED=0 go build -a -o /go/bin/ipv6-hp-bpf -trimpath -ldflags "-X main.version=$VERSION" -gcflags="-dwarflocationlists=true" . | ||
|
||
FROM mcr.microsoft.com/cbl-mariner/distroless/minimal:2.0 | ||
COPY --from=builder /go/bin/ipv6-hp-bpf /ipv6-hp-bpf | ||
COPY --from=builder /usr/sbin/nft /usr/sbin/nft | ||
COPY --from=builder /sbin/ip /sbin/ip | ||
COPY --from=builder /lib/x86_64-linux-gnu/libnftables.so.1 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libedit.so.2 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libc.so.6 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libmnl.so.0 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libnftnl.so.11 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libxtables.so.12 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libjansson.so.4 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libgmp.so.10 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libtinfo.so.6 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libbsd.so.0 /lib/x86_64-linux-gnu/ | ||
COPY --from=builder /lib64/ld-linux-x86-64.so.2 /lib64/ | ||
COPY --from=builder /lib/x86_64-linux-gnu/libmd.so.0 /lib/x86_64-linux-gnu/ | ||
CMD ["/ipv6-hp-bpf"] |