The objective of this tutorial is to extend basic L3 forwarding with a scaled-down version of In-Band Network Telemetry (INT), which we call Multi-Hop Route Inspection (MRI).

MRI allows users to track the path each packet travels through and the queue lengths it encounters along the way. To support this functionality, you will need to write a P4 program that appends a switch ID and queue depth to the header stack of every packet. At the destination, the sequence of switch IDs corresponds to the path, and each ID is followed by the queue depth of the corresponding port at that switch.
As before, we have already defined the control plane rules, so you only need to implement the data plane logic of your P4 program.
Spoiler alert: There is a reference solution in the `solution` sub-directory. Feel free to compare your implementation to the reference.
The directory with this README also contains a skeleton P4 program, `mri.p4`, which initially implements L3 forwarding. Your job (in the next step) will be to extend it to properly prepend the MRI custom headers.

Before that, let's compile the incomplete `mri.p4` and bring up a switch in Mininet to test its behavior.
- In your shell, run:

  ```bash
  make
  ```

  This will:
  - compile `mri.p4`, and
  - start a Mininet instance with three switches (`s1`, `s2`, `s3`) configured in a triangle. There are 5 hosts: `h1` and `h11` are connected to `s1`, `h2` and `h22` are connected to `s2`, and `h3` is connected to `s3`.
  - The hosts are assigned IPs of `10.0.1.1`, `10.0.2.2`, etc. (`10.0.<Switchid>.<hostID>`).
  - The control plane programs the P4 tables in each switch based on `sX-runtime.json`.
- We want to send low-rate traffic from `h1` to `h2` and high-rate iperf traffic from `h11` to `h22`. The link between `s1` and `s2` is shared by both flows and is a bottleneck, because we reduced its bandwidth to 512 kbps in `topology.json`. Therefore, if we capture packets at `h2`, we should see a large queue depth for that link.
- You should now see a Mininet command prompt. Open four terminals for `h1`, `h11`, `h2`, and `h22`, respectively:

  ```bash
  mininet> xterm h1 h11 h2 h22
  ```
- In `h2`'s xterm, start the server that captures packets:

  ```bash
  ./receive.py
  ```
- In `h22`'s xterm, start the iperf UDP server:

  ```bash
  iperf -s -u
  ```
- In `h1`'s xterm, send one packet per second to `h2` using `send.py`, say, for 30 seconds:

  ```bash
  ./send.py 10.0.2.2 "P4 is cool" 30
  ```

  The message "P4 is cool" should be received in `h2`'s xterm.
- In `h11`'s xterm, start the iperf client, sending for 15 seconds:

  ```bash
  iperf -c 10.0.2.22 -t 15 -u
  ```
- At `h2`, the MRI header has no hop info (`count=0`).
- Type `exit` to close each xterm window.
You should see the message received at host `h2`, but without any information about the path the message took. Your job is to extend the code in `mri.p4` to implement the MRI logic to record the path.
P4 programs define a packet-processing pipeline, but the rules governing packet processing are inserted into the pipeline by the control plane. When a rule matches a packet, its action is invoked with parameters supplied by the control plane as part of the rule.
In this exercise, the control plane logic has already been implemented. As part of bringing up the Mininet instance, the `make` script will install packet-processing rules in the tables of each switch. These are defined in the `sX-runtime.json` files, where `X` corresponds to the switch number.
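For reference, the L3 forwarding logic already present in the skeleton illustrates this pattern. The sketch below assumes the skeleton's names (`ipv4_forward`, an LPM table here assumed to be called `ipv4_lpm`, and the typedefs `macAddr_t` and `egressSpec_t`); the control plane supplies the `dstAddr` and `port` parameters of each installed rule:

```p4
/* Sketch of the skeleton's L3 forwarding pieces; table and type names
 * are assumptions. The control plane fills in dstAddr and port per rule. */
action ipv4_forward(macAddr_t dstAddr, egressSpec_t port) {
    standard_metadata.egress_spec = port;         // pick the output port
    hdr.ethernet.srcAddr = hdr.ethernet.dstAddr;  // switch becomes the source
    hdr.ethernet.dstAddr = dstAddr;               // next hop becomes the dest
    hdr.ipv4.ttl = hdr.ipv4.ttl - 1;              // decrement TTL
}

table ipv4_lpm {
    key = {
        hdr.ipv4.dstAddr: lpm;   // longest-prefix match on destination IP
    }
    actions = {
        ipv4_forward;
        drop;
        NoAction;
    }
    size = 1024;
    default_action = NoAction();
}
```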
The `mri.p4` file contains a skeleton P4 program with key pieces of logic replaced by `TODO` comments. These should guide your implementation: replace each `TODO` with logic implementing the missing piece.
MRI will require two custom headers. The first header, `mri_t`, contains a single field, `count`, which indicates the number of switch IDs that follow. The second header, `switch_t`, contains the switch ID and queue depth fields for each switch hop the packet goes through.
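As a rough sketch, the header definitions might look like the following. The 32-bit field widths are an assumption, chosen so that each `switch_t` entry occupies 8 bytes on the wire, which is consistent with the `length = 20` option (2 bytes of IP option, 2 bytes of MRI count, plus two 8-byte entries) in the expected output of Step 3:

```p4
/* A possible layout for the MRI headers (field widths are assumptions). */
typedef bit<32> switchID_t;
typedef bit<32> qdepth_t;

header mri_t {
    bit<16> count;      // number of switch_t entries that follow
}

header switch_t {
    switchID_t swid;    // ID of the switch that appended this entry
    qdepth_t   qdepth;  // queue depth observed at that switch
}
```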
One of the biggest challenges in implementing MRI is handling the recursive logic for parsing these two headers. We will use a `parser_metadata` field, `remaining`, to keep track of how many `switch_t` headers we need to parse. In the `parse_mri` state, this field should be set to `hdr.mri.count`. In the `parse_swtrace` state, this field should be decremented. The `parse_swtrace` state will transition to itself until `remaining` is 0.
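A sketch of these two states, assuming the parsed `switch_t` headers land in a header stack named `hdr.swtraces` and that the user metadata carries `parser_metadata.remaining` as described above:

```p4
state parse_mri {
    packet.extract(hdr.mri);
    meta.parser_metadata.remaining = hdr.mri.count;
    transition select(meta.parser_metadata.remaining) {
        0       : accept;
        default : parse_swtrace;
    }
}

state parse_swtrace {
    packet.extract(hdr.swtraces.next);  // extract into the next free stack slot
    meta.parser_metadata.remaining = meta.parser_metadata.remaining - 1;
    transition select(meta.parser_metadata.remaining) {
        0       : accept;               // all advertised entries parsed
        default : parse_swtrace;
    }
}
```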
The MRI custom headers will be carried inside an IP Options header. The IP Options header contains a field, `option`, which indicates the type of the option. We will use the special type 31 to indicate the presence of the MRI headers.
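In the parser, this means branching on the option type after extracting the IP Options header. A sketch, assuming a state named `parse_ipv4_option` and a 5-bit `option` field (1-bit copy flag + 2-bit class + 5-bit type):

```p4
const bit<5> IPV4_OPTION_MRI = 31;

state parse_ipv4_option {
    packet.extract(hdr.ipv4_option);
    transition select(hdr.ipv4_option.option) {
        IPV4_OPTION_MRI : parse_mri;  // MRI headers follow
        default         : accept;     // some other option; ignore it
    }
}
```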
Beyond the parser logic, you will add a table, `swtrace`, in the egress control to store the switch ID and queue depth, along with actions that increment the `count` field and append a `switch_t` header.
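One way the append logic might look in v1model. This is a sketch, not the reference solution; it assumes the 32-bit `swid`/`qdepth` fields from above and field names `optionLength` and `totalLen` in the option and IPv4 headers:

```p4
action add_swtrace(switchID_t swid) {
    hdr.mri.count = hdr.mri.count + 1;
    hdr.swtraces.push_front(1);     // shift the stack to make room at index 0
    hdr.swtraces[0].setValid();
    hdr.swtraces[0].swid   = swid;  // supplied by the control plane rule
    hdr.swtraces[0].qdepth = (qdepth_t) standard_metadata.deq_qdepth;

    // Each 8-byte entry adds two 32-bit words to the IPv4 header length.
    hdr.ipv4.ihl = hdr.ipv4.ihl + 2;
    hdr.ipv4_option.optionLength = hdr.ipv4_option.optionLength + 8;
    hdr.ipv4.totalLen = hdr.ipv4.totalLen + 8;
}
```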
A complete `mri.p4` will contain the following components:

- Header type definitions for Ethernet (`ethernet_t`), IPv4 (`ipv4_t`), IP Options (`ipv4_option_t`), MRI (`mri_t`), and Switch (`switch_t`).
- Parsers for Ethernet, IPv4, IP Options, MRI, and Switch that will populate `ethernet_t`, `ipv4_t`, `ipv4_option_t`, `mri_t`, and `switch_t`.
- An action to drop a packet, using `mark_to_drop()`.
- An action (called `ipv4_forward`), which will:
  - Set the egress port for the next hop.
  - Update the ethernet destination address with the address of the next hop.
  - Update the ethernet source address with the address of the switch.
  - Decrement the TTL.
- An ingress control that:
  - Defines a table that will read an IPv4 destination address, and invoke either `drop` or `ipv4_forward`.
  - Has an `apply` block that applies the table.
- At egress, an action (called `add_swtrace`) that will add the switch ID and queue depth.
- An egress control that applies a table (`swtrace`) to store the switch ID and queue depth, and calls `add_swtrace` (a sketch of this control and the deparser follows this list).
- A deparser that selects the order in which fields are inserted into the outgoing packet.
- A `package` instantiation supplied with the parser, control, checksum verification and recomputation, and deparser.
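As referenced in the list above, here is a sketch of how the egress control and deparser might fit together. The guard on `hdr.mri.isValid()` is an assumption; it keeps non-MRI packets untouched:

```p4
/* Inside the egress control: */
table swtrace {
    actions = {
        add_swtrace;
        NoAction;
    }
    default_action = NoAction();
}

apply {
    if (hdr.mri.isValid()) {
        swtrace.apply();    // trace only packets carrying the MRI option
    }
}

/* The deparser fixes the wire order; emit() skips invalid headers, and
 * emitting a header stack emits only its valid entries: */
control MyDeparser(packet_out packet, in headers hdr) {
    apply {
        packet.emit(hdr.ethernet);
        packet.emit(hdr.ipv4);
        packet.emit(hdr.ipv4_option);
        packet.emit(hdr.mri);
        packet.emit(hdr.swtraces);
    }
}
```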
Follow the instructions from Step 1. This time, when your message from `h1` is delivered to `h2`, you should see the sequence of switches through which the packet traveled, plus the corresponding queue depths. The expected output will look like the following, which shows the MRI header with a `count` of 2 and switch IDs (`swid`s) 2 and 1. The queue depth at the common link (from `s1` to `s2`) is high.
```
got a packet
###[ Ethernet ]###
  dst       = 00:04:00:02:00:02
  src       = f2:ed:e6:df:4e:fa
  type      = 0x800
###[ IP ]###
     version   = 4L
     ihl       = 10L
     tos       = 0x0
     len       = 42
     id        = 1
     flags     =
     frag      = 0L
     ttl       = 62
     proto     = udp
     chksum    = 0x60c0
     src       = 10.0.1.1
     dst       = 10.0.2.2
     \options   \
      |###[ MRI ]###
      |  copy_flag = 0L
      |  optclass  = control
      |  option    = 31L
      |  length    = 20
      |  count     = 2
      |  \swtraces  \
      |   |###[ SwitchTrace ]###
      |   |  swid      = 2
      |   |  qdepth    = 0
      |   |###[ SwitchTrace ]###
      |   |  swid      = 1
      |   |  qdepth    = 17
###[ UDP ]###
        sport     = 1234
        dport     = 4321
        len       = 18
        chksum    = 0x1c7b
###[ Raw ]###
           load      = 'P4 is cool'
```
There are several ways that problems might manifest:

- `mri.p4` fails to compile. In this case, `make` will report the error emitted by the compiler and stop.
- `mri.p4` compiles but does not support the control plane rules in the `sX-runtime.json` files that `make` tries to install using a Python controller. In this case, `make` will log the controller output in the `logs` directory. Use these error messages to fix your `mri.p4` implementation.
- `mri.p4` compiles, and the control plane rules are installed, but the switch does not process packets in the desired way. The `logs/sX.log` files contain trace messages describing how each switch processes each packet. The output is detailed and can help pinpoint logic errors in your implementation. The `build/<switch-name>-<interface-name>.pcap` files also contain the pcap of packets on each interface. Use `tcpdump -r <filename> -xxx` to print a hexdump of the packets.
- `mri.p4` compiles and all rules are installed, packets go through, but the logs show that the queue length is always 0. In this case, either reduce the link bandwidth in `topology.json` or increase the rate of the iperf flow.
In the latter two cases above, `make` may leave a Mininet instance running in the background. Use the following command to clean up these instances:

```bash
make stop
```
Congratulations, your implementation works! Move on to Source Routing.
The documentation for P4_16 and P4Runtime is available on the P4 website (https://p4.org/specs/).

All exercises in this repository use the v1model architecture, the documentation for which is available in the `v1model.p4` include file in the `p4lang/p4c` repository and in the BMv2 `simple_switch` documentation in the `p4lang/behavioral-model` repository.