As developers, we want to see how much memory is used by each piece of code in the TezEdge node, so that we can evaluate whether a particular function costs us too much memory. We want to minimize memory consumption, which always benefits the smooth running of any software.
For this purpose, we've created a memory profiler for the TezEdge node that utilizes extended Berkeley Packet Filters (eBPF), a technology we’ve previously used in the firewall of the TezEdge node’s validation subsystem.
It tracks the physical (resident) memory usage of the TezEdge node and can determine the function and the entire call stack where an allocation happened. The profiler does not store the history of allocations, so it does not use disk space. It only provides a slice of the current moment, which leaves more space on the server for the node itself.
By running this software and its browser-based front end, developers can see a call tree and how much memory is allocated in each branch of the tree. They can then accurately determine where it would be worthwhile to decrease memory consumption.
The tool consists of two parts.

The first part is the `bpf-memprof-user` binary, which has an embedded eBPF module. It requires superuser permission. When launched, this binary loads the eBPF module into the kernel and creates the `/tmp/bpf-memprof.sock` socket. The eBPF module tracks the `exec` syscall to determine which process is the TezEdge node. That is why `bpf-memprof-user` should be running before the TezEdge node is launched. If `bpf-memprof-user` is launched when the node is already running, it will not be able to find the node.
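For example, a minimal sketch of the required start order, using the binary path from the build instructions later in this document (the node invocation itself is elided):

```
# Start the profiler first, so its eBPF module can observe the node's exec.
sudo ./target/none/release/bpf-memprof-user &
# ...then launch the TezEdge node.
```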
The eBPF module tracks physical (resident) page allocation and deallocation, as well as such pages being added to or removed from the IO cache. Additionally, the eBPF module unwinds the stack during each allocation event so that the profiler has the virtual addresses of the call stack.
The second part is the `tezedge-memprof` library. It performs the following tasks:

- Connects to the socket and receives a stream of kernel events.
- Monitors the `/proc/<pid>/maps` file. This file describes each memory area in the address space of the TezEdge node, including, among others, the memory areas of the executable `light-node` binary and of the shared libraries used by the node. It allows translating the virtual address of a function into a filename and an offset within the file where the function is located (a quick way to inspect this file is sketched after this list).
- Loads the `.symtab` and `.strtab` sections from the `light-node` binary and from shared libraries. This enables the profiler to resolve function names.
- Counts the memory allocated, and the memory used for cache, at each function.
- Serves HTTP requests.
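For illustration, this is roughly the information the profiler reads from `/proc/<pid>/maps`; a sketch assuming the node process is named `light-node` and you have root access:

```
# Each line shows an address range, permissions, the offset into the backing
# file, and the file path (light-node itself or a shared library).
sudo cat /proc/$(pgrep -o light-node)/maps | grep light-node
```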
- Linux kernel version 5.11 or higher.
- Docker
- Docker compose
- (RECOMMENDED) Steps described in Docker Post-Installation.
- `git clone https://github.com/tezedge/tezedge-debugger.git`
- `cd tezedge-debugger`
- `docker-compose pull && docker-compose up -d`

The first two steps clone the source code from GitHub and change into its directory. The full source code is not needed; you can take only the `docker-compose.yml` file. The third step runs the TezEdge node along with the memory profiler and the frontend.

Now you can see the result at http://localhost/#/resources/memory in your browser.
The application is distributed as the docker image `tezedge/tezedge-memprof`. The image needs `privileged` permissions. It also needs the `/sys/kernel/debug` and `/proc` directories mapped from the host system. The application serves HTTP requests on port `17832`.
For example:
docker run --rm --privileged -it -p 17832:17832 -v /proc:/proc:rw -v /sys/kernel/debug:/sys/kernel/debug:rw tezedge/tezedge-memprof:latest
In order to determine function names, the memory profiler needs access to `light-node` and to the system shared libraries. The files to which the memory profiler has access should be the same files that the TezEdge node is using. That is why the docker image `tezedge/tezedge-memprof:latest` is inherited from the `tezedge/tezedge:latest` image. However, if `tezedge` is updated but the `tezedge-memprof` image is still old, this can lead to problems. To avoid such situations, the `tezedge-memprof` image has a docker client inside and copies the `light-node` binary from the current `tezedge` container.
To enable this behavior, set the `TEZEDGE_NODE_NAME` environment variable to the name of the TezEdge node container and map the `/var/run/docker.sock` file from the host. See `docker-compose.yml` and `memprof.sh` for details; a standalone `docker run` sketch follows.
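A minimal sketch, assuming the node container is named `tezedge-node` (adjust the name to your setup):

```
docker run --rm --privileged -it \
  -e TEZEDGE_NODE_NAME=tezedge-node \
  -p 17832:17832 \
  -v /proc:/proc:rw \
  -v /sys/kernel/debug:/sys/kernel/debug:rw \
  -v /var/run/docker.sock:/var/run/docker.sock \
  tezedge/tezedge-memprof:latest
```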
Returns a tree-like object. Each node of the tree represents a function in some executable file.
The tree has the following structure:
- `name`
  - `executable` - name of the binary file (ELF), for example `libc-2.31.so`
  - `offset` - offset of the function call in the binary file
  - `functionName` - demangled name of the function, for example `<networking::p2p::peer::Peer as riker::actor::Receive<networking::p2p::peer::SendMessage>>::receive::hfe17b4d497a1a6cb`; note that a Rust function name ends with a hash, for example `hfe17b4d497a1a6cb`
  - `functionCategory` - indicates the origin of the function, can be one of the following:
    - `nodeRust` - a function of the TezEdge node written in Rust
    - `nodeCpp` - a function of the TezEdge node written in C++
    - `systemLib` - a function from a system library, usually written in C, but it can also be an arbitrary language
- `value` - positive integer, number of kilobytes of total memory (ordinary + cache) allocated in this function and in all functions from which this function is called
- `cacheValue` - positive integer, number of kilobytes of cache allocated in this function
- `frames` - list of branches of the tree, containing all functions from which this function is called, or all functions which are called from this function (if `reverse` is set to true)

Parameters:

- `reverse` - boolean parameter, used to request a reversed tree; the default value is `false`.
- `threshold` - integer parameter, used to filter out functions which allocate a smaller amount of memory than the threshold value; the default value is `256`.
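For example, one might request a reversed tree with a higher threshold like this; the `/v1/tree` path is an assumption here, while the port is the one documented above:

```
curl "http://localhost:17832/v1/tree?reverse=true&threshold=512"
```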
Returns the process id of the TezEdge node process.
Network message recorder for applications running on the Tezos protocol.
First of all, the network recorder needs to get the raw data from the kernel. The network recorder uses the BPF module to intercept network-related syscalls. It intercepts the `read`, `recvfrom`, `write`, `sendto`, `bind`, `listen`, `connect`, `accept` and `close` syscalls. Those syscalls give a full picture of the network activity of the application. The BPF module is configured with the port on which the application we want to record listens for incoming connections. That is needed to determine the application's PID: the module watches `bind` attempts from any PID on the given port, and once it sees one, it knows the PID. After that, the BPF module intercepts the other syscalls made by this PID. A single instance of the recorder can record multiple applications simultaneously. Do not run multiple instances of the network recorder.
Tezos nodes communicate by exchanging chunked P2P messages over the internet. Each layer uses its own "blocks" of data.

Packets are used by the higher layers of the TCP/IP model to transport application communication over the internet (there are more types of data blocks on the lower levels of the model, such as Ethernet frames, but we do not work with those). The network recorder does not care about such low-level details; packets are processed by the kernel.
A binary chunk is a Tezos construct which represents a sized binary block. Each chunk is a contiguous region of memory, with the first two bytes representing the size of the block. Chunks are sent over the internet in TCP packets, but not necessarily one chunk per packet, and the end of a packet is not necessarily the end of a chunk. A TCP segment can contain multiple chunks, and it is split into packets by the kernel or by network hardware, which knows nothing about Tezos chunks. So a single TCP packet can contain multiple chunks, or the last few bytes of one chunk followed by the first few bytes of the next. This is not easy to handle properly: we need to buffer the received data and cut chunks from the buffer.
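To make the framing concrete, here is a minimal sketch of cutting the first chunk out of a captured byte stream; the file name `stream.bin` is hypothetical, and the two-byte length prefix is assumed to be big-endian:

```
# Read the 2-byte length prefix, then extract that many payload bytes.
size=$((16#$(head -c 2 stream.bin | xxd -p)))
dd if=stream.bin bs=1 skip=2 count=$size of=first_chunk.bin
```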
A message is the parsed representation of some node command, but to be sent over the internet, messages must first be serialized into binary blocks of data, which are then converted into binary chunks and finally split into packets. Again, a single message does not necessarily fit into a single binary chunk, so the recorder has to wait for enough chunks before it can deserialize a message.
The primary feature of the network recorder is the ability to decrypt all messages while having access only to the single identity of the local node.
To establish an encrypted connection, Tezos nodes exchange `ConnectionMessage`s, which contain information about the nodes themselves, including public keys, nonces, proof-of-stake and the protocol version(s) the node is running. The public key is static and is part of a node's identity, as is the proof-of-stake. Nonces are generated randomly for each connection message. After the `ConnectionMessage` exchange, each node remembers the nonce it received and the nonce it sent, and creates a "precomputed" key (for speedup), which is calculated from the local node's private key and the remote node's public key. A nonce is a number incremented after each use.

- To encrypt a message, the node uses the nonce sent in its own `ConnectionMessage` and the precomputed key.
- To decrypt a message, the node uses the received nonce and the precomputed key.
For the network recorder to decrypt a message coming from a remote node to the locally running node, it needs to know the following:

- The local node's private key - part of its local identity, to which the network recorder has access.
- The remote node's public key - part of the received `ConnectionMessage`, which was captured.
- The remote node's nonce - part of the received `ConnectionMessage`, which was captured.
However, to decrypt a message sent by the local node, it would be necessary to know the private key of the remote node, to which the recorder does not have access. Fortunately, Tezos internally uses the Curve25519 method, which allows a message to be decrypted with the same keys that were used for its encryption, thus the network recorder "just" needs:
- The local node's private key - part of its local identity, to which the network recorder has access.
- The remote node's public key - part of the received `ConnectionMessage`, which was captured.
- The local node's nonce - part of the sent `ConnectionMessage`, which was captured.
To capture node logs, the network recorder utilizes the "syslog" protocol (which can easily be enabled in Docker), which, instead of printing logs to the console, wraps them into UDP packets and sends them to a server. This should be handled by the application or by the administrator of the application. The network recorder runs a syslog server inside to process the generated logs. This design decouples the recorder from the node, so the network recorder does not fail if the running node fails. This preserves all of the captured logs, which can potentially include information about the failure of the node.
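For example, Docker's built-in syslog log driver can forward a container's logs to the recorder; a sketch, assuming the recorder's syslog server listens on UDP port 13131 (the actual port comes from your configuration):

```
docker run --log-driver syslog \
  --log-opt syslog-address=udp://127.0.0.1:13131 \
  tezedge/tezedge:latest
```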
Storage is based on RocksDB, utilizing custom indexes, which allow field filtering and cursor pagination.
The RPC server is based on the warp crate. All endpoints use cursor pagination, which makes it simple to paginate real-time data. All data come from local storage.
Endpoint for checking all P2P communication on the running node. Messages are always sorted from newest to oldest.

- `node_name : string` - Name of the node; required.
- `cursor : 64bit integer value` - Cursor offset, used for easier navigation in messages. Default is the last message.
- `limit : 64bit integer value` - Maximum number of messages returned by the RPC. Default is 100 messages.
- `remote_addr : String representing socket address in format "<IP>:<PORT>"` - Filter messages belonging to communication with the given remote node.
- `incoming : Boolean` - Filter messages by their direction.
- `types : comma separated list of types` - Filter messages by the given types.
- `source_type : "local" or "remote"` - Filter messages by the source of the message.
- `direction : "forward" or "backward"` - Order of messages. Forward is from older to newer, backward is from newer to older. Default is `backward`.

Examples:

- `/v2/p2p` - Return the last 100 P2P messages.
- `/v2/p2p?cursor=100&types=connection_message,metadata` - Return connection and metadata messages, skipping the first 100 messages.
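Put together, a full request might look like this; the port matches the recorder configuration used later in this document, and the node name `tezedge` is an illustrative assumption:

```
curl "http://localhost:17732/v2/p2p?node_name=tezedge&limit=10&types=connection_message"
```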
Endpoint for checking all captured logs on the running node. Messages are always sorted from newest to oldest.

- `node_name : string` - Name of the node; required.
- `cursor : 64bit integer value` - Cursor offset, used for easier navigation in messages. Default is the last message.
- `limit : 64bit integer value` - Maximum number of messages returned by the RPC. Default is 100 messages.
- `log_level : string` - Log level; should be one of `trace, debug, info, warn, error`.
- `timestamp : string` - Unix timestamp representing the time from which the logs are shown.
- `direction : "forward" or "backward"` - Order of messages. Forward is from older to newer, backward is from newer to older. Default is `backward`.
- `query : string` - Full-text search. When using `query`, only `limit` is allowed; all other params are ignored. See https://docs.rs/tantivy/0.15.3/tantivy/query/struct.QueryParser.html for the query language manual.
- `/v2/log?log_level=error` - Return all errors in the last one hundred logs.
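For instance, a full-text search request could look like the following; the node name `tezedge` and the search term are illustrative assumptions:

```
curl "http://localhost:17732/v2/log?node_name=tezedge&query=failed&limit=20"
```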
- Linux kernel version 5.11 or higher.
- Docker
- Docker compose
- (RECOMMENDED) Steps described in Docker Post-Installation.
First, you must clone this repo.
git clone https://github.com/tezedge/tezedge-debugger.git
Then change into the cloned directory:
cd tezedge-debugger
The easiest way to launch the Debugger is by running it with the included docker-compose file.
docker-compose pull
docker-compose up
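Once the services are up, you can quickly verify that the containers are running:

```
docker-compose ps
```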
It is preferable to use Ubuntu 21.04 to run this software since it has kernel 5.11.0. If you are running another OS with an older kernel, you need to update the kernel.
- `curl`, `wget` and `git` to get the code
- `zlib`, `clang` and `llvm 11` to build the BPF linker
- `libelf`, `make` to build the memory profiler
- `libsodium`, `gcc`, `g++`, `libssl`, `pkg-config` to build the network recorder
- `libarchive-tools`, `flex`, `bison` to prepare the kernel code (needed for the network recorder)
On Ubuntu 20.04, 20.10 or 21.04:
sudo apt-get update
sudo apt-get install curl wget git zlib1g-dev clang make libelf-dev libsodium-dev gcc g++ libssl-dev pkg-config libarchive-tools flex bison bc lsb-release software-properties-common
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh && sudo ./llvm.sh 11 && rm llvm.sh
export LLVM_SYS_110_PREFIX=/usr/lib/llvm-11
The Rust version should be nightly-2021-03-23 for building the crates. Internally, nightly-2020-12-31 is also used for building the BPF module.
If you have no rustup:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
source $HOME/.cargo/env
rustup update nightly-2020-12-31
rustup update nightly-2021-03-23
If you have rustup:
rustup update stable
rustup update nightly-2020-12-31
rustup update nightly-2021-03-23
cargo +stable install bpf-linker --git https://github.com/tezedge/bpf-linker.git --branch main
Get the code:
git clone https://github.com/tezedge/tezedge-debugger
cd tezedge-debugger
Build memory profiler:
cargo +nightly-2021-03-23 build -p bpf-memprof --release
Build network recorder:
cargo +stable install --path bpf-recorder
cargo +nightly-2021-03-23 build -p tezedge-recorder --release
cargo +nightly-2021-03-23 test -p tezedge-memprof -- history
The TezEdge node and the memory profiler should be running for these tests.
URL=http://localhost:17832 cargo +nightly-2021-03-23 test -p tezedge-memprof -- positive compare
The TezEdge node and the network recorder should not be running for these tests.
Make sure the database is empty: `rm -Rf target/debugger_db`.
Set a proper path to `libtezos.so`; if you are using rustup at the default path, it should be:
export LD_LIBRARY_PATH=$HOME/.cargo/git/checkouts/tezedge-????????????????/???????/tezos/sys/lib_tezos/artifacts
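The checkout hash in this path differs per machine; one way to locate the artifacts directory without knowing it (a sketch):

```
# Find the built libtezos.so under the cargo git checkouts.
find $HOME/.cargo/git/checkouts -path '*tezedge*' -name libtezos.so 2>/dev/null
```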
Run the tests:
DEBUGGER_URL=http://localhost:17732 ./tezedge-recorder/test.sh
Do not run multiple instances of the memory profiler or multiple instances of the network recorder simultaneously on the same computer.
A single instance of the network recorder is able to record multiple TezEdge and/or Tezos nodes on the same computer.
The network recorder expects a `config.toml` file in the directory where it is running. It contains the following keys:
The `http_v2` key is the port on which the network recorder serves HTTP requests (v2).

The `[[nodes]]` section contains settings related to a particular TezEdge or Tezos node. There may be multiple such sections.
- `name` - the identifier of the node; it should be passed to API (v2) requests.
- `http_v3` - the port on which the network recorder serves HTTP requests (v3).
- `db` - the path to the database where the recorder stores intercepted network data.
- `p2p` - section containing the subkeys `identity`, the path to the `identity.json` file, and `port`, the port on which the node listens for incoming p2p connections.
- `log` - section containing the subkey `port`, the UDP port on which the network recorder receives the node's logs in syslog format.
The `p2p` and `log` keys are optional. The recorder can work on an old kernel without BPF, but in that case it only records logs and is unable to record p2p traffic. A sample configuration file is sketched below.
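A minimal sketch of such a file, written from the shell; every concrete value here (ports, paths, node name) is an illustrative assumption to be adjusted to your setup:

```
cat > config.toml <<'EOF'
http_v2 = 17732            # port for the v2 http api

[[nodes]]
name = "tezedge"           # identifier passed to api (v2) requests
http_v3 = 17733            # port for the v3 http api
db = "target/debugger_db"  # where intercepted network data is stored

[nodes.p2p]
identity = "/path/to/identity.json"
port = 9732                # where the node listens for incoming p2p connections

[nodes.log]
port = 13131               # udp port receiving the node's logs in syslog format
EOF
```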
If you run the TezEdge node in docker, set the environment variable `TEZEDGE_NODE_NAME` to the name of the docker container of the TezEdge node.
Run the memory profiler:
sudo TEZEDGE_NODE_NAME=<name of node container> ./target/none/release/bpf-memprof-user
or
sudo ./target/none/release/bpf-memprof-user
Run the network recorder:
export LD_LIBRARY_PATH=$HOME/.cargo/git/checkouts/tezedge-????????????????/???????/tezos/sys/lib_tezos/artifacts
./target/none/release/tezedge-recorder --run-bpf