- Automatic On-Chip Instrumentation
In a SmartHLS project there are now three levels of debug and verification:
- Software-only - Compile the C++ code and run on the host machine (e.g. x86).
- Co-simulation - Use RTL simulation to confirm the results match the software-only run.
- On-Chip - Add probes to the design to see data once the FPGA is programmed (this example).
The SmartHLS Automatic On-Chip Instrumentation feature streamlines on-chip debugging and verification by automatically adding probes to ports and FIFOs in the generated Verilog code, eliminating the need for manual instrumentation. This feature enables developers to:
- Monitor input and output data of HLS modules through port instrumentation
- Track data flow between subfunctions via FIFO instrumentation
- Detect critical issues like FIFO overflows, FIFO underflows (not enough data) and pipeline bubbles
- Optimize FIFO depths to avoid overprovisioning of LSRAM resources
To minimize area overhead, users can control instrumentation scope through configurable log levels that specify which class of signals to monitor.
IMPORTANT: This is an advanced example. It is assumed that you have completed the previous training modules available in the examples repository, and have experience with on-chip debugging using ModelSim, Synopsys Identify, and command-line operations.
Before beginning this tutorial, you should install the following software:
-
Libero® SoC 2024.2 or later (Download Page). SmartHLS™ is packaged with Libero
-
The following hardware is required:
- PolarFire® SoC FPGA Icicle Kit. Please follow this link to set up your Icicle Kit and make sure Linux boots-up and that the board has an IP network address assigned to it.
- This example can be used in a local or remote configuration depending on where the Icicle Kit board is connected to via JTAG. The JTAG host is the machine that is connected to the board and the build host is the machine where the project is compiled. See the following diagram:
-
Add the following environment variables, and adjust as necessary for your setup. You will need to set these up for every one of the terminals we launch in the tutorial below.
-
On Linux:
export SHLS_ROOT_DIR="<SMARTHLS_INSTALLATION_DIRECTORY>/SmartHLS" export PATH="$SHLS_ROOT_DIR/dependencies/python:$PATH" export BOARD_IP="<YOUR ICICLE KIT BOARD IP HERE>" export JTAG_HOST="<YOUR JTAG HOST IP HERE>" # For local JTAG debug, use 127.0.0.1 export PROGRAMMER_ID="<YOUR PROGRAMMER ID HERE>" # Available from FPExpress
-
On Windows PowerShell please either use forward slashes (/) or double back slashes (\\):
$env:SHLS_ROOT_DIR = "<SMARTHLS_INSTALLATION_DIRECTORY>/SmartHLS" $env:PATH = "$env:SHLS_ROOT_DIR/dependencies/python;$env:PATH" $env:BOARD_IP="<YOUR ICICLE KIT BOARD IP HERE>" $env:JTAG_HOST="<YOUR JTAG HOST IP HERE>" # For local JTAG debug, use 127.0.0.1 $env:PROGRAMMER_ID="<YOUR PROGRAMMER ID HERE>" # Available from FPExpress
- KNOWN ISSUE: In Windows, SmartHLS includes Python 3 and the binary name is
python.exe
, however, a TCL script in the SmartHLS 2024.2 installation is explicitly callingpython3
, which does not exist. To be able to run the instrumentation example in Windows, just copy the file as follows:
cp "$env:SHLS_ROOT_DIR/dependencies/python/python.exe" "$env:SHLS_ROOT_DIR/dependencies/python/python3.exe"
- KNOWN ISSUE: In Windows, SmartHLS includes Python 3 and the binary name is
-
NOTE: The JTAG_HOST
variable can be set to 127.0.0.1
if the machine that the board is connected to is the same as the machine where the project is being compiled and debugged.
We have created a simple, yet general example that describes a streaming design pattern to showcase how the SmartHLS Automatic On-Chip Instrumentation feature works. The following is a block diagram of the example architecture where the red dots represent the instrumentation probes that are automatically inserted:
A typical pipeline starts with a data producer()
, and the data goes through a series of processing functions, in this case, the fifoToFifo()
function just forwards the data, but in a real design it can be any stream processing function. Finally, the data is read by the consumer()
. The producer()
will start generating a continuous data sequence when the RISC-V CPU sends the go = 1
signal and will stop when go = 0
.
Each instance of the FifotoFifo()
function and the consumer()
function have a delay
argument to artificially create a backpressure in the dataflow and cause the FIFOs to fill up to a level proportional to the delays. The delays for the pipeline are passed as command-line arguments to the RISC-V binary (.elf
file). Once the RISC-V binary is running, the user can press CTRL + C
to send the go = 0
signal to the HLS module to stop the execution of the program.
With this example, users can see on-chip when the RISC-V CPU writes the go
and delay
arguments to the HLS module by looking at the AXI target
ports, or can see how data flows through the FIFOs and how they fill-up.
From the user perspective, the flow is simple compared to manual instrumentation: just enable the instrumentation flow and adjust the instrumentation parameters as needed. The rest is handled by SmartHLS.
Here is a high-level diagram of the Automatic On-Chip Instrumentation flow:
The process follows these steps:
-
Initial Phase:
- User writes C++ code.
- SmartHLS converts the C++ code to Verilog and generates an initial instrumentation specification (
instrument_config.json
) - Users can optionally customize the instrumentation configuration
-
High-Level Instrumentation Phase:
- SmartHLS integrates the generated Verilog module into the Icicle Kit Libero reference design and runs RTL synthesis. The resulting netlist is not yet instrumented.
- SmartHLS High-Level Instrumentor then extracts information from the uninstrumented netlist and from the
instrument_config.json
to generate TCL scripts for Synplify to actually instrument the design.
-
Implementation Phase:
- Synplify & Libero apply the generated instrumentation constraints (
identify.idc
file) to generate an instrumented netlist. - Libero completes Place-and-Route and generates the bitstream
- Synplify & Libero apply the generated instrumentation constraints (
-
Debugging and Monitoring Phases:
Two operational modes are available to capture data from the instrumented design:
- Debugging Mode: Users can set triggers for specific conditions and use Identify to capture data. More interactive.
- Monitoring Mode: Automatic periodic data capture, always trigger without conditions.
-
Visualization Phase:
- The Update Wave Viewer script processes the captured data (
.vcd
files) and convert it to viewer-specific formats (e.g., .wlf for ModelSim) - SmartHLS High-Level Instrumentor generates TCL scripts for optimal signal display in ModelSim:
- Groups signals by top-level
- Configures hex notation for address/data buses
- Displays FIFO occupancy levels as analog waveforms
- The Update Wave Viewer script processes the captured data (
First, to enable the Automatic On-Chip Instrumentation feature in the project, the Makefile
contains the following line:
HLS_INSTRUMENT_ENABLE=1
In a new terminal, remove stale files by running
shls clean
Then run the following command to generate a file called instrument_conf.json
.
shls -a instrument_init
This command will automatically run shls hw
first to convert the C++ code to Verilog,
since it is a pre-requisite for generating the instrument_conf.json
file. This is how instrument_conf.json
will look like:
{
"modules": {
"hlsModule": {
"log_level": "2",
"fifo_log_level": "0"
}
},
"dashboard": {
"max_iterations": -1,
"show_markers": 1,
"monitoring_mode": 0,
"waveform_period": "10"
},
"iice_options": {
"sample_buffer_depth": 1024,
"iice_name": ""
}
}
The hlsModule
is the same name of the top-level function as described in the main.cpp
file:
void hlsModule(volatile unsigned char& go,
unsigned long long int delay1,
unsigned long long int delay2,
unsigned long long int delay3,
unsigned long long int delay4) {
#pragma HLS function dataflow top
...
}
A full explanation of the parameters of instrument_conf.json
is located in the User Guide.
NOTE: Make sure to clean your project and re-run shls -a instrument_init
if you modify the top-level modules of your design, for example, if you want to add a new top-level function.
Now, let's change the log levels related to hlsModule()
. A lower log level means fewer signals will be instrumented, which in turn saves resources. The same property applies to the FIFO log level. Let's change log_level
to 3, and the fifo_log_level
to 3.
"hlsModule": {
"log_level": "3",
"fifo_log_level": "3"
}
For the sake of this example, we will only demonstrate log levels 3 and 3 (respectively), but a description of each log level is located in the User Guide.
Next, run Synthesis and Place-and-Route. This can be done with the following command:
shls -a soc_accel_proj_pnr
Now program the FPGA with the instrumented bitstream file (hls_output\soc\designer\MPFS_ICICLE_KIT_BASE_DESIGN\Icicle_SoC.job
). You can use the command line to do this (please make sure you have declared the PROGRAMMER_ID
environment variable):
shls soc_accel_proj_program
Alternatively, you can also use FlashPro Express. If you do, please make sure you close FPExpress after flashing the bitstream, as it may interfere with the debugging process.
At this point the FPGA has been programmed with the instrumented design. Now let's compile the software.
You can now cross-compile the main()
program for the RISC-V CPU by typing:
shls -a soc_sw_compile_accel
Then copy the binary (.elf
file) to the board:
On Linux:
scp hls_output/auto_instrument.accel.elf root@$BOARD_IP:./
On Windows Powershell
scp hls_output/auto_instrument.accel.elf root@$env:BOARD_IP:./
Do NOT run the auto_instrument.accel.elf
program on-board yet. Let's first arm the trigger in Identify in the next section.
To start debugging, we first need to connect to the Icicle Kit board via JTAG. On the JTAG_HOST
machine (the one where your board is connected to), launch a new terminal and start the Actel JTAG server. You can choose any unoccupied port, we'll use 57123
, but please remember which port you use.
acteljtag -p 57123
NOTE: In Windows, if you see an error upon running this command, you may have to use the fully qualified PATH of acteljtag
. You can find this in Windows PowerShell using
Get-Command acteljtag
NOTE: Keep the acteljtag
server terminal open as occasionally it may get disconnected and may need to be started again.
Now open an interactive shell for Identify Debugger:
On Linux, run
identify_debugger_shell -licensetype identdebugger_actel -shell hls_output/soc/synthesis/MPFS_ICICLE_KIT_BASE_DESIGN_syn.prj
On Windows, run
identify_debugger_console -licensetype identdebugger_actel hls_output/soc/synthesis/MPFS_ICICLE_KIT_BASE_DESIGN_syn.prj
And then you can connect to the JTAG server using the following commands (make sure to use the same port number as before):
server set -addr $::env(JTAG_HOST) -port 57123 -cabletype Microsemi_BuiltinJTAG
server start
com cableoption Microsemi_BuiltinJTAG_port $::env(PROGRAMMER_ID)
com check
Next, we'll pick a signal to trigger on. As a result from the instrumentation process, you should see that the Identify Design Constraints
file (hls_output/soc/synthesis/identify.idc
) has been automatically generated. It contains all the signals that are being instrumented. For example, you may notice the line:
{/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty}
This is a signal that indicates whether or not fifo1
is empty. We can then set a trigger when this signal transitions from high to low, which would indicate that fifo1
is not empty. When this happens, the trigger will inform Identify to start recording sample data every cycle up until the sample_buffer
is full, which then can be visualized in ModelSim. To do this, run the following command in the Identify shell:
watch enable -language verilog {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty} {1'b1} {1'b0}
Finally, arm the trigger:
run -iice {IICE_auto_instrument}
Now wait until you see the following in the shell:
DI179 IICE 'IICE_auto_instrument' configured. Waiting for trigger.
% Running.
Running...
This will wait until inputFifo
's empty
signal becomes low. But to get it to become low, we need to run the auto_instrument.accel.elf
binary that was compiled earlier on-board.
Now, to run the design on the board, open an ssh
session to the Icicle Kit board:
ssh root@$BOARD_IP
Now run the software binary that was copied earlier on-board. The four arguments are the delays for the FIFOs. This will cause fifo1
to become non-empty which will, in turn, trigger in Identify.
./auto_instrument.accel.elf 0 0 0 0
You should see in the Identify shell that the trigger has been activated. Let's now write the captured data to a .vcd
file. First press ENTER, and then execute:
write vcd "IICE_auto_instrument.vcd" -iice IICE_auto_instrument
Now in a new terminal window, launch ModelSim by running
vsim -do hls_output/scripts/instrument/vsim_keyboard_shortcut
Now, open the ModelSim window and press Ctrl + R to refresh.
You should see the signals for FIFOs arranged and grouped in an intuitive manner. You can expand the User_Defined_FIFOs
group to see the signals for the FIFOs in the design. For example, here's the grouped signals for fifo1
(after toggling on leaf names):
Let's take a look at the empty
and write_data
signals for fifo1
, and compare these signals to the corresponding counterparts for fifo2
, fifo3
, and fifo4
. Since we ran the HLS module with all delays set to 0
, we expect to see that the time between two consecutive FIFOs becoming non-empty is very small. Place a cursor at falling edge of the empty
signal for all the FIFOs. You may have to zoom in a little to get it right. For clarity, we'll remove all the other signals for now, so that we can see the four empty
signals on top of each other.
Notice that the delay between the falling edges is 60ns. Since a clock cycle is 10ns, 60ns is 6 clock cycles. The reason for this offset in delay is due to some of the control logic in the generated Verilog code. In general, expect
- A 6-cycle delay when the delay is 0
- An (9 + N)-cycle delay when the delay is N, for some positive integer N.
To confirm this, kill the executable running on the board by pressing Ctrl + C
, arm the trigger again, and then rerun the program with all delays set to 1
.
In the Identify shell:
run -iice {IICE_auto_instrument}
Then, in the board SSH terminal:
./auto_instrument.accel.elf 1 1 1 1
Then, refresh the ModelSim waveform (press Ctrl + R
), and replace the cursors as before, on the falling edge of the empty signal. You should see the difference is 9 + 1 = 10 clock cycles
Exercise: Try with different delays to make sure the design works.
When the SIGINT signal (Ctrl + C
) is sent to the executable running on the board, the producer()
function should write 0x0FF
(wordplay for "OFF") to fifo1
and terminate. When 0x0FF
is seen by the other functions in the pipeline, they too will terminate, effectively ceasing operation of the pipeline completely. Let's confirm this is actually the case.
In the Identify shell, remove the existing trigger.
watch disable {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty}
Then, trigger on 0x0FF
being written to fifo1
:
watch enable -language verilog {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/write_data} {32'h0FF}
Now run the debugger
run -iice {IICE_auto_instrument}
And then, kill the running executable on the board by pressing Ctrl + C
.
Then, in the Identify shell, the trigger should have been activated. Write to the .vcd
file by pressing ENTER then running
write vcd "IICE_auto_instrument.vcd" -iice IICE_auto_instrument
Then, refresh ModelSim (Ctrl + R
).
You should see the FIFOs becoming empty at the tail-end of the waves.
Let's check what was written to fifo1
just before it became empty. Look at the last falling edge of write_en
, and place a cursor just before it (this is the location of the last write). You will have to zoom in.
As expected, we see that the last written word is 0x0FF
(you can also see that the word written before is 0x11FE<SOME NUMBER>
, as we expect!).
A very important signal we have not yet mentioned is the usedw
signal. This signal counts the number of elements in the FIFO at any given point in time. In this design, we expect that the occupancy of each FIFO grows during the initial delay stage, and then remains constant until the executable running on-board is killed, in which case it will become empty. We will examine the usedw
signal for when the FIFO occupancy is growing (when the HLS modules is first started), and leave the examination of the usedw
signal when the FIFO size is shrinking as an exercise to the reader.
First, remove the trigger set in the previous section.
watch disable {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/write_data}
Then, add a trigger for fifo1
becoming non-empty
watch enable -language verilog {/FIC_0_PERIPHERALS_1/hlsModule_top_0/hlsModule_inst/hlsModule_BB_0_fifo1_inst/genblk1/fwft_fifo_bram_inst/empty} {1'b1} {1'b0}
Then, run the debugger
run -iice {IICE_auto_instrument}
Finally, run the executable on the board with all delays set to 12
, then execute
Then, in the Identify shell, write to the .vcd
file by pressing ENTER then running
write vcd "IICE_auto_instrument.vcd" -iice IICE_auto_instrument
in the Identify shell to write to the .vcd
, and then refresh ModelSim (Ctrl + R
).
First, let's format the usedw
signal for fifo1
in ModelSim. Right-click it and hover over Format
, then select Analog (automatic)
. Then, right-click it again and hover over Radix
, then select Decimal
.
Now you can place a cursor anywhere in the waveform and see the occupancy level of fifo1
. You will see that the occupancy at the pipeline's stable state is 20. This is because the initial delay of 12 plus the control logic delay of the submodules. In these 21 clock cycles, 20 elements were able to be written before the fifotoFifo()
function started forwarding data from fifo1
to fifo2
.
Exercise: Play around with the delays and check that the occupancies of the other FIFOs makes sense intuitively.
Now close ModelSim and the Identify Debugger console.
So far, anytime we wanted to test a new delay, we would have to manually arm the Identify trigger, write to the .vcd
file, and refresh ModelSim. This is an interactive approach where a new capture would overwrite the previous waveform where developers can inspect and experiment with different trigger conditions.
In contrast, in monitoring mode the goal is to have more long-term setting to see how signals change over more prolonged periods of time. For example, to see if FIFOs slowly start growing and eventually overflow. In monitoring mode the waveforms in ModelSim will update automatically with every new capture of data from the instrumented design and the trigger will happen periodically and with no trigger condition.
Monitoring mode consists of two parts:
- a monitoring process
- a visualizing process
- with a waveform (using ModelSim)
- with a bar plot (using Python)
To enable Monitoring Mode, change
set monitoring_mode 0
to
set monitoring_mode 1
in hls_output/scripts/update_vcd.tcl
. This indicates to the waveform updating scripts that when we get new data from the debugger, we don't want to refresh the waveform, but rather want to concatenate the new data to the end of the existing waveform.
Then, open a new terminal and start a monitoring process that periodically captures the data:
identify_debugger_shell -licensetype identdebugger_actel ./hls_output/scripts/instrument/monitor.tcl $PROGRAMMER_ID
Finally, open Modelsim in a new terminal for visualization:
vsim -do hls_output/scripts/instrument/update_vcd.tcl
This will launch ModelSim again, but the waveform will update continuously (no need to press Ctrl+R
to refresh) as soon as Identify provides new captured data periodically.
Now close ModelSim and the Identify Debugger console.
When writing C++ to design a hardware module, it may not be clear at first how deep your FIFOs need to be. On the one hand, if they are too shallow, then you will have a lot of data backup (backpressure). And, if they are too deep, then you may be overprovisioning and wasting area.
The FIFO Monitoring Dashboard aims to show developers, in nearly real-time, how filled up their FIFOs are getting as their program executes. It does so in an intuitive manner using a bar graph, where each bar represents a FIFO.
Start the monitoring loop that will generate the periodic captures:
identify_debugger_shell -licensetype identdebugger_actel hls_output/scripts/instrument/monitor.tcl $PROGRAMMER ID
Finally, open a new terminal and launch the FIFO Monitoring Dashboard:
shls -s instrument_monitor_fifos
Now, when you run the auto_instrument.accel.elf
executable on-board, you should see the bar graph changing according to how full the FIFOs are. Try playing around with different delays and see how this affects the bar graphs. The occupancies should match the values you see for the usedw
signal in ModelSim, for the FIFO dashboard is simply a python-based visualization of this signal!
The bar graph should periodically change as it receives data from the monitoring process. The timestamp at the top of the plot indicates the time the plotted data was created by the monitoring process.
You might notice a few shallow FIFOs on the left of the screen with very long names. These are infrastructure FIFOs, and are part of SmartHLS's AXI hardware design IP. The rightmost four FIFOs are the user-defined FIFOs, and are the ones described in the C++ code and the ones you'll want to pay attention to.
NOTE: Please be advised that the FIFO Dashboard feature is currently in an experimental phase. Please use it with caution and anticipate potential minor issues
Here are some examples of the bar plot. You should confirm these make sense intuitively.
-
When the executable is not running on-board, all FIFOs are empty:
-
When the delays are 20, 40, 80, and 160, respectively:
auto_instrument.accel.elf 20 40 80 160
-
When the delays are 220, 20, 150, and 80, respectively:
auto_instrument.accel.elf 220 20 150 80
In this tutorial, we only demonstrated how to set triggers and configure the client-server connection using the Identify shell. However, all of this can also be done with the GUI. You can launch the GUI by opening a new terminal and running
identify_debugger -licensetype identdebugger_actel hls_output/soc/synthesis/MPFS_ICICLE_KIT_BASE_DESIGN_syn.prj
Then, you can configure the client-server connection by clicking Debugger > Setup debugger...
dialog, and then visiting the Communications
tab to connect to the JTAG server.
To trigger and run the debugger, you should find the correct signals you wish to trigger on, right-click it, hover over Triggering
, and customize your trigger appropriately. Then, hit the big Run
button at the upper-left side of the window. For example, here's how you can trigger on fifo1
becoming non-empty and then run the debugger: