This document describes the functionality of GMU's implementation of the SHA-3 family of hash functions. The implementation is written entirely in VHDL and is available on GitHub at the link listed below.
The implementation supports both modes of the SHAKE XOF (SHAKE128 and SHAKE256) and the two most common SHA-3 hash modes (SHA3-256 and SHA3-512). This is a high-performance design that performs one round of the internal Keccak round function per cycle. Since 24 rounds are performed per permutation, one permutation requires 24 cycles. Data and commands are both loaded into the SHA3 module through a single 64-bit stream interface. Data is received from the core through another 64-bit stream interface. The core cannot be parameterized; only this input format is supported.
GitHub Link: https://github.com/GMUCERG/SHAKE
The table below briefly describes all interface signals of the SHA3 module. The mode, input length, and output length of the current operation are set by the first word of input sent through the `din` port. The format of the command word and data input is specified in the following section.
The input and output data signals use a ready/valid handshake. Every transaction begins with a single command word passed through the `din` port, followed by the input data, also passed through the `din` port. After all input data has been loaded into the module, the hash result is unloaded through the `dout` port.
The user sets `src_ready` low when `din` holds valid data. The core asserts `src_read` high when it has accepted the input data. Similarly, the user sets `dst_ready` low to signal that it is ready to receive output, and the core asserts `dst_write` high when `dout` is valid.
| Signal | Direction | Polarity | Description |
| -------- | -------- | -------- | -------- |
| rst | input | High | Initializes the design |
| clk | input | N/A | Input clock signal |
| src_ready | input | Low | Assert low when din data is valid |
| src_read | output | High | Asserted high to signal that din data has been accepted |
| dst_ready | input | Low | Assert low to accept dout data |
| dst_write | output | High | Asserted high when dout data is valid |
| din[63:0] | input | N/A | Data and command input |
| dout[63:0] | output | N/A | Data output |
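The handshake described above can be modeled in a few lines of Python. This is only a behavioral sketch, not part of the core: it assumes that a word is transferred on any clock cycle in which `src_ready` is low and `src_read` is high, and the function name `accepted_words` and the per-cycle trace format are hypothetical.

```python
def accepted_words(cycles):
    """Return the din words transferred in a per-cycle trace.

    Each entry of `cycles` is a tuple (src_ready, src_read, din).
    Assumption: a transfer happens when src_ready is low (the source
    holds valid data) and src_read is high (the core accepts it) in
    the same cycle.
    """
    words = []
    for src_ready, src_read, din in cycles:
        if src_ready == 0 and src_read == 1:
            words.append(din)
    return words


trace = [
    (1, 0, 0x0),  # source idle, nothing offered
    (0, 0, 0xA),  # data offered, core has not accepted it yet
    (0, 1, 0xA),  # data offered and accepted
    (0, 1, 0xB),  # next word accepted back to back
    (1, 1, 0x0),  # core would accept, but no valid data is offered
]
print(accepted_words(trace))  # [10, 11]
```

Under the same assumption, the output side would follow the same rule with `dst_ready` and `dst_write`.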
The table below briefly describes the SHA3 module command format.
When the desired output length is unknown at the time the command is issued, set the dout size to the maximum value and then reset the core when the operation is completed.
| din[63:60] | din[59:32] | din[31:0] |
|---|---|---|
| Operation mode: 0xC -> SHAKE128, 0x8 -> SHA3-256, 0xA -> SHA3-512, 0xE -> SHAKE256 | Output data size in bits | Input data size in bits |
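The command word layout above can be sketched as a small packing helper. The field positions and mode codes are taken from the table; the names `MODES` and `build_command_word` are hypothetical and only serve this illustration.

```python
# Mode codes taken from the command-format table above.
MODES = {"SHAKE128": 0xC, "SHA3-256": 0x8, "SHA3-512": 0xA, "SHAKE256": 0xE}


def build_command_word(mode, out_bits, in_bits):
    """Pack mode, output size and input size into one 64-bit din word."""
    if not 0 <= out_bits < (1 << 28):
        raise ValueError("output size must fit in din[59:32] (28 bits)")
    if not 0 <= in_bits < (1 << 32):
        raise ValueError("input size must fit in din[31:0] (32 bits)")
    return (MODES[mode] << 60) | (out_bits << 32) | in_bits


# SHA3-256 over a 6-byte (48-bit) message with a 256-bit digest.
print(hex(build_command_word("SHA3-256", 256, 48)))  # 0x8000010000000030
```

Both size fields are expressed in bits, so a byte count has to be multiplied by 8 before it is packed.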
Data is loaded in little-endian form, so if only 6 bytes
din[63:56] | din[55:48] | din[47:40] | din[39:32] |
---|---|---|---|
din[31:24] | din[23:16] | din[15:8] | din[7:0] |
---|---|---|---|
0x00 | 0x00 |
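As an illustration of little-endian loading, the sketch below packs a byte string into 64-bit `din` words. It assumes that the first message byte is placed in `din[7:0]` and that the unused upper bytes of the last word are filled with 0x00; the helper name `pack_din_words` is hypothetical.

```python
def pack_din_words(data: bytes, word_bytes: int = 8):
    """Split `data` into little-endian words, zero-padding the last one."""
    words = []
    for i in range(0, len(data), word_bytes):
        chunk = data[i:i + word_bytes].ljust(word_bytes, b"\x00")
        words.append(int.from_bytes(chunk, "little"))
    return words


# A 6-byte message fills din[47:0]; din[63:48] stays 0x00 0x00.
print([hex(w) for w in pack_din_words(bytes([1, 2, 3, 4, 5, 6]))])
# ['0x60504030201']
```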
The performance of the SHA3 module depends on the rate of the SHA3 mode. The rates of the supported modes of SHA-3 are listed below.
Algorithm | Rate r (Bytes) |
---|---|
SHA3-256 | 136 |
SHA3-512 | 72 |
SHAKE128 | 168 |
SHAKE256 | 136 |
Internally, there are three buffers: input, hash, and output. The first block of input to the next operation can be loaded while the current operation completes. Similarly, while an output block is being unloaded, the permutation to calculate the next block of output is performed.
Thus, when processing multiple blocks of input and output, the inner loading and unloading cycles are masked by the permutation latency. The total latency is then the sum of the cycles needed to load the first block, the cycles of the permutations required to absorb the input data, the cycles of the permutations required to squeeze the output data, and the cycles required to unload the last bytes of output data.
The input data must be shifted serially into place, so loading always requires `r/io_width` cycles. The permutation requires `perm_cc=26` cycles: 24 cycles for the rounds of the permutation, plus one cycle to finish and one cycle to unload the state into the output buffer.
The latency for `din_bytes` of input and `dout_bytes` of output can be calculated as follows:
Latency For a Single Hash:
latency = r/io_width + perm_cc*ceil(din_bytes/r) + perm_cc*floor(dout_bytes/r) + (dout_bytes%r)/io_width
io_width = 8
perm_cc = 26
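The formula above can be evaluated with a short helper. This is a sketch under one stated assumption: a partial final output word is taken to cost one full unload cycle, so the last term is rounded up. The function name `hash_latency` is hypothetical.

```python
import math

IO_WIDTH = 8   # bytes transferred per 64-bit word
PERM_CC = 26   # cycles per permutation, as stated above


def hash_latency(r, din_bytes, dout_bytes):
    """Evaluate the single-hash latency formula, in clock cycles.

    Assumption: a partial final output word still costs one full
    unload cycle, so the last term is rounded up.
    """
    load_first = r // IO_WIDTH
    absorb = PERM_CC * math.ceil(din_bytes / r)
    squeeze = PERM_CC * (dout_bytes // r)
    unload_last = math.ceil((dout_bytes % r) / IO_WIDTH)
    return load_first + absorb + squeeze + unload_last


# SHA3-256 (r = 136): 100-byte message, 32-byte digest.
print(hash_latency(136, 100, 32))  # 17 + 26 + 0 + 4 = 47
# SHA3-512 (r = 72): 200-byte message, 64-byte digest.
print(hash_latency(72, 200, 64))   # 9 + 78 + 0 + 8 = 95
```

For a fixed-length hash the digest is shorter than the rate, so the squeeze term is zero and the latency is dominated by the number of input blocks.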
- Clone or download the VHDL implementation from https://github.com/GMUCERG/SHAKE
- Create a Vivado project and add all source files under the folder `src_all` as design sources.
- Add all sources under `tb` as simulation sources.
- Update lines `29` and `30` in the test file `sha_tb_all.vhd` to point to the test files `kat/kat_all/ALL_ZERO_ALL_VERSION_IN.txt` and `kat/kat_all/ALL_ZERO_ALL_VERSION_OUT.txt`.
You can now run the simulation. This testbench will run a number of tests for various SHA3 modes.