Using libzstd in a memory constrained environment
Zstandard, in typical configurations, assumes that using several MB for compression and decompression is acceptable. This page discusses how to tune Zstandard in memory-constrained environments. Typically, this means embedded code, mobile code, or cases where there can be many Zstandard compression and decompression contexts, such as a server using streaming compression to communicate with many clients concurrently.
To determine Zstandard's total memory usage for any Zstandard context object, including ZSTD_CCtx, ZSTD_CStream, ZSTD_DCtx, and ZSTD_DStream, you can use ZSTD_sizeof_Object(), where Object is the context type (e.g. ZSTD_sizeof_CCtx(ZSTD_CCtx* cctx)). You must call this function after you use the context object, because it reports the current memory usage.
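#include <stdio.h>
#include <zstd.h>

/* Performs the actual compression work; defined elsewhere. */
void doCompress(ZSTD_CCtx* cctx);

void compressAndMeasureMemory(ZSTD_CCtx* cctx)
{
    doCompress(cctx);
    /* Query after use: ZSTD_sizeof_CCtx() reports the current allocation. */
    fprintf(stderr, "Memory usage of ZSTD_CCtx is %zu\n", ZSTD_sizeof_CCtx(cctx));
}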
Zstandard single-shot decompression, via ZSTD_decompress() or ZSTD_decompressDCtx(), uses a fixed amount of memory that is independent of the frame being decompressed and of any parameters. The memory usage can be queried via ZSTD_sizeof_DCtx().
Zstandard streaming decompression needs to allocate a buffer to store the history window. This buffer is Window_Size + 2 * Block_Maximum_Size. It also needs to allocate a buffer of Block_Maximum_Size to buffer one compressed block.
The maximum allowed Window_Size and Block_Maximum_Size can be controlled by ZSTD_d_windowLogMax and ZSTD_d_maxBlockSize respectively. Setting these parameters limits the frames that will be accepted by the decompressor, so the compressor must be configured to respect these limitations.
ZSTD_d_windowLogMax controls the maximum allowed window size that the decompressor will accept. The decoder defaults to 27, which means it will allocate up to 128 MB for the window. The Zstandard format specification recommends that decoders support at least 23 (8 MB) for maximum compatibility.
The compressor won't generate frames with window sizes > 128 MB unless explicitly told to by setting ZSTD_c_windowLog. If you set ZSTD_d_windowLogMax, you must ensure that the compressor sets ZSTD_c_windowLog to a value no greater than the selected windowLogMax; otherwise the decompressor may reject the compressed frame.
The Block_Maximum_Size = min(128 KB, Window_Size), so by setting the Window_Size to less than 128 KB, you can also shrink the Block_Maximum_Size.
ZSTD_d_maxBlockSize allows the decoder to limit the Block_Maximum_Size independently of the Window_Size. The decompressor allocates Window_Size + 3 * Block_Maximum_Size in total, so when the Window_Size shrinks, the Block_Maximum_Size makes up a significant portion of the memory allocated.
For example, in a streaming compression use case where one client is receiving compressed data from many servers, one might choose a Window_Size = 128 KB to balance compression ratio and memory usage. However, the decoder will still need 512 KB of memory (128 KB + 3 * 128 KB). If ZSTD_d_maxBlockSize is set to 4 KB, maybe because packets are expected to be less than 4 KB, then the memory usage shrinks dramatically to 140 KB (128 KB + 3 * 4 KB).
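The arithmetic in this example can be written as a small estimator. This is only a sketch of the formulas quoted on this page; the function names are made up for illustration, and the result covers the streaming buffers only, not the fixed size of the decompression context itself.

```c
#include <stddef.h>

#define KB ((size_t)1024)

/* Block_Maximum_Size = min(128 KB, Window_Size), optionally lowered by a
 * decoder-side block limit. Pass maxBlockSize = 0 for "no extra limit". */
size_t block_maximum_size(size_t windowSize, size_t maxBlockSize)
{
    size_t block = windowSize < 128 * KB ? windowSize : 128 * KB;
    if (maxBlockSize != 0 && maxBlockSize < block) block = maxBlockSize;
    return block;
}

/* Streaming decompression buffers: Window_Size + 3 * Block_Maximum_Size
 * (history window plus 2 blocks, and one compressed-block buffer). */
size_t streaming_decompress_buffers(size_t windowSize, size_t maxBlockSize)
{
    return windowSize + 3 * block_maximum_size(windowSize, maxBlockSize);
}
```

With a 128 KB window, streaming_decompress_buffers(128 * KB, 0) gives 512 KB and streaming_decompress_buffers(128 * KB, 4 * KB) gives 140 KB, matching the figures above.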
Setting this parameter means that you are explicitly rejecting valid Zstandard frames, so you must coordinate with the compressor. The compressor must also set ZSTD_c_maxBlockSize to a value no greater than the value of ZSTD_d_maxBlockSize. Otherwise the compressor will almost certainly generate blocks that are larger than the maximum block size, and the decompressor will reject the frame.
In single-shot compression, all of Zstd's memory usage is adjusted to the size of the data being compressed. The hashLog and chainLog are capped to the next power of 2 larger than the source size. If the source is smaller than 128 KB, then some internal work buffers are also shrunk proportionally.
First and foremost, higher compression levels will typically use more memory, as they map to advanced parameters that need larger tables. Generally, when tuning for memory usage, you should pick a target compression level and then tune advanced parameters to fine-tune the tradeoff if necessary. Picking a target compression level also gives you a benchmark for speed and compression ratio, so as you tune the advanced parameters, you can make sure you aren't hurting speed or compression ratio too much.
Generally, when optimizing for memory usage, you should stick to compression levels 3 and below, as they are tuned for smaller memory budgets: they are designed to keep their working memory mostly in L2 cache.
Zstd allocates 4 * (1 << hashLog) bytes for hash tables. Shrinking this value will save memory, and likely increase compression speed, at the cost of compression ratio.
Zstd allocates 4 * (1 << chainLog) bytes for auxiliary tables. These tables are allocated and used for every strategy except ZSTD_fast, but the usage and meaning of the table changes between strategies. Shrinking this value will save memory, and likely increase compression speed, at the cost of compression ratio.
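To get a feel for how the two logs translate into bytes, the sketch below adds up the table sizes from the formulas above. The function name and the usesChainTable flag are illustrative only; pass 0 for ZSTD_fast, which does not use the auxiliary table.

```c
#include <stddef.h>

/* Estimated match-finder table memory:
 *   4 * (1 << hashLog) bytes for the hash table, plus
 *   4 * (1 << chainLog) bytes for the auxiliary table (skipped for ZSTD_fast). */
size_t match_table_bytes(unsigned hashLog, unsigned chainLog, int usesChainTable)
{
    size_t bytes = 4 * ((size_t)1 << hashLog);
    if (usesChainTable) bytes += 4 * ((size_t)1 << chainLog);
    return bytes;
}
```

For example, with the purely illustrative values hashLog = 17 and chainLog = 16, the hash table takes 512 KB and the auxiliary table another 256 KB. Lowering either log by one halves the corresponding table.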
ZSTD_c_strategy normally does not affect memory usage. It has a large impact on speed, and it changes what the match finder does with its memory. The strategies ZSTD_greedy, ZSTD_lazy, and ZSTD_lazy2 use an additional 1 << hashLog bytes. The strategies ZSTD_btopt and ZSTD_btultra use additional memory.
Generally, when optimizing for memory usage, you should use ZSTD_fast or ZSTD_dfast. The "flagship" levels for these strategies are 1 and 3 respectively.
See the streaming section for more details, but Zstd allocates storage proportional to the maximum block size, which is min(source size, 128 KB) for single-shot compression. It allocates approximately 4 * Block_Maximum_Size. So shrinking ZSTD_c_maxBlockSize can be significant if the ZSTD_c_hashLog and ZSTD_c_chainLog are small. Note that this can also hurt compression ratio and speed, as the fixed costs per block will become more significant.
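As a back-of-the-envelope illustration of the "approximately 4 * Block_Maximum_Size" figure, the sketch below estimates the block-proportional storage for single-shot compression. The function name is hypothetical and the result is an approximation taken from the text, not an exact accounting of the compressor's allocations.

```c
#include <stddef.h>

/* Approximate block-proportional storage for single-shot compression.
 * The effective block size is min(source size, 128 KB), optionally lowered
 * by a configured maximum block size (0 means "use the default"). */
size_t single_shot_block_bytes(size_t srcSize, size_t maxBlockSize)
{
    const size_t defaultMax = (size_t)128 * 1024;
    size_t block = srcSize < defaultMax ? srcSize : defaultMax;
    if (maxBlockSize != 0 && maxBlockSize < block) block = maxBlockSize;
    return 4 * block;
}
```

A 1 MB source with the default block size comes out at roughly 512 KB, while the same source with a 4 KB maximum block size comes out at roughly 16 KB.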
All of the memory optimizations that apply to single-shot compression also apply to streaming. Additionally, ZSTD_c_windowLog and ZSTD_c_maxBlockSize can decrease the amount of memory the streaming compressor needs to store its history window and its compressed block buffer.
Shrinking the window size will directly shrink Zstandard's streaming memory usage, but it will likely also hurt compression ratio, because the compressor can't look as far into the past for matches. Enforcing smaller window sizes means the decompressor will allocate less memory during decompression of the frame. The decoder can enforce strict limits with ZSTD_d_windowLogMax.
In addition to the benefits for single-shot compression, shrinking ZSTD_c_maxBlockSize will reduce the amount of buffer space the streaming compressor needs. The streaming compressor allocates 2 * Block_Maximum_Size bytes of buffer space. If you know that all compressions set ZSTD_c_maxBlockSize, you can also reduce decompression memory usage with ZSTD_d_maxBlockSize. But be careful, because this means the decompressor will reject compressed frames that don't set the same ZSTD_c_maxBlockSize.
If the input buffer is guaranteed to never change and to remain available during the entire compression, ZSTD_c_stableInBuffer can be set to reduce the allocated size by Window_Size + Block_Maximum_Size, and to avoid memory copying. Make sure to read the documentation carefully.
If the output buffer is guaranteed to never change, then ZSTD_c_stableOutBuffer can be set to reduce the allocated buffer size by Block_Maximum_Size, because Zstandard can write directly into the provided output buffer and avoid copying. Make sure to read the documentation carefully.
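The savings described for the two stable-buffer modes can be tallied as follows. This is a sketch of the figures quoted above, with a hypothetical function name; it does not call into the library.

```c
#include <stddef.h>

/* Estimated reduction in streaming-compression buffer memory:
 *   stable input  saves Window_Size + Block_Maximum_Size,
 *   stable output saves Block_Maximum_Size. */
size_t stable_buffer_savings(size_t windowSize, size_t blockMaxSize,
                             int stableIn, int stableOut)
{
    size_t saved = 0;
    if (stableIn)  saved += windowSize + blockMaxSize;
    if (stableOut) saved += blockMaxSize;
    return saved;
}
```

For a 256 KB window and a 2 KB maximum block size, a stable input buffer alone saves about 258 KB, and enabling both modes saves about 260 KB.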
Setting both ZSTD_c_stableInBuffer and ZSTD_c_stableOutBuffer makes the streaming API exactly equivalent to the single-shot function ZSTD_compress2().
As an example, consider a scenario where many clients are talking to many servers, and Zstandard streaming compression is used to compress the network traffic from the server to the client. For example, imagine tailing replication logs from many database shards.
If each server is expected to be connected to C clients, then it needs to keep C ZSTD_CCtx context objects in memory. If each client is expected to be connected to S servers, then it needs to keep S ZSTD_DCtx context objects in memory.
Generally, we expect blocks to be small, around 1 KB, because every time the server wants to send a packet to the client it needs to use ZSTD_e_flush to flush a compressed block.
Most importantly, we would set the ZSTD_c_windowLog to an appropriate value. This is a tradeoff between compression ratio and memory usage, so you have to measure what makes the most sense for your use case. Here, we found that a window size of 256 KB was a happy medium.
Next, to shrink compression memory usage, we can set ZSTD_c_maxBlockSize to a smaller value. It defaults to 128 KB, which means the block-sized overheads will be similar to the window size overheads, because we're using a small window. Since we expect most of our blocks to be small anyway, we can set this to a very small value without hurting compression ratio. In this case we will choose 2 KB.
Now that we've set the ZSTD_c_maxBlockSize, we can also set ZSTD_d_maxBlockSize. We have to be careful here, because a consumer that sets ZSTD_d_maxBlockSize can only consume data from a producer with ZSTD_c_maxBlockSize no larger. In this use case, we send the log2 of the maxBlockSize over the network as a header, and then set the ZSTD_d_maxBlockSize accordingly. This solves the coordination issue, and allows us to reduce memory usage. We send this in a Zstandard Skippable Frame, so that consumers that aren't aware of this header can skip it.
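One possible shape for such a header is sketched below. The skippable frame layout (a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, a 4-byte little-endian content size, then user data) comes from the Zstandard format. Everything else is an assumption made for illustration: the single-byte payload holding the log2 value, the accepted range of 10 to 17, and the function names.

```c
#include <stddef.h>
#include <stdint.h>

#define SKIPPABLE_MAGIC   0x184D2A50u  /* first of the 16 skippable magic numbers */
#define HEADER_FRAME_SIZE 9            /* 4 magic + 4 size + 1 payload byte */

static void write_le32(uint8_t* p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

static uint32_t read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes a skippable frame whose 1-byte payload is log2(maxBlockSize).
 * Returns the number of bytes written, or 0 if dst is too small. */
size_t write_block_size_header(uint8_t* dst, size_t dstCapacity, uint8_t blockSizeLog)
{
    if (dstCapacity < HEADER_FRAME_SIZE) return 0;
    write_le32(dst, SKIPPABLE_MAGIC);
    write_le32(dst + 4, 1);            /* payload length in bytes */
    dst[8] = blockSizeLog;
    return HEADER_FRAME_SIZE;
}

/* Returns the announced maximum block size in bytes, or 0 if the bytes are
 * not a header in this (assumed) layout. Accepts logs 10..17, i.e. 1 KB to
 * 128 KB; the lower bound is an assumption. */
size_t read_block_size_header(const uint8_t* src, size_t srcSize)
{
    if (srcSize < HEADER_FRAME_SIZE) return 0;
    if (read_le32(src) != SKIPPABLE_MAGIC) return 0;
    if (read_le32(src + 4) != 1) return 0;
    if (src[8] < 10 || src[8] > 17) return 0;
    return (size_t)1 << src[8];
}
```

A consumer that understands the header would pass the returned size to ZSTD_d_maxBlockSize before decompressing. A consumer that does not can feed the bytes to the decompressor unchanged, since skippable frames are skipped during decoding.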
Finally, we will select our compression level. The memory usage of every level is already bounded by shrinking the ZSTD_c_windowLog. We will select compression level 3, because in this case using more memory than level 1 was a worthwhile tradeoff. We left the ZSTD_c_hashLog and ZSTD_c_chainLog as-is, because the tradeoff of level 3 made sense.
In this case we don't need to set ZSTD_d_windowLogMax, because the producer is trusted to set the ZSTD_c_windowLog appropriately.
Parameter | Value |
---|---|
ZSTD_c_windowLog | 18 |
ZSTD_c_maxBlockSize | 2048 (2 KB, log2 = 11) |
ZSTD_c_compressionLevel | 3 |
ZSTD_d_maxBlockSize | 2048 (2 KB, log2 = 11) |
To measure the memory usage, start your streaming compression and, during or after the operation, query ZSTD_sizeof_CCtx() and ZSTD_sizeof_DCtx(). This tells you exactly how much memory Zstandard currently has allocated.