
Allow users to specify Packer's internal buf_size #547

Closed · wants to merge 1 commit

Conversation

@rtobar (Contributor) commented Jul 18, 2023

Giving this flexibility to users means that internal reallocations can be avoided if the buffer size is good enough, at the expense of potentially allocating more memory than needed. This, together with reusing a Packer object, means that multiple serialisations can end up requiring no memory allocations other than the initial buffer creation, which can be a big win in some situations.

The default value is still 1MB, making this backwards compatible.
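
To make the proposal concrete, here is a minimal sketch of the intended usage. It is hypothetical: it assumes the new keyword is spelled buf_size, as in the PR title, and that everything else keeps its current behaviour.

import msgpack

# Hypothetical usage of the proposed keyword: pre-size the internal
# buffer so that payloads up to ~200MB never trigger a reallocation.
# Without buf_size, the Packer starts at the current 1MB default and
# grows (reallocates) the buffer as needed.
packer = msgpack.Packer(buf_size=200 * 1024 * 1024)
payload = packer.pack({"data": list(range(1_000_000))})  # bytes, as usual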

@rtobar (Contributor, Author) commented Jul 28, 2023

@methane a gentle ping to bring attention to this fairly simple PR I opened about 10 days ago. I saw that you have been contributing the most to this file lately, so I imagine you are a good person to review it.

@methane (Member) commented Jul 28, 2023

> a gentle ping to bring attention to this fairly simple PR

Adding a public API is not "simple". I need to consider many aspects of it.

If you want to help review, please show a benchmark result that demonstrates how important reallocation is.

> This, together with reusing a Packer object, means that multiple serialisations can end up requiring no memory allocations other than the initial buffer creation, which can be a big win in some situations.

Reusing Packer objects also reduces buffer reallocation even without this change, so it doesn't make a big win.

@rtobar (Contributor, Author) commented Jul 28, 2023

@methane thanks for the fast answer, much appreciated.

> Adding a public API is not "simple". I need to consider many aspects of it.

True, I think my previous statement was an oversimplification. I was mostly hoping that, since it was a well-constrained and backwards-compatible change, it wouldn't be too much of a burden to include it. Of course you will have a better view of what else adding this would entail.

> Reusing Packer objects also reduces buffer reallocation even without this change, so it doesn't make a big win.

True as well. I was thinking this would be more of a win for users of msgpack.packb, although combining the two would give you a zero-allocation scenario, as sketched below.
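
To sketch what I mean by the combination (a hypothetical example: it assumes buf_size as proposed in this PR; Packer.reset() and Packer.bytes() are existing APIs):

import msgpack

# One up-front allocation: autoreset=False keeps the buffer between
# calls, and a buf_size large enough for the biggest payload means
# pack() never has to grow it.
packer = msgpack.Packer(autoreset=False, buf_size=200 * 1024 * 1024)

def serialise(obj):
    packer.reset()         # rewind the write position, keep the buffer
    packer.pack(obj)       # writes into the pre-allocated buffer
    return packer.bytes()  # copy the packed bytes out

# msgpack.packb, by contrast, creates a fresh Packer (and buffer) on
# every call, so it would benefit from buf_size alone but still
# allocates once per call.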

> If you want to help review, please show a benchmark result that demonstrates how important reallocation is.

This is a smaller version of the benchmark I was running. We are using xarrays, but you can simplify it further by using a plain numpy array. The version here doesn't include the buf_size=... argument, which I set to 200MB when testing. This was run on Linux x64 with Python 3.10 and 3.11. The 3.11 results basically don't show the big drop at ~64 MB; I'm assuming something changed in Python's memory allocators in 3.11 that removes that cliff. If you open the plots next to each other you'll see (noise aside) that packb performs slightly better when it can allocate only once. Timings use Timer.autorange, so the best/worst-case scenarios are probably a bit more extreme.

import msgpack
import msgpack_numpy as mnp
import numpy as np
import xarray as xa
import timeit
import functools
import seaborn as sns
import pandas as pd
import matplotlib.pyplot

# Serialisation pipelines: Dataset -> plain dict, then msgpack encoding,
# either via one-shot msgpack.packb or via a reused Packer object.
to_dict = lambda x: x.to_dict(data="array")
mser = lambda x: msgpack.packb(x, default=mnp.encode)
# When testing the patched version, buf_size=200 * 1024 * 1024 was also
# passed here.
packer = msgpack.Packer(default=mnp.encode, autoreset=False)
def mser_packer(x):
    packer.reset()  # reuse the Packer's internal buffer across calls
    packer.pack(x)
    return packer

all_benchmarks = {
    "write_only": {
        'Dataset -> dict + msgpack.packb': [to_dict, mser],
        'Dataset -> dict + msgpack.Packer': [to_dict, mser_packer],
    }
}

def benchmark_for_sizes(functions, nitems):
    """Time a pipeline of functions against datasets of varying size."""
    results = []
    for nitem in nitems:
        arr = np.random.rand(nitem)
        xarr = xa.Dataset({"x": arr, "y": arr + 30})
        size = xarr.nbytes
        print(f"  {nitem=}, {size=}")
        # Chain the pipeline functions over the dataset; autorange picks a
        # sensible number of repetitions for each size.
        timer = timeit.Timer('functools.reduce(lambda v, f: f(v), functions, xarr)',
                             setup="import functools", globals=locals())
        n_executions, total_duration = timer.autorange()
        duration = total_duration / n_executions
        results.append((size, duration))
    return results


def run_benchmarks(nitems, benchmarks):
    results = {}
    for name, functions in benchmarks.items():
        print(f"Benchmarking {name}")
        results[name] = benchmark_for_sizes(functions, nitems)
    # Flatten into a DataFrame with sizes in MB and throughput in MB/s
    flat_results = list((name, size, duration)
                        for name, values in results.items()
                        for size, duration in values)
    df = pd.DataFrame(flat_results, columns=("Benchmark", "Size", "Duration"))
    df['Size [MB]'] = df['Size'] / 1024 / 1024
    df['Speed [MB/s]'] = df['Size'] / 1024 / 1024 / df["Duration"]
    return df

def run_all():
    sns.set_theme()
    # Datasets from ~10M down to ~2M elements (two variables each)
    nitems = list(range(10000000, 2000000, -120000))
    dfs = []
    for group, benchmarks in all_benchmarks.items():
        df = run_benchmarks(nitems, benchmarks)
        df["Group"] = group
        dfs.append(df)

    df = pd.concat(dfs)
    sns.relplot(data=df, x='Size [MB]', y='Speed [MB/s]',
                kind='line', hue='Benchmark', col='Group')
    matplotlib.pyplot.savefig('benchmark_results.png')

if __name__ == '__main__':
    run_all()

Python 3.10: no_bufsize vs. bufsize
[plots: 310_nobufsize, 310_bufsize]

Python 3.11: no_bufsize vs. bufsize
[plots: 311_nobufsize, 311_bufsize]

@rtobar (Contributor, Author) commented Aug 3, 2023

@methane: a gentle ping to ask whether, given the evidence above and the hopefully small maintenance burden, you'd be willing to accept this change.

@methane (Member) commented Aug 3, 2023

Thank you for the detailed report.
I am leaning toward incorporating this feature, but will not do so now.
I will make the final decision when I start developing v1.1.

@methane (Member) commented Aug 3, 2023

There is no schedule for v1.1 yet.
At least one more release (v1.0.6) will come out when Python 3.12 becomes RC.

@rtobar (Contributor, Author) commented Aug 3, 2023

Thanks @methane, much appreciated

@methane (Member) commented May 5, 2024

Committed in #604.

methane closed this May 5, 2024
rtobar deleted the bufsize-argument branch May 6, 2024