gh-126703: Add freelists for iterators and range, method and builtin_function_or_method objects #128368

Draft: wants to merge 11 commits into main

@eendebakpt (Contributor) commented Dec 30, 2024

In this PR we add freelists for the most frequently allocated objects (measured with the pyperformance benchmark suite). Some frequently allocated objects that are not covered yet: ints with 2 or 3 digits, exceptions (StopIteration, IndexError) and generators.

If the freelists improve performance, the PR should probably be split into several smaller ones.
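For readers unfamiliar with the technique: a freelist keeps a small, bounded cache of dead objects of one type and hands them out again on the next allocation, skipping the general-purpose allocator. CPython implements its freelists in C; the sketch below is only a conceptual Python model, and the names `FreeList`, `Slot`, `alloc` and `free` are hypothetical, not CPython APIs.

```python
class FreeList:
    """Conceptual model of a per-type freelist (hypothetical names)."""

    def __init__(self, factory, capacity=100):
        self.factory = factory      # creates a brand-new object on a miss
        self.capacity = capacity    # maximum number of cached dead objects
        self.items = []             # the cached objects, reused LIFO
        self.hits = 0
        self.misses = 0

    def alloc(self):
        if self.items:
            self.hits += 1
            return self.items.pop()  # reuse: no allocator call
        self.misses += 1
        return self.factory()        # miss: fall back to a fresh object

    def free(self, obj):
        if len(self.items) < self.capacity:
            self.items.append(obj)   # keep it for the next alloc()
        # otherwise drop it and let the real allocator reclaim the memory


class Slot:
    """Stand-in for a small fixed-size object such as an iterator."""
    __slots__ = ("seq", "index")


iterators = FreeList(Slot, capacity=2)
a = iterators.alloc()   # miss: freshly created
iterators.free(a)
b = iterators.alloc()   # hit: the same object comes back
assert a is b
print(iterators.hits, iterators.misses)  # 1 1
```

The capacity bound is what keeps the memory cost small: once the cache is full, freed objects go back to the allocator as usual.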

Microbenchmarks:

bench_list: Mean +- std dev: [main] 14.9 us +- 0.2 us -> [pr] 14.5 us +- 0.3 us: 1.03x faster
bench_int: Mean +- std dev: [main] 383 us +- 5 us -> [pr] 386 us +- 7 us: 1.01x slower
bench_float: Mean +- std dev: [main] 113 us +- 2 us -> [pr] 111 us +- 3 us: 1.02x faster
bench_builtin_or_method: Mean +- std dev: [main] 6.14 us +- 0.45 us -> [pr] 4.17 us +- 0.07 us: 1.47x faster
bench_list_iter: Mean +- std dev: [main] 141 ns +- 4 ns -> [pr] 126 ns +- 4 ns: 1.12x faster
bench_tuple_iter: Mean +- std dev: [main] 140 ns +- 7 ns -> [pr] 125 ns +- 2 ns: 1.12x faster
bench_range_iter: Mean +- std dev: [main] 144 ns +- 5 ns -> [pr] 138 ns +- 3 ns: 1.04x faster
bench_property: Mean +- std dev: [main] 2.06 us +- 0.04 us -> [pr] 2.03 us +- 0.02 us: 1.01x faster
bench_class_method: Mean +- std dev: [main] 2.30 us +- 0.02 us -> [pr] 2.32 us +- 0.05 us: 1.01x slower

Geometric mean: 1.08x faster
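The geometric mean is taken over the per-benchmark speedups (main time divided by PR time). Recomputing it from the mean timings listed above, with the standard-library `statistics.geometric_mean`, also shows how much of the 1.08x comes from the single bench_builtin_or_method result:

```python
from statistics import geometric_mean

# Mean timings (main, pr) copied from the pyperf results above.
results = {
    "bench_list": (14.9, 14.5),
    "bench_int": (383, 386),
    "bench_float": (113, 111),
    "bench_builtin_or_method": (6.14, 4.17),
    "bench_list_iter": (141, 126),
    "bench_tuple_iter": (140, 125),
    "bench_range_iter": (144, 138),
    "bench_property": (2.06, 2.03),
    "bench_class_method": (2.30, 2.32),
}

# A speedup > 1 means the PR is faster than main.
speedups = {name: main / pr for name, (main, pr) in results.items()}

overall = geometric_mean(speedups.values())
without_artificial = geometric_mean(
    s for name, s in speedups.items() if name != "bench_builtin_or_method"
)
print(f"{overall:.2f}x, {without_artificial:.2f}x")  # 1.08x, 1.04x
```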

The list, float and int freelists are already in main, so we don't expect an improvement there. The iterator benchmarks show a modest improvement (1.04x to 1.12x). bench_builtin_or_method shows the largest improvement (1.47x), but it is a somewhat artificial benchmark.
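To see which object types the iterator and method benchmarks allocate on every iteration, one can inspect the types CPython creates for the same expressions. The small class `A` below is a hypothetical stand-in mirroring the one in the benchmark script; accessing `a.x` without calling it creates a bound `method` object.

```python
class A:
    def x(self):
        return 1


# Object type created by the expression each benchmark repeats.
allocated = {
    "bench_list_iter": type(iter([])),
    "bench_tuple_iter": type(iter(())),
    "bench_range_iter": type(iter(range(5))),
    "bench_builtin_or_method": type([].append),
    "bound method (A().x)": type(A().x),
}

for bench, tp in allocated.items():
    print(f"{bench:25} {tp.__name__}")
```

Note that a range too large for a C long yields a separate `longrange_iterator` type, so a freelist for `range_iterator` only covers the common small-range case.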

Benchmark script
```python
# Quick benchmark for cpython freelists

import pyperf


def bench_list(loops):
    # Allocates small lists (list freelist, already in main).
    range_it = iter(range(loops))
    tpl = tuple(range(100))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        for jj in tpl:
            _ = [jj]
            _ = [jj, jj + 1]
            _ = [jj, jj + 1, jj]
    return pyperf.perf_counter() - t0


def collatz(a):
    while a > 1:
        if a % 2 == 0:
            a = a // 2
        else:
            a = 3 * a + 1


def bench_int(loops):
    # Allocates ints outside the small-int cache (int freelist, already in main).
    range_it = range(loops)
    tpl = tuple(range(200, 300))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        for jj in tpl:
            collatz(jj)
    return pyperf.perf_counter() - t0


def bench_float(loops):
    # Allocates temporary floats (float freelist, already in main).
    range_it = iter(range(loops))
    tpl = tuple(range(500))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for jj in tpl:
            x += float(jj + 1) ** 2 - float(jj + 1) ** 2
    return pyperf.perf_counter() - t0


def bench_builtin_or_method(loops):
    # Each attribute access creates a bound builtin_function_or_method
    # object, which is discarded immediately without being called.
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    lst = []
    it = iter(set([2, 3, 4]))
    t0 = pyperf.perf_counter()
    for ii in range_it:
        for jj in tpl:
            lst.append
            it.__length_hint__
    return pyperf.perf_counter() - t0


class A:
    def __init__(self, value):
        self.value = value

    def x(self):
        return self.value

    @property
    def v(self):
        return self.value


def bench_property(loops):
    # Reads a property, which calls the getter through the descriptor protocol.
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        a = A(ii)
        for jj in tpl:
            _ = a.v
    return pyperf.perf_counter() - t0


def bench_class_method(loops):
    # Calls a plain Python method on an instance.
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        a = A(ii)
        for jj in tpl:
            _ = a.x()
    return pyperf.perf_counter() - t0


def bench_list_iter(loops):
    # Each inner for-loop creates a new list_iterator.
    range_it = iter(range(loops))

    lst = list(range(5))
    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for jj in lst:
            x += jj
    return pyperf.perf_counter() - t0


def bench_tuple_iter(loops):
    # Each inner for-loop creates a new tuple_iterator.
    range_it = iter(range(loops))

    tpl = tuple(range(5))
    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for jj in tpl:
            x += jj
    return pyperf.perf_counter() - t0


def bench_range_iter(loops):
    # Each inner for-loop creates a new range_iterator; the range object
    # itself is created only once, outside the timed loop.
    range_it = iter(range(loops))

    r = range(5)
    t0 = pyperf.perf_counter()
    for ii in range_it:
        x = 0
        for jj in r:
            x += jj
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    # pyperf calls each function as func(loops) and uses the returned
    # elapsed time in seconds.
    runner = pyperf.Runner()
    runner.bench_time_func("bench_list", bench_list)
    runner.bench_time_func("bench_int", bench_int)
    runner.bench_time_func("bench_float", bench_float)
    runner.bench_time_func("bench_builtin_or_method", bench_builtin_or_method)
    runner.bench_time_func("bench_list_iter", bench_list_iter)
    runner.bench_time_func("bench_tuple_iter", bench_tuple_iter)
    runner.bench_time_func("bench_range_iter", bench_range_iter)
    runner.bench_time_func("bench_property", bench_property)
    runner.bench_time_func("bench_class_method", bench_class_method)
```

@Fidget-Spinner (Member)

I don't think we should share the freelists for iterators. We're not using that much memory and it's really bug-prone to share them.

@eendebakpt (Contributor, Author)

> I don't think we should share the freelists for iterators. We're not using that much memory and it's really bug-prone to share them.

I agree with you. I am experimenting a bit to see whether sharing a freelist between different types is possible at all (maybe in PyType_GenericAlloc, or with a size-based freelist), but I will probably split the iterator freelists again.
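The bug-proneness of sharing can be seen in a toy model of a size-based freelist: a block freed by one type may be handed to a different type of the same size, so every allocation must fully reinitialize the block. In this hypothetical sketch, `SizeClassFreeList` is an invented name, a `bytearray` stands in for raw object memory and byte 0 plays the role of the type pointer; it is not how CPython lays out objects.

```python
from collections import defaultdict


class SizeClassFreeList:
    """Hypothetical freelist shared between types, keyed by block size."""

    def __init__(self, capacity_per_size=8):
        self.capacity = capacity_per_size
        self.buckets = defaultdict(list)  # size -> list of free blocks

    def alloc(self, size, type_tag):
        bucket = self.buckets[size]
        block = bucket.pop() if bucket else bytearray(size)
        block[:] = bytes(size)   # wipe every field left by the previous owner
        block[0] = type_tag      # byte 0 stands in for the type pointer
        return block

    def free(self, block):
        bucket = self.buckets[len(block)]
        if len(bucket) < self.capacity:
            bucket.append(block)


shared = SizeClassFreeList()
list_it = shared.alloc(32, type_tag=1)   # e.g. a list iterator
list_it[8] = 0xFF                        # some iterator-specific field
shared.free(list_it)

tuple_it = shared.alloc(32, type_tag=2)  # a different type, same size
assert tuple_it is list_it               # the memory was recycled...
assert tuple_it[8] == 0                  # ...and must not leak old state
```

Forgetting the wipe (or the type update) in any single allocation path would silently hand one type's stale fields to another, which a per-type freelist cannot do.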

@ericsnowcurrently (Member)

@Fidget-Spinner (Member)

The results are excellent! 1% faster geomean. Great work and congrats Pieter!

@corona10 (Member) commented Jan 3, 2025

I am not sure it is worth adding a freelist every time the gain is small (<3-5%).
We should weigh the speedup against the added complexity and maintenance cost. (Not a strong disagreement, FYI.)

@corona10 (Member) commented Jan 3, 2025

The pycfunctionobject / pycmethodobject / class_method / shared_iters freelists might be worth adding,
but I am not sure about ranges / range_iters.

@Fidget-Spinner (Member)

> I am not sure that it's worth adding the free list every time if there is a small margin (<3-5%).
>
> Maybe we should trade-off between complexity and maintainability. (Not a strong disagree FYI)

The benchmark results show a consistent 1% geomean speedup on pyperformance. That is well worth it (for comparison, the entire types optimizer in the JIT gives only a 1% speedup and is far more code). Though you're probably right that not all of them are worth it. I think the method and list/tuple iterator freelists are the most worthwhile.

@eendebakpt (Contributor, Author)

I made separate PRs for the individual components that are worthwhile (based on the stats). The ones without a PR yet (because the implementation would be more complex) are generators, StopIteration (or exceptions more generally) and small multi-digit ints (e.g. 2 or 3 digits).

I will close this PR as it is superseded by the others.
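"Digits" here refers to CPython's internal int representation, which stores the magnitude as an array of fixed-width digits whose width is reported by `sys.int_info.bits_per_digit` (30 on most current builds, 15 on some). With 30-bit digits, 2-digit ints cover 2**30 <= |x| < 2**60. The helper `ndigits` below is a hypothetical illustration that computes the digit count for a nonzero int from its bit length; it is not a CPython API.

```python
import sys

BITS = sys.int_info.bits_per_digit  # width of one internal digit in bits


def ndigits(x: int) -> int:
    """Hypothetical helper: internal digit count of a nonzero int.

    The magnitude needs abs(x).bit_length() bits, rounded up to a
    whole number of BITS-wide digits.
    """
    return -(-abs(x).bit_length() // BITS)  # ceiling division


for value in (255, 2**30, 2**60 - 1, 2**60, -(2**45)):
    print(value, ndigits(value))
```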
