Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify stdlib code by using itertools.batched() #126317

Open
lgeiger opened this issue Nov 1, 2024 · 3 comments
Open

Simplify stdlib code by using itertools.batched() #126317

lgeiger opened this issue Nov 1, 2024 · 3 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@lgeiger
Copy link

lgeiger commented Nov 1, 2024

#98364 introduced itertools.batched() but since it's pretty new it's still rarely used inside the standard library.

pickle.py includes a custom version multiple times which all could be replaced by itertools.batched() for improved readability.
This probably has a negligible performance impact (at the very least it won't be slower) but should make the code easier to understand.

Let me know if you're interested in a contribution for this or whether you'd rather not have the extra churn for a minor change like this.

Linked PRs

@picnixz picnixz added type-feature A feature request or enhancement stdlib Python modules in the Lib dir labels Nov 2, 2024
@picnixz
Copy link
Contributor

picnixz commented Nov 2, 2024

I'm not sure we want to refactor pickle.py, especially not the pure Python implementation. But I'll let this decision fall to @serhiy-storchaka.

@serhiy-storchaka
Copy link
Member

There is no great need in such change, but it makes the code slightly smaller, so I have no objections. Please show the results of microbenchmarks for pickling large list, dict and set. Even a small speedup would be an additional argument for this change. A slowdown would be a sign that there is something wrong with itertools.batched().

@dongwooklee96
Copy link

dongwooklee96 commented Nov 2, 2024

bench.py

import pickle
import pyperf

large_list = list(range(10**6))  # 1,000,000 list
large_dict = {str(i): i for i in range(10**6)}  # 1,000,000 dict
large_set = set(range(10**6))  # 1,000,000 set

def pickle_save(data, filename):
    with open(filename, "wb") as f:
        pickle.dump(data, f)

def pickle_load(filename):
    with open(filename, "rb") as f:
        return pickle.load(f)

def run_benchmarks():
    runner = pyperf.Runner()

    # Pickle dump test
    runner.bench_func("pickle_save_large_list", pickle_save, large_list, "large_list.pkl")
    runner.bench_func("pickle_save_large_dict", pickle_save, large_dict, "large_dict.pkl")
    runner.bench_func("pickle_save_large_set", pickle_save, large_set, "large_set.pkl")

    # Pickle load test
    runner.bench_func("pickle_load_large_list", pickle_load, "large_list.pkl")
    runner.bench_func("pickle_load_large_dict", pickle_load, "large_dict.pkl")
    runner.bench_func("pickle_load_large_set", pickle_load, "large_set.pkl")

if __name__ == "__main__":
    run_benchmarks()

result

Benchmark hidden because not significant (6): pickle_save_large_list, pickle_save_large_dict, pickle_save_large_set, pickle_load_large_list, pickle_load_large_dict, pickle_load_large_set

The result of running the benchmark using pyperf.

I'm sorry, my results are a little different from what I've done before, so I'm guessing there's a difference in configure. If you get a different result, I'd appreciate it if you could let me know.

dongwooklee96 added a commit to dongwooklee96/cpython that referenced this issue Nov 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
Status: No status
Development

No branches or pull requests

4 participants