gh-126024: optimize UTF-8 decoder for short non-ASCII string #126025
base: main
Conversation
Force-pushed from 5344340 to 9b47c2b.
orjson's implementation is still faster.
Comparing to DuckDB's decoder.
When benchmarking short ASCII strings, performance is unstable because unicode_dealloc is slower than the decoding itself; speed varies depending on where the object is allocated.
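For reference, a minimal pyperf sketch of this kind of micro-benchmark is below; the sample strings and benchmark names are my own, not the harness used for the numbers in this PR.

```python
import pyperf

# Each call pays for allocation + decode + deallocation of the resulting
# str object, which is why unicode_dealloc cost shows up in the numbers.
SHORT_ASCII = b"hello"
SHORT_NON_ASCII = "こんにちは".encode()  # illustrative short non-ASCII input

runner = pyperf.Runner()
runner.bench_func("decode short ASCII", SHORT_ASCII.decode, "utf-8")
runner.bench_func("decode short non-ASCII", SHORT_NON_ASCII.decode, "utf-8")
```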
Force-pushed from 800452a to b0ce85c.
Force-pushed from b0ce85c to 37715b6.
This reverts commit c47d574.
This is the tree in which I ran the microbenchmarks.
orjson's benchmark_load result:
Looking at 0003 vs 0004 vs 0005 on the twitter.json benchmark, this PR brings PyString_FromStringAndSize from 19% slower down to 12% slower.
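For context, a comparison of this kind can also be reproduced outside orjson's benchmark_load harness with a short pyperf script; the fixture path below is an assumption.

```python
import orjson
import pyperf

# twitter.json is dominated by many short, often non-ASCII string values,
# so str-object creation shows up prominently in the profile.
with open("data/twitter.json", "rb") as f:  # path is hypothetical
    payload = f.read()

runner = pyperf.Runner()
runner.bench_func("orjson.loads(twitter.json)", orjson.loads, payload)
```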
This optimization works only for the strict error handler, because other error handlers may remove or replace invalid UTF-8 sequences.
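A small interpreter-level illustration of the difference (my example, not code from this PR): with "strict" an invalid sequence simply raises, while "replace" and "ignore" rewrite the output around it.

```python
data = b"caf\xc3\xa9 \xff ok"  # 0xff is an invalid UTF-8 start byte

try:
    data.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    print("strict :", exc.reason)                   # invalid start byte

print("replace:", data.decode("utf-8", "replace"))  # 0xff becomes U+FFFD
print("ignore :", data.decode("utf-8", "ignore"))   # 0xff is dropped
```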
Benchmark code

Result (with --enable-optimizations --with-lto):