-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-124502: Optimize unicode_eq() #125070
gh-124502: Optimize unicode_eq() #125070
Conversation
vstinner
commented
Oct 7, 2024
•
edited by bedevere-app
bot
Loading
edited by bedevere-app
bot
- Replace unicode_compare_eq() with unicode_eq().
- Replace _PyUnicode_EQ() calls with _PyUnicode_Equal().
- Remove _PyUnicode_EQ().
- Issue: [C API] Add PyUnicode_Equal() function #124502
* Replace unicode_compare_eq() with unicode_eq(). * Replace _PyUnicode_EQ() calls with _PyUnicode_Equal(). * Remove _PyUnicode_EQ().
Microbenchmark on getting a dictionary key:
Result:
Result with CPU isolation:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what lies in my git stash
.
According to your benchmarks, unicode_compare_eq()
was slightly faster than unicode_eq()
in general case. But these functions were optimized for different cases. unicode_eq()
is called when two strings has the same cache. Realistically, it means that they have the same content but different identity.
Just to be sure, could you please make benchmarks for different strings with the same content (you can create a different string by s.encode().decode()
)?
|
Correct. This PR copies faster unicode_compare_eq() code into unicode_eq(). In the same PR, I also remove unicode_compare_eq() since I don't think that it's useful to have two functions doing the same thing with the same code. |
Benchmark using different keys:
Result:
Hum, it's not easy to measure these functions. |
It's too confusing to optimize unicode_eq() and remove _PyUnicode_EQ() at the same time. I will do in two PRs instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, don't close. This PR LGTM, and it is pretty straightforward.
I only wanted to know whether replacing the unique_eq()
implementation with the unique_compare_eq()
implementation has any effect. So far the difference is smaller than the precision of your benchmarks. It is okay. It is a safe change.
Follow-up PR: #125105 |