gh-124502: Optimize unicode_eq() #125070

vstinner · 2024-10-07T21:31:00Z

Replace unicode_compare_eq() with unicode_eq().
Replace _PyUnicode_EQ() calls with _PyUnicode_Equal().
Remove _PyUnicode_EQ().

Issue: [C API] Add PyUnicode_Equal() function #124502

vstinner · 2024-10-07T21:31:15Z

cc @serhiy-storchaka

vstinner · 2024-10-07T21:38:18Z

Microbenchmark on getting a dictionary key:

python -m pyperf timeit -s 'd={"python": 1}' 'd["python"]'

Result:

Mean +- std dev: [ref] 17.9 ns +- 0.5 ns -> [optim] 16.9 ns +- 0.4 ns: 1.06x faster

Result with CPU isolation:

Mean +- std dev: [ref] 33.2 ns +- 0.3 ns -> [optim] 32.5 ns +- 0.2 ns: 1.02x faster

serhiy-storchaka

This is exactly what lies in my git stash.

According to your benchmarks, unicode_compare_eq() was slightly faster than unicode_eq() in the general case. But these functions were optimized for different cases. unicode_eq() is called when two strings have the same hash. Realistically, it means that they have the same content but different identity.

Just to be sure, could you please run benchmarks for different strings with the same content (you can create a different string with s.encode().decode())?
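
To illustrate the point above, here is a simplified Python model of the order of checks during a dict key lookup. It is only a sketch under stated assumptions, not the actual C code: entry_matches() and count_content_comparisons() are hypothetical names, and the eq_calls list merely stands in for a call to unicode_eq().

```python
def entry_matches(stored_key, stored_hash, lookup_key, lookup_hash, eq_calls):
    """Simplified model of matching one dict entry against a lookup key."""
    if stored_key is lookup_key:
        # Identity shortcut: the same object never needs a content comparison.
        return True
    if stored_hash != lookup_hash:
        # Different hashes: the keys cannot be equal.
        return False
    # Same hash, different object: this is where a content comparison
    # (unicode_eq() in the discussion above) would run.
    eq_calls.append((stored_key, lookup_key))
    return stored_key == lookup_key


def count_content_comparisons(stored_key, lookup_key):
    """Return (matched, number of content comparisons) for one lookup."""
    eq_calls = []
    matched = entry_matches(
        stored_key, hash(stored_key), lookup_key, hash(lookup_key), eq_calls
    )
    return matched, len(eq_calls)


original = "python"
same_object = original
# CPython implementation detail: decoding builds a new str object for this input.
distinct_copy = original.encode().decode()

if __name__ == "__main__":
    print(count_content_comparisons(original, same_object))
    print(count_content_comparisons(original, distinct_copy))
```

In this model only the lookup with distinct_copy reaches the content comparison, which is why a benchmark that reuses the same string object would not exercise unicode_eq().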

serhiy-storchaka · 2024-10-07T21:48:44Z

Microbenchmark on getting a dictionary key:

unicode_eq() should not be called here, because this is the same string: "python" is interned. Try to create a different string with the same content. Also test strings of other kinds.
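
A minimal sketch of such a test, assuming the standard-library timeit module instead of pyperf. SAMPLES, fresh_copy() and time_lookup() are hypothetical names introduced for illustration; the sample strings are chosen so that CPython stores them with 1-byte, 2-byte and 4-byte characters respectively.

```python
import timeit

# One sample per internal storage width used by CPython strings.
SAMPLES = {
    "ascii (1 byte/char)": "python",
    "latin1 (1 byte/char)": "pyth\xf6n",
    "ucs2 (2 bytes/char)": "pyth\u20acn",
    "ucs4 (4 bytes/char)": "pyth\U0001f40dn",
}


def fresh_copy(text):
    """Return an equal string that is a different object (CPython detail)."""
    return text.encode().decode()


def time_lookup(text, number=100_000):
    """Time dict lookups that use a distinct but equal key object."""
    table = {text: 1}
    key = fresh_copy(text)
    return timeit.timeit(
        "table[key]", globals={"table": table, "key": key}, number=number
    )


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label}: {time_lookup(text):.4f} s")
```

Because the lookup key is a separate object with the same hash, each lookup has to compare string contents. Timings from timeit are noisier than pyperf results, so small differences should be read with caution.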

vstinner · 2024-10-07T21:50:56Z

According to your benchmarks, unicode_compare_eq() was slightly faster than unicode_eq() in general case.

Correct. This PR copies the faster unicode_compare_eq() code into unicode_eq().

In the same PR, I also remove unicode_compare_eq() since I don't think that it's useful to have two functions doing the same thing with the same code.

vstinner · 2024-10-07T21:54:55Z

Benchmark using different keys:

python -m pyperf timeit -s 'd={"python": 1}; key="python".encode().decode()' 'd[key]'

Result:

$ python3 -m pyperf compare_to ref.json optim.json 
Benchmark hidden because not significant (1): timeit

Hum, it's not easy to measure these functions.

vstinner · 2024-10-08T12:04:12Z

It's too confusing to optimize unicode_eq() and remove _PyUnicode_EQ() at the same time. I will do it in two PRs instead.

serhiy-storchaka

Wait, don't close. This PR LGTM, and it is pretty straightforward.

I only wanted to know whether replacing the unicode_eq() implementation with the unicode_compare_eq() implementation has any effect. So far the difference is smaller than the precision of your benchmarks. It is okay. It is a safe change.

vstinner · 2024-10-08T12:41:12Z

Follow-up PR: #125105

pythongh-124502: Optimize unicode_eq()

c415345

vstinner added the skip news label Oct 7, 2024

vstinner requested a review from rhettinger as a code owner October 7, 2024 21:31

bedevere-app bot added the awaiting core review label Oct 7, 2024

bedevere-app bot mentioned this pull request Oct 7, 2024

[C API] Add PyUnicode_Equal() function #124502

Closed

serhiy-storchaka reviewed Oct 7, 2024

vstinner closed this Oct 8, 2024

vstinner deleted the remove_unicode_eq branch October 8, 2024 12:04

serhiy-storchaka reviewed Oct 8, 2024

serhiy-storchaka mentioned this pull request Oct 8, 2024

gh-124502: Optimize unicode_eq() #125105

Merged
