Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-124502: Optimize unicode_eq() #125070

Closed
wants to merge 1 commit into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Oct 7, 2024

  • Replace unicode_compare_eq() with unicode_eq().
  • Replace _PyUnicode_EQ() calls with _PyUnicode_Equal().
  • Remove _PyUnicode_EQ().

* Replace unicode_compare_eq() with unicode_eq().
* Replace _PyUnicode_EQ() calls with _PyUnicode_Equal().
* Remove _PyUnicode_EQ().
@vstinner
Copy link
Member Author

vstinner commented Oct 7, 2024

cc @serhiy-storchaka

@vstinner
Copy link
Member Author

vstinner commented Oct 7, 2024

Microbenchmark on getting a dictionary key:

python -m pyperf timeit -s 'd={"python": 1}' 'd["python"]'

Result:

Mean +- std dev: [ref] 17.9 ns +- 0.5 ns -> [optim] 16.9 ns +- 0.4 ns: 1.06x faster

Result with CPU isolation:

Mean +- std dev: [ref] 33.2 ns +- 0.3 ns -> [optim] 32.5 ns +- 0.2 ns: 1.02x faster

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what lies in my git stash.

According to your benchmarks, unicode_compare_eq() was slightly faster than unicode_eq() in general case. But these functions were optimized for different cases. unicode_eq() is called when two strings has the same cache. Realistically, it means that they have the same content but different identity.

Just to be sure, could you please make benchmarks for different strings with the same content (you can create a different string by s.encode().decode())?

@serhiy-storchaka
Copy link
Member

Microbenchmark on getting a dictionary key:

unicode_eq() should not be called here, because this is the same string. "python" is interned. Try to create a different string with the same content. Test also strings with other kinds.

@vstinner
Copy link
Member Author

vstinner commented Oct 7, 2024

According to your benchmarks, unicode_compare_eq() was slightly faster than unicode_eq() in general case.

Correct. This PR copies faster unicode_compare_eq() code into unicode_eq().

In the same PR, I also remove unicode_compare_eq() since I don't think that it's useful to have two functions doing the same thing with the same code.

@vstinner
Copy link
Member Author

vstinner commented Oct 7, 2024

Benchmark using different keys:

python -m pyperf timeit -s 'd={"python": 1}; key="python".encode().decode()' 'd[key]'

Result:

$ python3 -m pyperf compare_to ref.json optim.json 
Benchmark hidden because not significant (1): timeit

Hum, it's not easy to measure these functions.

@vstinner
Copy link
Member Author

vstinner commented Oct 8, 2024

It's too confusing to optimize unicode_eq() and remove _PyUnicode_EQ() at the same time. I will do in two PRs instead.

@vstinner vstinner closed this Oct 8, 2024
@vstinner vstinner deleted the remove_unicode_eq branch October 8, 2024 12:04
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, don't close. This PR LGTM, and it is pretty straightforward.

I only wanted to know whether replacing the unique_eq() implementation with the unique_compare_eq() implementation has any effect. So far the difference is smaller than the precision of your benchmarks. It is okay. It is a safe change.

@vstinner
Copy link
Member Author

vstinner commented Oct 8, 2024

Follow-up PR: #125105

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants