This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Add a "contains" fast-path to like_utf8_scalar #1582

Merged 4 commits on Oct 22, 2023


RyanMarcus
Contributor

This PR uses `memchr` to add a fast path to `like_utf8_scalar` for patterns that reduce to a "contains" query. For example, if the pattern is `%ABBA%`, we can check whether each string contains `ABBA` instead of building a regular expression.

To measure the performance improvement from this fast path, I added a benchmark. Here are the results on my machine:

| Length | regex   | memchr  |
|--------|---------|---------|
| 2^16   | 63.5 µs | 0.88 µs |
| 2^17   | 68.1 µs | 1.04 µs |
| 2^18   | 72.4 µs | 1.05 µs |
| 2^19   | 76.7 µs | 1.11 µs |
| 2^20   | 81.3 µs | 1.13 µs |

Since `memchr` uses state-of-the-art SIMD techniques (as far as I know), this approach should be faster for "contains" queries than even the glob-matching suggestion in #1295.

* Added a dependency on `memchr` when `compute_like` is enabled
* When the scalar `rhs` in `a_like_utf8_scalar` starts and ends with a
  wildcard, and has no other wildcard characters, a fast path using
  `memchr`'s `Finder` is used.
* Added tests that trigger the fast path
* Added a benchmark for measuring the performance of the new fast path
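
The fast-path condition described in the list above can be sketched roughly as follows. This is a minimal, std-only illustration rather than the code merged in this PR: `contains_needle` and `like_scalar_sketch` are hypothetical names, `str::contains` stands in for `memchr`'s `Finder`, and treating a backslash as a reason to fall back is an assumption about escape handling.

```rust
/// Hypothetical helper: returns the literal needle when `pattern` has the
/// shape `%needle%` with no other wildcard characters, otherwise `None`.
fn contains_needle(pattern: &str) -> Option<&str> {
    // Both the leading and the trailing `%` must be present (so a lone "%" is rejected).
    let inner = pattern.strip_prefix('%')?.strip_suffix('%')?;
    // Assumption: any remaining `%`, `_`, or backslash means the pattern is
    // not a plain "contains" query, so the caller should use the regex path.
    if inner.contains(|c: char| matches!(c, '%' | '_' | '\\')) {
        return None;
    }
    Some(inner)
}

/// Hypothetical stand-in for the scalar kernel: evaluates `value LIKE pattern`
/// for every value when the fast path applies, and returns `None` otherwise.
fn like_scalar_sketch(values: &[&str], pattern: &str) -> Option<Vec<bool>> {
    let needle = contains_needle(pattern)?;
    // The PR uses memchr's `Finder` here; `str::contains` keeps this sketch std-only.
    Some(values.iter().map(|v| v.contains(needle)).collect())
}

fn main() {
    let values = ["xxABBAxx", "ABAB", "ABBA"];
    println!("{:?}", like_scalar_sketch(&values, "%ABBA%"));
    println!("{:?}", like_scalar_sketch(&values, "%AB_A%"));
}
```

With `memchr` available, the substring check would instead build one `memchr::memmem::Finder` from the needle and reuse it across all values, which avoids repeating the searcher setup for every row.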

codecov bot commented Oct 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9a26422) 83.38% compared to head (b8fbe12) 83.39%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1582   +/-   ##
=======================================
  Coverage   83.38%   83.39%           
=======================================
  Files         391      391           
  Lines       42983    42993   +10     
=======================================
+ Hits        35841    35853   +12     
+ Misses       7142     7140    -2     
Files Coverage Δ
src/compute/like.rs 64.73% <100.00%> (+1.95%) ⬆️

... and 6 files with indirect coverage changes


@RyanMarcus RyanMarcus marked this pull request as ready for review October 21, 2023 21:20
@sundy-li sundy-li merged commit 346c866 into jorgecarleitao:main Oct 22, 2023
24 of 25 checks passed