Performance optimizations #18

GromNaN · 2024-11-28T23:05:15Z

Import native functions with `use function` so the compiler can optimize calls to some of them.
Use `preg_match('/(.).*\1/', $alphabet)` to check that the alphabet has no duplicate characters; this is faster than splitting it into an array.
Avoid splitting strings; access characters directly by offset with `$string[$i]`.
Avoid functions that take a callback (`array_reduce`, `array_filter`); they are slower than `foreach`.
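As a rough illustration of the first two points, here is a minimal sketch. The namespace and function names are hypothetical and not taken from the library. It assumes a single-byte alphabet without newlines, since `.` does not match a newline without the `s` modifier.

```php
<?php

namespace Sqids\Example; // hypothetical namespace, for illustration only

// Importing native functions lets PHP resolve them at compile time
// instead of checking the current namespace first at runtime.
use function array_unique;
use function count;
use function preg_match;
use function str_split;
use function strlen;

// Split-based check: builds an intermediate array of characters.
function hasDuplicateCharsSplit(string $alphabet): bool
{
    return count(array_unique(str_split($alphabet))) !== strlen($alphabet);
}

// Regex-based check: a captured byte that reappears later in the string
// means the alphabet contains a duplicate.
function hasDuplicateCharsRegex(string $alphabet): bool
{
    return preg_match('/(.).*\1/', $alphabet) === 1;
}
```

Both helpers should agree on any non-empty single-byte alphabet; the regex version simply avoids allocating the array.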

Benchmark

PHPBench code

composer req --dev phpbench/phpbench

In phpbench.json

{
    "$schema": "./vendor/phpbench/phpbench/phpbench.schema.json",
    "runner.bootstrap": "vendor/autoload.php",
    "runner.file_pattern": "*Bench.php",
    "runner.path": "tests",
    "runner.iterations": 3
}

In tests/SqidsBench.php

<?php

namespace Sqids\Benchmark;

use PhpBench\Attributes\ParamProviders;
use PhpBench\Attributes\Revs;
use PhpBench\Attributes\Warmup;
use Sqids\Sqids;

#[Warmup(1)]
final class SqidsBench
{
    private const IDS = [
        'SvIzsqYMyQwI3GWgJAe17URxX8V924Co0DaTZLtFjHriEn5bPhcSkfmvOslpBu' => [0, 0],
        'n3qafPOLKdfHpuNw3M61r95svbeJGk7aAEgYn4WlSjXURmF8IDqZBy0CT2VxQc' => [0, 1],
        'tryFJbWcFMiYPg8sASm51uIV93GXTnvRzyfLleh06CpodJD42B7OraKtkQNxUZ' => [0, 2],
        'eg6ql0A3XmvPoCzMlB6DraNGcWSIy5VR8iYup2Qk4tjZFKe1hbwfgHdUTsnLqE' => [0, 3],
        'rSCFlp0rB2inEljaRdxKt7FkIbODSf8wYgTsZM1HL9JzN35cyoqueUvVWCm4hX' => [0, 4],
        'sR8xjC8WQkOwo74PnglH1YFdTI0eaf56RGVSitzbjuZ3shNUXBrqLxEJyAmKv2' => [0, 5],
        'uY2MYFqCLpgx5XQcjdtZK286AwWV7IBGEfuS9yTmbJvkzoUPeYRHr4iDs3naN0' => [0, 6],
        '74dID7X28VLQhBlnGmjZrec5wTA1fqpWtK4YkaoEIM9SRNiC3gUJH0OFvsPDdy' => [0, 7],
        '30WXpesPhgKiEI5RHTY7xbB1GnytJvXOl2p0AcUjdF6waZDo9Qk8VLzMuWrqCS' => [0, 8],
        'moxr3HqLAK0GsTND6jowfZz3SUx7cQ8aC54Pl1RbIvFXmEJuBMYVeW9yrdOtin' => [0, 9],
        'JSwXFaosANEOuLlYb3jHCBpeSzx7cPRrgf1dNTZqE4nDytU09isA5ahm6kKGvM' => [1_000_000, 2_000_000],
    ];

    #[Revs(1_000)]
    #[ParamProviders('provideSqids')]
    public static function benchEncodeDecode(array $params): void
    {
        foreach (self::IDS as $id => $numbers) {
            $params[0]->encode($numbers);
            $params[0]->decode($id);
        }
    }

    public static function provideSqids(): \Generator
    {
        yield 'default' => [
            new Sqids()
        ];
        yield 'custom blocklist' => [
            new Sqids(blocklist: [
                'JSwXFaosAN',
                'OCjV9JK64o',
                'rBHf',
                '79SM',
                '7tE6',
            ])
        ];
        yield 'no blocklist' => [
            new Sqids(blocklist: [])
        ];
    }
}

Before

    benchEncodeDecode # default.............I2 - Mo2.195ms (±0.47%)
    benchEncodeDecode # custom blocklist....I2 - Mo1.099ms (±0.84%)
    benchEncodeDecode # no blocklist........I2 - Mo929.423μs (±0.87%)

After

    benchEncodeDecode # default.............I2 - Mo994.363μs (±0.35%)
    benchEncodeDecode # custom blocklist....I2 - Mo681.353μs (±0.36%)
    benchEncodeDecode # no blocklist........I2 - Mo552.545μs (±0.21%)

vinkla · 2024-11-29T06:08:49Z

This looks good to me. I'll leave the merge to @4kimov, who has deeper knowledge across our language implementations.

src/Sqids.php

GromNaN

I can understand that it's a big PR and that you might have doubts about the merge. I've added comments to explain the changes. Given the performance gain, I think it's important to look into it.

GromNaN · 2024-12-20T21:27:15Z

src/Sqids.php

            throw new InvalidArgumentException('Alphabet must contain unique characters');
        }

        $minLengthLimit = 255;
-        if (
-            !is_int($minLength) ||


The type is already enforced by the parameter's type declaration.
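A small sketch of why the `is_int` check is redundant. The function name and exception message are hypothetical and this is not the library's constructor: once a parameter is declared `int`, PHP rejects or coerces anything else before the body runs.

```php
<?php

// Hypothetical helper, for illustration only.
function checkMinLength(int $minLength): int
{
    $minLengthLimit = 255;

    // No is_int() needed here: $minLength is guaranteed to be an int.
    if ($minLength < 0 || $minLength > $minLengthLimit) {
        throw new InvalidArgumentException(
            'Minimum length has to be between 0 and ' . $minLengthLimit
        );
    }

    return $minLength;
}
```

Passing a non-numeric string such as `'abc'` raises a `TypeError` before the range check is reached.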

GromNaN · 2024-12-20T21:28:09Z

src/Sqids.php

-        $inRangeNumbers = array_filter($numbers, fn($n) => $n >= 0 && $n <= self::maxValue());
-        if (count($inRangeNumbers) != count($numbers)) {
-            throw new InvalidArgumentException(
-                'Encoding supports numbers between 0 and ' . self::maxValue(),
-            );


Instead of creating a new array, the exception is thrown directly when there is an invalid number.
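A minimal sketch of the fail-fast version. The function name is hypothetical, and `PHP_INT_MAX` stands in for `self::maxValue()`, which is an assumption about what that method returns.

```php
<?php

// Hypothetical standalone helper, for illustration only.
function assertNumbersInRange(array $numbers, int $max = PHP_INT_MAX): void
{
    foreach ($numbers as $n) {
        // Stop at the first invalid number; no filtered copy of the array is built.
        if ($n < 0 || $n > $max) {
            throw new InvalidArgumentException(
                'Encoding supports numbers between 0 and ' . $max
            );
        }
    }
}
```

Compared with `array_filter` plus `count`, this skips both the closure calls and the temporary array, and it returns early on bad input.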

GromNaN · 2024-12-20T21:30:20Z

src/Sqids.php

-        $alphabetChars = str_split($this->alphabet);
-        foreach (str_split($id) as $c) {
-            if (!in_array($c, $alphabetChars)) {
-                return $ret;
-            }


This split operation is replaced by a more efficient regex.
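The regex itself is not shown in this excerpt, so the sketch below does not try to reproduce it. It uses `strspn`, a different byte-oriented built-in, only to illustrate the same idea of checking membership without splitting either string. The function name is hypothetical.

```php
<?php

// Hypothetical helper; a strspn-based alternative, not the PR's regex.
function containsOnlyAlphabetChars(string $id, string $alphabet): bool
{
    // strspn returns the length of the leading run of $id made up only of
    // bytes found in $alphabet; if that run covers all of $id, every
    // character is valid.
    return strspn($id, $alphabet) === strlen($id);
}
```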

GromNaN · 2024-12-20T21:32:37Z

src/Sqids.php

+        for ($i = 0, $j = strlen($alphabet) - 1; $j > 0; $i++, $j--) {
+            $r = ($i * $j + ord($alphabet[$i]) + ord($alphabet[$j])) % strlen($alphabet);
+            [$alphabet[$i], $alphabet[$r]] = [$alphabet[$r], $alphabet[$i]];


Manipulation of individual characters of the string, instead of using an array of chars.
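For comparison, a sketch of the same loop written both ways. The function names are hypothetical, and the swap uses a temporary variable rather than list assignment to keep each step explicit. It assumes a single-byte alphabet.

```php
<?php

// Array-based variant: split, shuffle the array, join again.
function shuffleWithArray(string $alphabet): string
{
    $chars = str_split($alphabet);
    $length = count($chars);
    for ($i = 0, $j = $length - 1; $j > 0; $i++, $j--) {
        $r = ($i * $j + ord($chars[$i]) + ord($chars[$j])) % $length;
        $tmp = $chars[$i];
        $chars[$i] = $chars[$r];
        $chars[$r] = $tmp;
    }

    return implode('', $chars);
}

// String-offset variant: swap bytes in place, no intermediate array.
function shuffleWithOffsets(string $alphabet): string
{
    $length = strlen($alphabet);
    for ($i = 0, $j = $length - 1; $j > 0; $i++, $j--) {
        $r = ($i * $j + ord($alphabet[$i]) + ord($alphabet[$j])) % $length;
        $tmp = $alphabet[$i];
        $alphabet[$i] = $alphabet[$r];
        $alphabet[$r] = $tmp;
    }

    return $alphabet;
}
```

Both variants perform the same swaps, so they should return identical strings for the same input.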

GromNaN · 2024-12-20T21:36:41Z

src/Sqids.php

-        $id = [];
-        $chars = str_split($alphabet);
-
-        $result = $num;


Reuse and modify the variable $num instead of creating a new one.

GromNaN · 2024-12-20T21:37:21Z

src/Sqids.php

-            array_unshift($id, $chars[$this->math->intval($this->math->mod($result, count($chars)))]);
-            $result = $this->math->divide($result, count($chars));
-        } while ($this->math->greaterThan($result, 0));
+            $id = $alphabet[$this->math->intval($this->math->mod($num, strlen($alphabet)))] . $id;


Prepending to the string is equivalent to using array_unshift on an array.
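A sketch of the two approaches side by side. The function names are hypothetical, and native integer arithmetic stands in for the `$this->math` abstraction, which is not shown in this excerpt. It assumes `$num` is non-negative.

```php
<?php

// Array-based variant: unshift each digit, then join.
function toIdWithArray(int $num, string $alphabet): string
{
    $id = [];
    $chars = str_split($alphabet);
    do {
        array_unshift($id, $chars[$num % count($chars)]);
        $num = intdiv($num, count($chars));
    } while ($num > 0);

    return implode('', $id);
}

// String-based variant: put each digit in front of the ID built so far
// and reuse $num as the running quotient.
function toIdWithString(int $num, string $alphabet): string
{
    $id = '';
    $base = strlen($alphabet);
    do {
        $id = $alphabet[$num % $base] . $id;
        $num = intdiv($num, $base);
    } while ($num > 0);

    return $id;
}
```

With the alphabet `'abc'`, the number 5 has the base-3 digits 1 and 2, so both functions return `'bc'`.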

4kimov · 2024-12-22T18:25:08Z

Well, to say that these optimizations are welcome would be an understatement. Thank you for taking the time!

A few basic questions as I look at this:

1. Are there any breaking changes in this PR or in "Optimize performance of blocklist filtering and checking by using Regex" (#17)?
2. With the new regex expressions, is there any character that a user might use in the alphabet to mess up regex matching?

GromNaN · 2024-12-22T22:35:48Z

1. Are there any breaking changes in this PR or [Optimize performance of blocklist filtering and checking by using Regex #17](https://github.com/sqids/sqids-php/pull/17)?

After multiple reviews, I don't see any breaking change in this PR.

In #17, there is a negligible change involving the $blocklist property, and behavior might differ for someone who customized the blocklist without adding all the same leet variations of the words.

2. With new regex expressions, is there any character that a user might use in the alphabet to mess up regex matching?

Without the /u modifier, regexes operate on bytes; Unicode characters are split into bytes and not treated as single characters. This was already the case with str_split, so Unicode was already unsupported.

There is no other restriction on the characters accepted in the alphabet. The alphabet is not used as part of a regex, and if it were, I would have used preg_quote to escape special characters. I escaped the blocklist words for this purpose.
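To make both answers concrete, here is a hedged sketch. The helper name and the pattern shape are assumptions for illustration, not the code from #17. It shows `preg_quote` neutralizing regex metacharacters in blocklist words, and byte-oriented matching of a multi-byte character.

```php
<?php

// Hypothetical helper: builds one alternation pattern from blocklist words.
// Assumes $words is non-empty; an empty list would yield a pattern that
// matches everything.
function buildBlocklistPattern(array $words): string
{
    $escaped = [];
    foreach ($words as $word) {
        // Escape regex metacharacters and the '/' delimiter.
        $escaped[] = preg_quote($word, '/');
    }

    return '/' . implode('|', $escaped) . '/';
}

$pattern = buildBlocklistPattern(['a.b', 'x+y']);
$matchesLiteral = preg_match($pattern, 'a.b');  // the dot is matched literally
$matchesWildcard = preg_match($pattern, 'aXb'); // the dot is not a wildcard

// "\u{e9}" is a single character encoded as two bytes in UTF-8.
$byteLength = strlen("\u{e9}");
$splitCount = count(str_split("\u{e9}"));
$dotMatchesBytes = preg_match_all('/./', "\u{e9}");
$dotMatchesChars = preg_match_all('/./u', "\u{e9}");
```

Without the `u` modifier the regex sees two separate bytes, the same view that `str_split` gives.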

GromNaN force-pushed the optim-string branch from 6331ebb to b62cb77 Compare November 29, 2024 00:18

vinkla approved these changes Nov 29, 2024

stof reviewed Nov 29, 2024

GromNaN force-pushed the optim-string branch from b62cb77 to c010630 Compare November 29, 2024 17:56

vinkla requested a review from 4kimov December 12, 2024 15:28

GromNaN force-pushed the optim-string branch from c010630 to 554aaea Compare December 20, 2024 21:24

Performance optimizations

66d5ae8

GromNaN force-pushed the optim-string branch from 554aaea to 66d5ae8 Compare December 20, 2024 21:26
