Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve compression ratio of levels 3 & 4 #4171

Merged
merged 5 commits into from
Oct 17, 2024
Merged

Improve compression ratio of levels 3 & 4 #4171

merged 5 commits into from
Oct 17, 2024

Conversation

Cyan4973
Copy link
Contributor

Following #4170 , there were a few ideas I wanted to test for the general compression performance of level 3, aka dfast strategy.

The one proposed in this PR is consistently positive.
It feels retrospectively obvious: select the supplemental "long" match candidate only if it's effectively longer.
On silesia.tar corpus, this modification saves ~75 KB at level 3.

And thankfully, the measured speed cost is negligible, below noise level (between 0 and -1% depending on measurements).

@Cyan4973 Cyan4973 self-assigned this Oct 16, 2024
@Cyan4973 Cyan4973 marked this pull request as ready for review October 16, 2024 19:34
Comment on lines 266 to 268
offset = (U32)(ip-matchl1);
while (((ip>anchor) & (matchl1>prefixLowest)) && (ip[-1] == matchl1[-1])) { ip--; matchl1--; mLength++; } /* catch up */
goto _match_found;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this code be merged with the code in lines 273-275?

We could set matchs0 = matchl1, and then just fallthrough to the code on line 273. Though we might want to rename some variables.

The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
a margin of 4 is insufficient to guarantee compression success.
was transferred from circleci,
but was only triggered on push into dev,
i.e. after pull request is merged.
@Cyan4973 Cyan4973 merged commit b880f20 into dev Oct 17, 2024
94 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants