Merge branch 'master' into s2-plus-plus
klauspost authored Nov 8, 2023
2 parents 51032dc + dc4151f commit 9c714b3
Showing 57 changed files with 2,401 additions and 304 deletions.
24 changes: 13 additions & 11 deletions .github/workflows/go.yml
@@ -12,7 +12,7 @@ jobs:
build:
strategy:
matrix:
- go-version: [1.18.x, 1.19.x, 1.20.x]
+ go-version: [1.19.x, 1.20.x, 1.21.x]
os: [ubuntu-latest, macos-latest, windows-latest]
env:
CGO_ENABLED: 0
@@ -54,7 +54,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v2
with:
- go-version: 1.20.x
+ go-version: 1.21.x

- name: Checkout code
uses: actions/checkout@v2
@@ -76,7 +76,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v2
with:
- go-version: 1.20.x
+ go-version: 1.21.x

- name: Checkout code
uses: actions/checkout@v2
@@ -90,14 +90,11 @@ jobs:
- name: Build s2c
run: go build github.com/klauspost/compress/s2/cmd/s2c && go build github.com/klauspost/compress/s2/cmd/s2d&&./s2c -verify s2c &&./s2d s2c.s2&&rm ./s2c&&rm s2d&&rm s2c.s2

- - name: install garble
- run: go install mvdan.cc/garble@v0.9.2
-
- name: goreleaser deprecation
- run: curl -sfL https://git.io/goreleaser | VERSION=v1.9.2 sh -s -- check
+ run: curl -sfL https://git.io/goreleaser | VERSION=v1.20.0 sh -s -- check

- name: goreleaser snapshot
- run: curl -sL https://git.io/goreleaser | VERSION=v1.9.2 sh -s -- --snapshot --skip-publish --rm-dist
+ run: curl -sL https://git.io/goreleaser | VERSION=v1.20.0 sh -s -- --snapshot --skip-publish --rm-dist

- name: Test S2 GOAMD64 v3
env:
@@ -119,7 +116,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v2
with:
- go-version: 1.20.x
+ go-version: 1.21.x

- name: Checkout code
uses: actions/checkout@v2
@@ -150,7 +147,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v2
with:
- go-version: 1.20.x
+ go-version: 1.21.x

- name: Checkout code
uses: actions/checkout@v2
@@ -190,7 +187,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v2
with:
- go-version: 1.20.x
+ go-version: 1.21.x

- name: Checkout code
uses: actions/checkout@v2
@@ -204,3 +201,8 @@ jobs:
- name: zip/FuzzReader
run: go test -run=none -fuzz=FuzzReader -fuzztime=500000x -test.fuzzminimizetime=10ms ./zip/.

- name: fse/FuzzCompress
run: go test -run=none -fuzz=FuzzCompress -fuzztime=1000000x -test.fuzzminimizetime=10ms ./fse/.

- name: fse/FuzzDecompress
run: go test -run=none -fuzz=FuzzDecompress -fuzztime=1000000x -test.fuzzminimizetime=10ms ./fse/.
7 changes: 2 additions & 5 deletions .github/workflows/release.yml
@@ -21,15 +21,12 @@ jobs:
name: Set up Go
uses: actions/setup-go@v2
with:
- go-version: 1.20.x
- -
- name: install garble
- run: go install mvdan.cc/garble@v0.9.2
+ go-version: 1.21.x
-
name: Run GoReleaser
uses: goreleaser/goreleaser-action@v2
with:
- version: 1.9.2
+ version: 1.20.0
args: release --rm-dist
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
20 changes: 3 additions & 17 deletions .goreleaser.yml
@@ -3,7 +3,7 @@
before:
hooks:
- ./gen.sh
- - go install mvdan.cc/garble@v0.9.3
+ - go install mvdan.cc/garble@v0.10.1

builds:
-
@@ -92,16 +92,7 @@ builds:
archives:
-
id: s2-binaries
- name_template: "s2-{{ .Os }}_{{ .Arch }}_{{ .Version }}"
- replacements:
- aix: AIX
- darwin: OSX
- linux: Linux
- windows: Windows
- 386: i386
- amd64: x86_64
- freebsd: FreeBSD
- netbsd: NetBSD
+ name_template: "s2-{{ .Os }}_{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}"
format_overrides:
- goos: windows
format: zip
@@ -125,7 +116,7 @@ changelog:

nfpms:
-
- file_name_template: "s2_package_{{ .Version }}_{{ .Os }}_{{ .Arch }}"
+ file_name_template: "s2_package__{{ .Os }}_{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}"
vendor: Klaus Post
homepage: https://github.com/klauspost/compress
maintainer: Klaus Post <klauspost@gmail.com>
@@ -134,8 +125,3 @@ nfpms:
formats:
- deb
- rpm
- replacements:
- darwin: Darwin
- linux: Linux
- freebsd: FreeBSD
- amd64: x86_64
18 changes: 18 additions & 0 deletions README.md
@@ -16,6 +16,22 @@ This package provides various compression algorithms.

# changelog

* Oct 22nd, 2023 - [v1.17.2](https://github.com/klauspost/compress/releases/tag/v1.17.2)
	* zstd: Fix rare *CORRUPTION* output in "best" mode. See https://github.com/klauspost/compress/pull/876

* Oct 14th, 2023 - [v1.17.1](https://github.com/klauspost/compress/releases/tag/v1.17.1)
	* s2: Fix S2 "best" dictionary wrong encoding by @klauspost in https://github.com/klauspost/compress/pull/871
	* flate: Reduce allocations in decompressor and minor code improvements by @fakefloordiv in https://github.com/klauspost/compress/pull/869
	* s2: Fix EstimateBlockSize on 6&7 length input by @klauspost in https://github.com/klauspost/compress/pull/867

* Sept 19th, 2023 - [v1.17.0](https://github.com/klauspost/compress/releases/tag/v1.17.0)
	* Add experimental dictionary builder https://github.com/klauspost/compress/pull/853
	* Add xerial snappy read/writer https://github.com/klauspost/compress/pull/838
	* flate: Add limited window compression https://github.com/klauspost/compress/pull/843
	* s2: Do 2 overlapping match checks https://github.com/klauspost/compress/pull/839
	* flate: Add amd64 assembly matchlen https://github.com/klauspost/compress/pull/837
	* gzip: Copy bufio.Reader on Reset by @thatguystone in https://github.com/klauspost/compress/pull/860

* July 1st, 2023 - [v1.16.7](https://github.com/klauspost/compress/releases/tag/v1.16.7)
	* zstd: Fix default level first dictionary encode https://github.com/klauspost/compress/pull/829
	* s2: add GetBufferCapacity() method by @GiedriusS in https://github.com/klauspost/compress/pull/832
@@ -645,6 +661,8 @@ Here are other packages of good quality and pure Go (no cgo wrappers or autoconv
* [github.com/dsnet/compress](https://github.com/dsnet/compress) - brotli decompression, bzip2 writer.
* [github.com/ronanh/intcomp](https://github.com/ronanh/intcomp) - Integer compression.
* [github.com/spenczar/fpc](https://github.com/spenczar/fpc) - Float compression.
* [github.com/minio/zipindex](https://github.com/minio/zipindex) - External ZIP directory index.
* [github.com/ybirader/pzip](https://github.com/ybirader/pzip) - Fast concurrent zip archiver and extractor.

# license

108 changes: 108 additions & 0 deletions dict/README.md
@@ -0,0 +1,108 @@
# Dictionary builder

This is an *experimental* dictionary builder for Zstandard, S2, LZ4, deflate and more.

This diverges from the Zstandard dictionary builder, and may have some failure scenarios for very small or uniform inputs.

Dictionaries returned should all be valid, but if very little data is supplied, it may not be able to generate a dictionary.

With a large, diverse sample set, it will generate a dictionary that can compete with the Zstandard dictionary builder,
but for very similar data it will not be able to generate a dictionary that is as good.

Feedback is welcome.

## Usage

First, a set of *samples* must be collected.

These samples should be representative of the input data and should not contain any complete duplicates.

Only the *beginning* of each sample matters; the rest can be truncated.
Input beyond roughly 64KB no longer contributes.
The command line tool can do this truncation for you.
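
The preparation described above can be sketched with the Go standard library alone. The helper names `prepareSamples` and `loadSamples`, the `samples` directory, and the 64KB cut-off are illustrative assumptions, not part of the `dict` package; the command line tool applies its own truncation through `-max`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// prepareSamples truncates each sample to at most maxLen bytes, skips empty
// samples, and drops any sample whose truncated content was already seen.
func prepareSamples(samples [][]byte, maxLen int) [][]byte {
	seen := make(map[[sha256.Size]byte]struct{}, len(samples))
	out := make([][]byte, 0, len(samples))
	for _, s := range samples {
		if len(s) > maxLen {
			s = s[:maxLen]
		}
		if len(s) == 0 {
			continue
		}
		sum := sha256.Sum256(s)
		if _, dup := seen[sum]; dup {
			continue // complete duplicate of an earlier sample
		}
		seen[sum] = struct{}{}
		out = append(out, s)
	}
	return out
}

// loadSamples reads every regular file directly inside dir (hypothetical helper).
func loadSamples(dir string) ([][]byte, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var samples [][]byte
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		b, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		samples = append(samples, b)
	}
	return samples, nil
}

func main() {
	dir := "samples" // assumed location of the sample files
	raw, err := loadSamples(dir)
	if err != nil {
		fmt.Println("could not read samples:", err)
		return
	}
	samples := prepareSamples(raw, 64<<10)
	fmt.Printf("kept %d of %d samples\n", len(samples), len(raw))
}
```

Duplicates are detected after truncation, since two inputs that share the same leading bytes are identical as far as the builder is concerned.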

## Command line

To install the command line tool run:

```
$ go install github.com/klauspost/compress/dict/cmd/builddict@latest
```

Collect the samples in a directory, for example `samples/`.

Then run the command line tool. Basic usage is just to pass the directory with the samples:

```
$ builddict samples/
```

This will build a Zstandard dictionary and write it to `dictionary.bin` in the current folder.

The dictionary can be used with the Zstandard command line tool:

```
$ zstd -D dictionary.bin input
```

### Options

The command line tool has a few options:

- `-format` Output type: "zstd", "s2" or "raw". Default "zstd".

Output a dictionary in Zstandard format, S2 format or raw bytes.
The raw bytes can be used with Deflate, LZ4, etc.

- `-hash` Minimum match length in bytes (hash length). Must be 4-8 (inclusive). Default 6.

The hash bytes are used to define the shortest matches to look for.
Shorter matches can generate a more fractured dictionary with less compression, but can be better for certain inputs.
Usually lengths around 6-8 are best.

- `-len` Specify custom output size. Default 114688.
- `-max` Max input length to index per input file. Default 32768. All inputs are truncated to this.
- `-o` Output name. Default `dictionary.bin`.
- `-q` Do not print progress
- `-dictID` zstd dictionary ID. 0 will be random. Default 0.
- `-zcompat` Generate dictionary compatible with zstd 1.5.5 and older. Default false.
- `-zlevel` Zstandard compression level.

The Zstandard compression level to use when compressing the samples.
The dictionary will be built using the specified encoder level,
which will reflect speed and make the dictionary tailored for that level.
The default is level 4 (best).

Valid values are 1-4, where 1 = fastest, 2 = default, 3 = better, 4 = best.
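
As a sketch of how a `-format raw` dictionary might be consumed with Deflate, the standard library's `compress/flate` accepts a preset dictionary through `flate.NewWriterDict` and `flate.NewReaderDict`. The helper names and the inline dictionary bytes below are stand-ins chosen for illustration; in practice the dictionary would be the contents of the file written by the tool.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// compressWithDict deflates data using dict as a preset dictionary.
func compressWithDict(data, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriterDict(&buf, flate.BestCompression, dict)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressWithDict inflates data; it must be given the same dictionary
// that was used when compressing.
func decompressWithDict(compressed, dict []byte) ([]byte, error) {
	r := flate.NewReaderDict(bytes.NewReader(compressed), dict)
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	// Stand-in for the bytes of a raw dictionary file.
	dict := []byte(`{"user":"","status":"active","email":""}`)
	msg := []byte(`{"user":"alice","status":"active","email":"alice@example.com"}`)

	withDict, err := compressWithDict(msg, dict)
	if err != nil {
		fmt.Println("compress:", err)
		return
	}
	plain, err := compressWithDict(msg, nil)
	if err != nil {
		fmt.Println("compress:", err)
		return
	}
	back, err := decompressWithDict(withDict, dict)
	if err != nil {
		fmt.Println("decompress:", err)
		return
	}
	fmt.Printf("no dictionary: %d bytes, with dictionary: %d bytes, round trip ok: %v\n",
		len(plain), len(withDict), bytes.Equal(back, msg))
}
```

The dictionary is not stored in the compressed stream, so both sides need access to the same bytes.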

## Library

The `github.com/klauspost/compress/dict` package can be used to build dictionaries in code.
The caller must supply a collection of (pre-truncated) samples, and the options to use.
The options largely correspond to the command line options.

```Go
package main

import (
	"log"

	"github.com/klauspost/compress/dict"
	"github.com/klauspost/compress/zstd"
)

func main() {
	var samples [][]byte

	// ... Fill samples with representative data.

	// The result is named "built" so it does not shadow the dict package.
	built, err := dict.BuildZstdDict(samples, dict.Options{
		HashLen:     6,
		MaxDictSize: 114688,
		ZstdDictID:  0, // Random
		ZstdCompat:  false,
		ZstdLevel:   zstd.SpeedBestCompression,
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = built // ... Use the dictionary, for example write it to a file.
}
```

There are similar functions for S2 and raw dictionaries (`BuildS2Dict` and `BuildRawDict`).