-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace Snappy compressor with Snappy compatible version of S2 algorithm #276
Conversation
@sylwiaszunejko, let's move compressors out to a separate package with separate |
That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like |
We better leave snappy where it is and as default, but on next minor release, updating golang we can move it out and make s2 default. Btw, on that topic, we need to have benchmark tests for both compressors. |
So your recommendation is to not merge this PR right now? Or to change it to add new type of compressor in separate module that uses S2 and leave snappy as it is? Because if I understand @mykaul correctly in the issue discussion, he wanted to edit existing SnappyCompressor not to add a new one
I agree |
compressor_test.go
Outdated
@@ -14,7 +15,7 @@ func TestSnappyCompressor(t *testing.T) { | |||
} | |||
|
|||
str := "My Test String" | |||
//Test Encoding | |||
//Test Encoding with Snappy | |||
expected := snappy.Encode(nil, []byte(str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this test does. It still tests with the original Snappy compressor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted some test to check if it is indeed compatible but this test is not a good way of doing that, and I agree that S2 probably has such basic tests
compressor_test.go
Outdated
expected = s2.EncodeSnappy(nil, []byte(str)) | ||
if res, err := c.Encode([]byte(str)); err != nil { | ||
t.Fatalf("failed to encode '%v' with error %v", str, err) | ||
} else if bytes.Compare(expected, res) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we verifying that S2 can indeed produce a Snappy compatible output, or a byte to byte compatible output? It seems the latter, which isn't clear is what S2 can do actually. I think what you need to test is can you decode it.
(But I also think it's not needed. S2 probably has such basic tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test doesn't make sens I removed it
We cannot add a new type that Scylla server won't understand. And if we wish to have snappy and S2-snappy - I don't fully see the point. They are either compatible and therefore replaceable, or not.
We must. There's absolutely no value other than improved performance. So if it doesn't buy us anything, I don't see a reason to do it. |
|
30da1aa
to
f5f6baa
Compare
@mykaul @dkropachev That's probably not the best solution but I changed s2 version from 1.17.10 to 1.17.9 and now running |
Great news, sure. Let's have a perfromance tests to confirm it is better and switch to it. |
klauspost/compress#1008 probably updated Golang. We should also update, but certainly not as part of this PR. |
I tried to test the performance with this code: package main
import (
"crypto/rand"
"fmt"
"math/big"
"time"
"github.com/gocql/gocql"
)
// Generate random text payload of a given size in bytes
func generateTextPayload(size int) []byte {
letters := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
payload := make([]byte, size)
charsetLen := big.NewInt(int64(len(letters)))
// Fill the payload with random characters from the charset
for i := 0; i < size; i++ {
randomIndex, err := rand.Int(rand.Reader, charsetLen)
if err != nil {
return nil
}
payload[i] = letters[randomIndex.Int64()]
}
return payload
}
// Generate random binary payload of a given size in bytes
func generateBinaryPayload(size int) []byte {
payload := make([]byte, size)
rand.Read(payload) // Fill the slice with random binary data
return payload
}
// Configure the cluster and set the compressor
func setupClusterWithCompressor(compressor gocql.Compressor) *gocql.ClusterConfig {
cluster := gocql.NewCluster("127.0.0.2")
cluster.Keyspace = "example"
cluster.Compressor = compressor
return cluster
}
// Insert a payload into the database
func insertPayload(session *gocql.Session, id gocql.UUID, payload []byte) error {
start := time.Now()
err := session.Query(`INSERT INTO test_payload (id, data) VALUES (?, ?)`, id, payload).Exec()
elapsed := time.Since(start)
fmt.Printf("Insert Time: %s\n", elapsed)
return err
}
// Retrieve a payload from the database
func retrievePayload(session *gocql.Session, id gocql.UUID) ([]byte, error) {
var payload []byte
start := time.Now()
err := session.Query(`SELECT data FROM test_payload WHERE id = ?`, id).Scan(&payload)
elapsed := time.Since(start)
fmt.Printf("Retrieve Time: %s\n", elapsed)
return payload, err
}
func testCompressorPerformance(session *gocql.Session, compressorName string, payload []byte) {
id := gocql.TimeUUID()
// Insert payload
err := insertPayload(session, id, payload)
if err != nil {
fmt.Printf("Error inserting payload: %v", err)
}
// Retrieve payload
retrievedPayload, err := retrievePayload(session, id)
if err != nil {
fmt.Printf("Error retrieving payload: %v", err)
}
fmt.Printf("Compressor: %s, Payload Size: %d, Retrieved Size: %d\n", compressorName, len(payload), len(retrievedPayload))
}
func main() {
cluster := setupClusterWithCompressor(gocql.SnappyCompressor{})
session, err := cluster.CreateSession()
if err != nil {
fmt.Printf("Unable to connect to cluster: %v\n", err)
}
binaryPayload := generateBinaryPayload(1024)
testCompressorPerformance(session, "snappy-s2", binaryPayload)
binaryPayload2 := generateBinaryPayload(1024 * 50)
testCompressorPerformance(session, "snappy-s2", binaryPayload2)
binaryPayload3 := generateBinaryPayload(1024 * 500)
testCompressorPerformance(session, "snappy-s2", binaryPayload3)
textPayload := generateTextPayload(1024)
testCompressorPerformance(session, "snappy-s2", textPayload)
textPayload2 := generateTextPayload(1024 * 50)
testCompressorPerformance(session, "snappy-s2", textPayload2)
textPayload3 := generateTextPayload(1024 * 500)
testCompressorPerformance(session, "snappy-s2", textPayload3)
session.Close()
cluster = setupClusterWithCompressor(gocql.SnappyCompressor_old{})
session, err = cluster.CreateSession()
if err != nil {
fmt.Printf("Unable to connect to cluster: %v\n", err)
}
binaryPayload = generateBinaryPayload(1024)
testCompressorPerformance(session, "snappy", binaryPayload)
binaryPayload2 = generateBinaryPayload(1024 * 50)
testCompressorPerformance(session, "snappy", binaryPayload2)
binaryPayload3 = generateBinaryPayload(1024 * 500)
testCompressorPerformance(session, "snappy", binaryPayload3)
textPayload = generateTextPayload(1024)
testCompressorPerformance(session, "snappy", textPayload)
textPayload2 = generateTextPayload(1024 * 50)
testCompressorPerformance(session, "snappy", textPayload2)
textPayload3 = generateTextPayload(1024 * 500)
testCompressorPerformance(session, "snappy", textPayload3)
} Unfortunately there is no significant difference between these two, for some payloads one is better, for some the other, example results:
Maybe you have any suggestion around testing, maybe I could do something different/better? |
However, if there's no real change, I would be happy to move - just because I feel the original Snappy is not being actively worked on, and S2 seem to have more movement. |
not sure how to check that
I have checked memory consumption and it looks similar for both compressors, I am not sure how to compare cpu consumption, tried to use cpu profiling but with not much success yet. |
size of the packets.
Yes, that may be more challenging, if we don't measure it well. |
Let's merge #289 and then you can use it to benchmark both compressors by running:
|
Snappy:
S2:
It looks like S2 is a little bit better |
f5f6baa
to
f29c45a
Compare
Thanks, just to double check, can you please run |
Snappy:
S2:
|
So S2 has slightly worse compression rate and slightly better performance? |
Exactly. |
OK. The only reason I'd go with it as that it seems to be in active development vs. the original snappy Golang code. |
Fixes: #142