Data compression as an option #58
Note that for the Cassandra implementation, this evolution would require changing the current mechanism, which compresses each chunk of data separately instead of compressing the whole blob. With the current mechanism, getting a compressed blob requires first uncompressing each chunk, reconstituting the blob, and then compressing it again. Changing this mechanism will need some caution, because we will still need to be able to read data written the "old way", to ensure a smooth migration.
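To make the cost concrete, here is a minimal, hypothetical sketch (not the actual Cassandra AFS code) of the read path described above, assuming GZIP per-chunk compression: every chunk has to be decompressed, the blob reassembled, and then recompressed before a compressed blob can be returned.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ChunkedBlobExample {

    /** Reassembles the uncompressed blob from independently compressed chunks. */
    static byte[] readBlob(List<byte[]> compressedChunks) throws IOException {
        ByteArrayOutputStream blob = new ByteArrayOutputStream();
        for (byte[] chunk : compressedChunks) {
            try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(chunk))) {
                in.transferTo(blob); // decompress chunk by chunk
            }
        }
        return blob.toByteArray();
    }

    /** Returns the whole blob compressed again, as a single compressed payload. */
    static byte[] readBlobCompressed(List<byte[]> compressedChunks) throws IOException {
        byte[] blob = readBlob(compressedChunks);            // 1) decompress every chunk
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(blob);                                   // 2) recompress the whole blob
        }
        return out.toByteArray();
    }
}
```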
An additional thought on that issue: I would therefore propose a simplification of those implementations, with no automatic compression, and compression handled by the business objects: they are the ones who know whether it is relevant or not to compress their data. However, implementing this would need a lot of care to ensure non-regression and a smooth migration of existing systems.
Some additional benchmarking results:
So for that use case, the system without automatic compression would be able to scale 10x better.
So, the principle has been agreed: give particular business objects the possibility to define whether their data should be compressed or not. Steps to achieve this could be:
Under work. For example:

```java
public ImportedCaseBuilder withRawData(String format, byte[] data /* or something more streaming-friendly */);
```

The provided data would be passed "as is" to the app storage, and not compressed. Note: this is to circumvent the data source API limitations, which is another issue.

In the app storage API:

```java
OutputStream writeBinaryData(String nodeId, String name, boolean mayCompress);

/*
 * kind of a shortcut for mayCompress = true ?
 */
OutputStream /* or Writer ? */ writeTextData(String nodeId, String name);
```

On the read side, we need to:
How could we achieve this?
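For illustration, a hedged usage sketch of the proposed write API (the `writeBinaryData` signature above is still under discussion; the `AppStorage` stand-in and the names below are hypothetical): a business object that already holds compressed bytes would pass `mayCompress = false` so the storage does not compress them again.

```java
import java.io.IOException;
import java.io.OutputStream;

public class RawDataWriteExample {

    // Minimal stand-in for the proposed storage interface, for illustration only.
    interface AppStorage {
        OutputStream writeBinaryData(String nodeId, String name, boolean mayCompress) throws IOException;
    }

    static void storeAlreadyCompressedCase(AppStorage storage, String nodeId, byte[] gzippedCase) throws IOException {
        // The case file is already gzipped by the importer, so skip automatic compression.
        try (OutputStream out = storage.writeBinaryData(nodeId, "case.xiidm.gz", false)) {
            out.write(gzippedCase);
        }
    }
}
```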
Feature.
Binary data stored in AFS are compressed and uncompressed automatically by several components.
The Cassandra-based implementation:
The remote implementation:
In case we want to read or write already compressed data, those steps are unnecessary and can hurt performance (and possibly memory usage).
If we could set up those components to not perform compression, it could improve performance (to be measured).
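As a rough illustration of what "not performing compression" could look like inside such a component, here is a minimal sketch, assuming GZIP is the compression used; it is not the actual AFS implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public final class OptionalCompression {

    private OptionalCompression() {
    }

    /** Wraps the target stream with GZIP compression only when requested. */
    public static OutputStream maybeCompress(OutputStream target, boolean compress) throws IOException {
        return compress ? new GZIPOutputStream(target) : target;
    }
}
```

The point of such a flag is that the decision to compress moves out of the storage component's hard-coded behaviour and into the hands of the caller.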
Performance optimization in a typical setup with a client connected to an AFS server, which itself relies on a Cassandra implementation of AFS.
In this kind of setup, when writing/reading data blobs, they are unnecessarily compressed and uncompressed on the server side.
Some benchmarking with JMH shows that compressing a large XIIDM case (100 MB) takes around 2 s on my laptop CPU.
At around 50 cases received per hour, that means 1-2 minutes of CPU time consumed for this every hour.
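For reference, a sketch of the kind of JMH benchmark that could produce such a figure; the actual benchmark behind the measurement is not shown in this issue, and the random payload below is only a stand-in for a real XIIDM file (which, being XML, would compress better).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.zip.GZIPOutputStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CompressionBenchmark {

    private byte[] data;

    @Setup
    public void setup() {
        data = new byte[100 * 1024 * 1024]; // ~100 MB payload, standing in for a large XIIDM case
        new Random(42).nextBytes(data);
    }

    @Benchmark
    public byte[] gzipWholeBlob() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data); // measure the cost of compressing the whole blob in one pass
        }
        return out.toByteArray();
    }
}
```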