-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Zstd compression #560
Add Zstd compression #560
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #560 +/- ##
=======================================
Coverage 86.94% 86.94%
=======================================
Files 31 31
Lines 4311 4321 +10
=======================================
+ Hits 3748 3757 +9
- Misses 563 564 +1 ☔ View full report in Codecov by Sentry. |
I'm not an expert on this package, but you seem to be correct. Line 160 in ccd60eb
|
Manual test: julia> using JLD2, CodecZstd
julia> A = zeros(1000,1000);
julia> A[1] = rand()
0.44335952762563824
julia> sizeof(A)/1000^2 # 8 MB array
8.0
julia> save("test_without_compression.jld2", "A", A)
julia> save("test_with_compression.jld2", "A", A, compress=ZstdCompressor())
julia> A == load("test_with_compression.jld2", "A")
true
julia> A == load("test_without_compression.jld2", "A")
true and I have two files on disk then
one uncompressed of about 8MB and one compressed to 1KB |
You should test if you can load the file via HDF5.jl.
|
Okay that's not working correctly julia> h5open("test_with_compression.jld2") do h5f
h5f["A"][]
end
1000×1000 Matrix{Float64}:
0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7.41892e-310 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.95e-321 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7.41892e-310 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.42e-321 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7.41892e-310 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3.6e-322 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
⋮ ⋱ ⋮
0.0 4.0e-323 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 5.17085e170 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 9.56977e-315 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.11254e-308 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.11255e-308 0.0 0.0 0.0 0.0 0.0 0.0 0.0 (it should be A[1] == 0.4436 otherwise zero), even though the file without zstd compression is loaded correctly. Why is that? edit: The same works perfectly fine with |
So the problem is that we are using https://github.com/HDFGroup/hdf5_plugins/blob/master/ZSTD/src/H5Zzstd.c |
OK, I'm going to start working on a ZstdFrameCompressor / ZstdFrameDecompressor . I'm not sure if that is the final name yet, but it's basically the NotStream compressor / decompressor. I'm not completely sure how to make that work in the TranscodingStream API, but we'll see. |
Awesome, let me know if I can help! |
@mkitti I've written already the tests for |
I've set the compressor here from |
I think it should still be |
Note that CodecZstd.jl contains |
Hi, Locally, a round trip of
with
or
|
This should work now: julia> using H5Zzstd, HDF5
julia> h5open("test_with_compression.jld2") do h5f
h5f["A"][]
end
julia> using HDF5_jll
julia> run(`$(HDF5_jll.h5ls()) -va test_with_compression.jld2`);
Opened "test_with_compression.jld2" with sec2 driver.
A Dataset {1000/1000, 1000/1000}
Location: 1:48
Links: 1
Chunks: {1000, 1000} 8000000 bytes
Storage: 8000000 logical bytes, 283 allocated bytes, 2826855.12% utilization
Filter-0: ZSTD-32015 {5}
Type: native double
Address: 189
Flags Bytes Address Logical Offset
========== ======== ========== ==============================
0x00000000 283 189 [0, 0, 0] |
Thank you! I can reproduce locally. Thanks for your work :) |
@mkitti you want to include a compatibility constrain with the CodecZstd v0.8.3? Probably not hence you suggested the extension? Otherwise this seems to be ready to be merged? |
Can this even be done, given that CodecZstd isn't even a dependency or weak dep? |
riiiiiight, forgot that. Any reason then not to merge this? |
To be honest, I do not really understand how the compressors are being loaded here. |
I think that we should declare CodecZlib, CodecBzip2, and now CodecZstd as a weak dependencies with compat ranges. Eventually, it would be good to consider how to take advantage of package extensions. Over in JuliaIO/HDF5.jl#1160, I am working on converting all the HDF5 filter packages into package extensions. |
Compressor loading works the way it was done in FileIO for loading backends a few years ago. I'm not sure this has changed. The logic finds out which decompressor is needed for loading and checks if the library is loaded. If not, it checks if it is installed - in which case it then tries to load it. Pkg extensions would not help here, since there are zero lines of code specific to the individual backends. |
That's not very clear to me. Right now the loading mechanism ends up being very dynamic. I think this could be studied further. |
I think, this is a discussion for another day. |
fixes #357
@mkitti I'm not sure what the 4-element tuple in
ID_TO_COMPRESSOR
is supposed to be? Package name, compressor name, decompressor name, some short name in caps?