Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ZstdFrameCompressor via endOp #52

Merged
merged 7 commits into from
May 29, 2024

Conversation

mkitti
Copy link
Member

@mkitti mkitti commented May 24, 2024

Alternate approach to #46. ZstdFrameCompressor is just a ZstdCompressor that has endOp = :end.

With all the available data, the compressor will create and close a frame.

Note there is no ZstdFrameDecompressor here.

@mkitti mkitti requested a review from nhz2 May 24, 2024 05:40
Copy link

codecov bot commented May 24, 2024

Codecov Report

Attention: Patch coverage is 93.02326% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 36.96%. Comparing base (ad7288a) to head (9f425dd).

Files Patch % Lines
src/compression.jl 89.65% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
+ Coverage   33.01%   36.96%   +3.94%     
==========================================
  Files           5        5              
  Lines         524      560      +36     
==========================================
+ Hits          173      207      +34     
- Misses        351      353       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nhz2
Copy link
Member

nhz2 commented May 26, 2024

I'm getting an zstd error running the following test that the codec supports streaming.

using TranscodingStreams
using Test
using CodecZstd
    @testset "ZstdFrameCompressor streaming" begin
        codec = ZstdFrameCompressor()
        TranscodingStreams.initialize(codec)
        e = TranscodingStreams.Error()
        r = TranscodingStreams.startproc(codec, :write, e)
        @test r == :ok
        # data buffers
        data = rand(UInt8, 32*1024*1024)
        buffer1 = copy(data)
        buffer2 = zeros(UInt8, length(data)*2)
        GC.@preserve buffer1 buffer2 begin
            total_out = 0
            total_in = 0
            while total_in < length(data) || r != :end
                in_size = min(length(buffer1) - total_in, 1024*1024)
                out_size = min(length(buffer2) - total_out, 1024)
                input = TranscodingStreams.Memory(pointer(buffer1, total_in + 1), UInt(in_size))
                output = TranscodingStreams.Memory(pointer(buffer2, total_out + 1), UInt(out_size))
                Δin, Δout, r = TranscodingStreams.process(codec, input, output, e)
                if r == :error
                    throw(e[])
                end
                total_out += Δout
                total_in += Δin
            end
            @test r == :end
        end
        resize!(buffer2, total_out)
        @test transcode(ZstdDecompressor, buffer2) == data
    end

@mkitti
Copy link
Member Author

mkitti commented May 28, 2024

The Zstandard error was "Src size is incorrect". I believe this is because we did not the following the protocol for ZSTD_e_end.

Apparently that protocol means that the input buffer should not change between successive calls to ZSTD_compressStream2

Calling ZSTD_compressStream2() with ZSTD_e_end instructs to finish a frame.
It will perform a flush and write frame epilogue.
The epilogue is required for decoders to consider a frame completed.
flush operation is the same, and follows same rules as calling ZSTD_compressStream2() with ZSTD_e_flush.
You must continue calling ZSTD_compressStream2() with ZSTD_e_end until it returns 0, at which point you are free to
start a new frame.
note: ZSTD_e_end will flush as much output as possible, meaning when compressing with multiple threads, it will
block until the flush is complete or the output buffer is full.
@return : 0 if frame fully completed and fully flushed,
>0 if some data still present within internal buffer (the value is minimal estimation of remaining size),
or an error code, which can be tested using ZSTD_isError().

@mkitti
Copy link
Member Author

mkitti commented May 28, 2024

The Julia 1.6 tests on Ubuntu seem a bit flaky to me. I'm not sure what that is about.

@mkitti
Copy link
Member Author

mkitti commented May 28, 2024

I see the flaw here now. I cannot guaratee the that the input memory before will not be garbage collected.

@mkitti
Copy link
Member Author

mkitti commented May 28, 2024

In ed3bf8c I started to implement an input buffer.

In afabe28, I realized that I do need the input buffer. We just need to keep the input size and position constant if we were not able to output the entire frame at once. To accomplish this without buffering, I just set offset the source pointer back by the last position. Thus no additional copying is required.

src/compression.jl Outdated Show resolved Hide resolved
@mkitti
Copy link
Member Author

mkitti commented May 29, 2024

One last question for this PR. Should ZstdFrameCompressor be a separate type or does it make sense that it is a parameter variant of ZstdCompressor?

@mkitti
Copy link
Member Author

mkitti commented May 29, 2024

Merging this so we can push forward. I might revisit the type question before release.

@mkitti
Copy link
Member Author

mkitti commented May 29, 2024

Thank you for the review @nhz2 , I think this is in a substantially better place than where we started.

@mkitti mkitti merged commit 89cd7ed into JuliaIO:master May 29, 2024
8 checks passed
@milankl
Copy link

milankl commented May 29, 2024

Awesome thanks so much!! That's all that was missing to get Zstd into JLD2 right? Then I'll update the pull request there.

@nhz2
Copy link
Member

nhz2 commented May 29, 2024

I think at some point we can have const ZstdFrameCompressor = ZstdCompressor and ZstdCompressor would always store the size in the header when used with transcode. For now, I don't think it needs to be a separate type, but it might be cleaner that way, not sure.

@milankl
Copy link

milankl commented May 30, 2024

For JuliaIO/JLD2.jl#560 we'd need a release, would that be possible?

@mkitti
Copy link
Member Author

mkitti commented Jun 2, 2024

This was released in 0.8.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants