Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate thrift definitions using macro from compact-thrift-rs #5909

Closed
wants to merge 3 commits into from

Conversation

jhorstmann
Copy link
Contributor

Which issue does this PR close?

An alternative implementation of a custom thrift decoder, this time based on rust declarative macros instead of a separate code generator.

Fixes #5854, Alternative to #5869

Rationale for this change

The current thrift definitions are not easy to customize and also leave some performance on the table. The macro-based approach here allows renaming fields and replacing types with custom implementations, while still being relatively easy to update.

The downsides of the macro is that error messages in case of typos are not that easy to deciper. Once there are more customizations, updates would need to be done via diff with the official definitions.

Even without customizations like borrowing data, the performance improvement is significant. On my machine, with native target and still the default allocator:

open(default)           time:   [9.9190 µs 9.9540 µs 9.9881 µs]
                        change: [-33.053% -32.672% -32.380%] (p = 0.00 < 0.05)
                        Performance has improved.

open(page index)        time:   [437.81 µs 439.01 µs 440.28 µs]
                        change: [-26.632% -26.481% -26.313%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Replaces the thrift structures and custom protocol with a macro that wraps the nearly unmodified parquet.thrift definitions.

Are there any user-facing changes?

This should mostly be a drop-in replacement, customizations and optimizations could be done in future PRs, for example borrowing data or replacing a Vec<Encoding> with a small bitset.

One api change is that structs and enum no longer derive the Eq trait, only PartialEq.

@tustvold tustvold closed this Oct 8, 2024
@tustvold
Copy link
Contributor

tustvold commented Oct 8, 2024

Just clearing up some old PRs, feel free to reopen if you plan on continuing with this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use custom thrift decoder to improve speed of parsing parquet metadata
2 participants