Refactor read #79

Merged
merged 76 commits into main from refactor-read
Aug 16, 2024

Conversation

gaborcsardi
Member

Complete rewrite of Parquet reader.

Can read PLAIN, uncompressed INT32 columns with not too many
pages.
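
For context, Parquet's PLAIN encoding stores each INT32 value as 4 little-endian bytes, back to back. A minimal sketch of decoding such an uncompressed page buffer is below; `decode_plain_int32` is a hypothetical helper written for illustration, not a function from this PR, and it assumes the caller has already located the start of the values in the page.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from this PR): decode `num_values` PLAIN-encoded
// INT32 values from an uncompressed buffer of `len` bytes.
std::vector<int32_t> decode_plain_int32(const uint8_t *buf, size_t len,
                                        size_t num_values) {
  if (num_values > len / 4) {
    throw std::runtime_error("page buffer too short for INT32 values");
  }
  std::vector<int32_t> out(num_values);
  for (size_t i = 0; i < num_values; i++) {
    const uint8_t *p = buf + 4 * i;
    // Assemble the bytes explicitly so the result does not depend on the
    // byte order of the host.
    uint32_t u = static_cast<uint32_t>(p[0]) |
                 (static_cast<uint32_t>(p[1]) << 8) |
                 (static_cast<uint32_t>(p[2]) << 16) |
                 (static_cast<uint32_t>(p[3]) << 24);
    // Reinterpret the bit pattern as a signed 32-bit value.
    int32_t v;
    std::memcpy(&v, &u, sizeof(v));
    out[i] = v;
  }
  return out;
}
```

On a little-endian host a single `memcpy` of the whole buffer would likely be faster; the byte-wise loop is used here only to keep the sketch portable.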
There is still a lot of repetition, but maybe it can't be avoided.
In general the current structure looks workable.
Cannot test this currently.
Cannot test this currently.
Almost ready for optional columns.
Looks pretty good now
This is a bit too slow and we'll want to do the reads in place instead...
No need to create a SEXP for the metadata.
(Incomplete.)
Need to wrap C++ code in try/catch within
R_UnwindProtect(), otherwise it crashes.
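
The idea can be sketched roughly as follows: a C++ exception must not escape a callback that is invoked from C code, so the callback catches everything and records the failure in its data argument. This sketch does not use the R API, so that it stays self-contained. `run_protected` is a hypothetical stand-in for a C entry point such as R_UnwindProtect(), and `CallData`, `parse_positive` and `guarded_callback` are likewise made up for illustration.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical payload passed through the callback's `void *data`.
struct CallData {
  int input;
  int result;
  bool failed;
  std::string error;
};

// Hypothetical C++ worker that may throw.
int parse_positive(int x) {
  if (x < 0) throw std::invalid_argument("negative input");
  return x * 2;
}

// Callback invoked by the C-style entry point. No exception may leave this
// function, so every exception is caught and stored in the payload.
void *guarded_callback(void *data) {
  CallData *cd = static_cast<CallData *>(data);
  try {
    cd->result = parse_positive(cd->input);
  } catch (const std::exception &e) {
    cd->failed = true;
    cd->error = e.what();
  } catch (...) {
    cd->failed = true;
    cd->error = "unknown C++ exception";
  }
  return data;
}

// Stand-in for the C entry point (an assumption, not the R API): it simply
// calls the callback with the supplied data.
void *run_protected(void *(*fun)(void *), void *data) { return fun(data); }
```

After the call returns, the caller can inspect `failed` and report `error` through the host's own error mechanism, at a point where no C++ exception is in flight.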
So check is clean.
This will be configurable later.
This is a workaround until we have a more principled
way of specifying the R schema in `read_parquet()`.
Do not remove data, skip it.
Switching over to new reader in all tests, gradually.
@gaborcsardi gaborcsardi merged commit f0c0e2e into main Aug 16, 2024
13 checks passed
@gaborcsardi gaborcsardi deleted the refactor-read branch August 16, 2024 09:35