Support hive_partitioning = TRUE
#222
Comments
I just came here looking to see whether I could read all the Parquet files in subdirectories. I have a parent directory that I wrote with arrow. The data frame I wrote there was grouped, so it was written to subdirectories with hive partitioning. Now I want to read those back in with duckplyr_df_from_file().
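The setup described above can be sketched as follows; the output directory name is a hypothetical placeholder, and `mtcars` stands in for the actual grouped data frame:

```r
# Sketch of the setup: write a grouped data frame to a
# hive-partitioned directory with arrow::write_dataset().
library(arrow)
library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  write_dataset("data/mtcars_part")

# write_dataset() uses the grouping variable as the partition key,
# producing subdirectories like data/mtcars_part/cyl=4/part-0.parquet
```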
Thanks, Jeremy. The help page at https://duckplyr.tidyverse.org/reference/df_from_file.html has some examples, with a lot of room for improvement. Also, the naming is subject to change: #210. Did you try https://duckdb.org/docs/data/parquet/overview#read_parquet-function? It could be that filters aren't pushed down to the Parquet reader despite the hive partitioning: #172.
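For reference, the linked `read_parquet` table function can be exercised directly through the duckdb R client; this is a minimal sketch assuming the hypothetical partitioned directory from above:

```r
# Query hive-partitioned Parquet files via duckdb's read_parquet()
# table function (see the linked overview page). The path is an
# assumption, not from the original thread.
library(DBI)
library(duckdb)

con <- dbConnect(duckdb())
res <- dbGetQuery(con, "
  SELECT *
  FROM read_parquet('data/mtcars_part/*/*.parquet',
                    hive_partitioning = true)
")
dbDisconnect(con, shutdown = TRUE)
```

With `hive_partitioning = true`, the partition key (here `cyl=...` in the directory names) is exposed as a regular column in the result.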
Ah, found an example at https://duckdb.org/docs/data/partitioning/hive_partitioning. In my case I then used * for the second-level directories and for the filenames. Worked great!
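The working call described above might look like this; the path and the exact shape of the `options` list are assumptions based on the df_from_file() help page, not verbatim from the thread:

```r
# Sketch: read a hive-partitioned dataset with duckplyr, using * for
# the partition directories and for the filenames.
library(duckplyr)

df <- duckplyr_df_from_file(
  "data/mtcars_part/*/*.parquet",
  "read_parquet",
  options = list(hive_partitioning = TRUE)
)
```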
Does duckdb need to provide infrastructure for this in duckplyr_df_from_file()?