Support hive_partitioning = TRUE #222

Closed

krlmlr opened this issue Aug 12, 2024 · 3 comments

@krlmlr (Member) commented Aug 12, 2024

For duckplyr_df_from_file(). Does duckdb need to provide infrastructure for this?
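
Something like this sketch, assuming the options list of duckplyr_df_from_file() is forwarded to DuckDB's read_parquet() (the path is hypothetical):

library(duckplyr)

# Hypothetical usage: pass hive_partitioning through the options list,
# assuming duckplyr_df_from_file() hands it to DuckDB's read_parquet()
df <- duckplyr_df_from_file(
  "data/dataset/*/*.parquet",
  "read_parquet",
  options = list(hive_partitioning = TRUE)
)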

@jeremy-allen

I came here looking to see whether I could read all the Parquet files in subdirectories. I have a parent directory that I wrote with arrow; the data frame I wrote there was grouped, so it was written to subdirectories with Hive partitioning. Now I want to read those in with duckplyr_df_from_file().

@krlmlr (Member, Author) commented Oct 27, 2024

Thanks, Jeremy. The help page at https://duckplyr.tidyverse.org/reference/df_from_file.html has some examples, with a lot of room for improvement. Also, the naming is subject to change: #210.

Did you try duckplyr_df_from_parquet() with a glob pattern? It looks like hive_partitioning = true is the default for DuckDB's read_parquet() function, so I think this particular issue can be closed.

https://duckdb.org/docs/data/parquet/overview#read_parquet-function
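
For example, a minimal sketch (the path is hypothetical), relying on read_parquet()'s documented hive_partitioning default:

library(duckplyr)

# Glob over the partition directories; DuckDB's read_parquet() detects
# Hive partitions by default, so the partition keys show up as columns
df <- duckplyr_df_from_parquet("data/dataset/*/*.parquet")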

It could be that filters aren't pushed down to the Parquet reader, despite the Hive partitioning: #172.
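
A quick way to check would be a sketch like this (the column name year is hypothetical); with working pushdown, a filter on a partition column should only scan the matching directories:

library(duckplyr)
library(dplyr)

df <- duckplyr_df_from_parquet("data/dataset/*/*.parquet")
df |>
  filter(year == 2020) |> # filter on the (hypothetical) partition column
  head()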

@krlmlr closed this as completed Oct 27, 2024
@jeremy-allen

Ah, I found an example at https://duckdb.org/docs/data/partitioning/hive_partitioning. In my case I then used * for the second-level directories and for the filenames:
pq_path <- "data/seattle-library-checkouts/*/*.parquet"

Worked great!
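
Putting it together, the full read would be something like this (the variable name checkouts is hypothetical):

library(duckplyr)

# Glob across the Hive partition directories written by arrow
pq_path <- "data/seattle-library-checkouts/*/*.parquet"
checkouts <- duckplyr_df_from_parquet(pq_path)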
