Support hive_partitioning = TRUE #222

Closed

krlmlr opened this issue Aug 12, 2024 · 3 comments

@krlmlr (Member) commented Aug 12, 2024

For duckplyr_df_from_file(). Does duckdb need to provide infrastructure for this?
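
Something like this sketch, assuming the options list of duckplyr_df_from_file() is forwarded to DuckDB's read_parquet() (the path is hypothetical):

library(duckplyr)

# Hypothetical usage: pass hive_partitioning through the options list,
# assuming duckplyr_df_from_file() hands it to DuckDB's read_parquet()
df <- duckplyr_df_from_file(
  "data/dataset/*/*.parquet",
  "read_parquet",
  options = list(hive_partitioning = TRUE)
)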

@jeremy-allen

I came here looking to see whether I could read all the Parquet files in subdirectories. I have a parent directory that I wrote with arrow; the data frame I wrote there was grouped, so it was written to subdirectories with Hive partitioning. Now I want to read those in with duckplyr_df_from_file().

@krlmlr (Member, Author) commented Oct 27, 2024

Thanks, Jeremy. The help page at https://duckplyr.tidyverse.org/reference/df_from_file.html has some examples, with a lot of room for improvement. Also, the naming is subject to change: #210.

Did you try duckplyr_df_from_parquet() with a glob pattern? It looks like hive_partitioning = true is the default for DuckDB's read_parquet() function, so I think this particular issue can be closed.

https://duckdb.org/docs/data/parquet/overview#read_parquet-function
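
For example, a minimal sketch (the path is hypothetical), relying on read_parquet()'s documented hive_partitioning default:

library(duckplyr)

# Glob over the partition directories; DuckDB's read_parquet() detects
# Hive partitions by default, so the partition keys show up as columns
df <- duckplyr_df_from_parquet("data/dataset/*/*.parquet")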

It could be that filters aren't pushed down to the Parquet reader, despite the Hive partitioning: #172.
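
A quick way to check would be a sketch like this (the column name year is hypothetical); with working pushdown, a filter on a partition column should only scan the matching directories:

library(duckplyr)
library(dplyr)

df <- duckplyr_df_from_parquet("data/dataset/*/*.parquet")
df |>
  filter(year == 2020) |> # filter on the (hypothetical) partition column
  head()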

@krlmlr closed this as completed Oct 27, 2024
@jeremy-allen

Ah, I found an example at https://duckdb.org/docs/data/partitioning/hive_partitioning. In my case I then used * for the second-level directories and for the filenames:
pq_path <- "data/seattle-library-checkouts/*/*.parquet"

Worked great!
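
Putting it together, the full read would be something like this (the variable name checkouts is hypothetical):

library(duckplyr)

# Glob across the Hive partition directories written by arrow
pq_path <- "data/seattle-library-checkouts/*/*.parquet"
checkouts <- duckplyr_df_from_parquet(pq_path)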
