Load specific columns from parquet files #1522

Answered by jpivarski
pkausw asked this question in Q&A
You're right, @agoose77, that the version 2 implementation has a lot more options for limiting the read. In v1, Parquet reading would often be used with lazy arrays, but now we're separating the laziness from Awkward into dask-awkward, so the non-Dask Awkward function needs to be more tuneable.

Here's how to do a projected read. I'm importing Awkward as

>>> import awkward as ak

so that all of my "._v2"s are explicit, but you could import awkward._v2 as ak.

To find out which columns exist, and therefore what to ask for, we can get a metadata object:

>>> filename = "https://pivarski-princeton.s3.amazonaws.com/chicago-taxi.parquet"
>>> metadata = ak._v2.metadata_from_parquet(file…

Replies: 3 comments

Answer selected by pkausw