Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: remove protocol for form transformation #382

Merged
merged 1 commit into from
Oct 11, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Oct 8, 2023

This should be done in uproot, I think.

@codecov-commenter
Copy link

codecov-commenter commented Oct 8, 2023

Codecov Report

Merging #382 (05b3ce0) into main (16b5189) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #382      +/-   ##
==========================================
- Coverage   93.23%   93.21%   -0.02%     
==========================================
  Files          24       24              
  Lines        3043     3038       -5     
==========================================
- Hits         2837     2832       -5     
  Misses        206      206              
Files Coverage Δ
src/dask_awkward/__init__.py 100.00% <ø> (ø)
src/dask_awkward/lib/__init__.py 100.00% <ø> (ø)
src/dask_awkward/lib/io/io.py 96.00% <100.00%> (-0.08%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@douglasdavis
Copy link
Collaborator

douglasdavis commented Oct 9, 2023

Makes sense to me, since this was @lgray's I'll ask him to give it a green light as well

@lgray
Copy link
Collaborator

lgray commented Oct 9, 2023

I think it's OK - but we need to find a place for it to live in the long run that's not coffea since it's not generally coffea specific (HEP, however, appears to be the only place where we need it presently).

Having a factorization point between serialized data and user interface is pretty reasonable IMO, but you need to do some pretty advanced data transformations to require it.

@agoose77
Copy link
Collaborator Author

agoose77 commented Oct 9, 2023

@lgray my view is that right now only coffea uses this. I don't see it as being a dask-awkward feature unless we extend the various ingest interfaces to support it. So right now I think if coffea want to generalise to other packages, perhaps we make a new awkward-form-remapping package that provides these interfaces / tools?

@lgray
Copy link
Collaborator

lgray commented Oct 10, 2023

One of the first things I'm going to do one we get CoffeaTeam/coffea#900 is to turn the parquet nanoevents back on and start testing it routinely, which will require a patch to dask-awkward.

This also paves the way for future backends being easily ingested.

I'm certainly happy with another package if that seems like a prudent long term choice, but given the first thing I'm going to do is make it work for parquet, I'm pushing a bit to make sure we're not being overly strict about this.

@nikoladze
Copy link

Maybe the wrong place for this discussion, but i wonder when we create parquet files why we would even write them so "flat" like NanoAOD that we need to remap the form and not just write them in the structure we want it to have in the first place? Isn't it a nice feature of arrow/parquet that it seamlessly maps to awkward array?

@jpivarski
Copy link
Collaborator

After Arrow version 2 or so, it has become very good at writing NanoEvents-type structures (e.g. with lists of particle-records, rather than separate jagged arrays for each particle attribute). Due to the way that Parquet stores data, NanoEvents-form and NanoAOD-form would use the same disk space, but if it's easier to just write and read Parquet without changing the form, that would be a natural thing to do.

@agoose77
Copy link
Collaborator Author

@lgray given how easy it is to publish new versions and packages, I am feeling that we don't need to move this to dask-awkward at this time.

I wonder whether uproot.dask should in future not handle form remapping. It seems to me that coffea and/or some other HEP package should handle form remappng, and e.g. implement uproot.dask for this case.

As such, are you content if we merge this PR, leave things in uproot for now, and subsequently handle this on the downstream side?

@lgray
Copy link
Collaborator

lgray commented Oct 10, 2023

Yes - been happy with proceeding from the start - this particular branch of conversation is not a show stopper.

@agoose77 agoose77 force-pushed the agoose77/feat!-remove-form-transformation branch from a2762a1 to 395879b Compare October 10, 2023 20:40
@agoose77
Copy link
Collaborator Author

The test failures are due to Arrow being unavailable for Python 3.12.

As the other tests pass, and we can see that error, I'm going to merge this one and get it off the queue.

@agoose77 agoose77 merged commit 8c9ff67 into main Oct 11, 2023
38 of 45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants