-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: support placeholder arrays in pickling #366
Conversation
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. @@ Coverage Diff @@
## main #366 +/- ##
==========================================
- Coverage 93.87% 93.82% -0.05%
==========================================
Files 23 23
Lines 2907 2917 +10
==========================================
+ Hits 2729 2737 +8
- Misses 178 180 +2
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
if column.endswith(LIST_KEY): | ||
list_parent_path = column[: -(len(LIST_KEY) + 1)].rstrip(".") | ||
if list_parent_path not in necessary_columns: | ||
necessary_columns.append(f"{list_parent_path}.*") | ||
else: | ||
necessary_columns.append(column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@douglasdavis this is where I was initially thinking that we need to do something for shape_touched
. However, it might prove fiddly.
Naively, I thought about A.*
if we want the shape of A.B
. However, this wouldn't work for the following type of A
1 * {
x: var * int64,
y: var * int64
}
Computing A.y
wouldn't give any information about the length of the NumpyArray
in A.x
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to follow up (I don't expect you to have seen this over the weekend) — this is the blocker for @lgray's demo this week AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I'm understanding: are you thinking that there's still a another case we need to add to this part of the code? As things stand right now I'm able to follow the changes and they looks good to me (but I don't know with certainty that all of our cases are covered)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR fixes the pickling problem, but there's still the task of loading lengths for buffers with shape_touched
. I'll move this to Slack :)
The only functional changes should be in https://github.com/dask-contrib/dask-awkward/pull/366/files#diff-29b3283fdeb3b2f56a78b5edbe6265f5836136112dfecb5114ab68fb067a2008 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things look good to me and tests are passing, just had that one question on the inline comment
It doesn't do it for me. The reproducer in #363: import dask
import uproot
import dask_awkward as dak
events = uproot.dask({"/path/to/coffea/tests/samples/nano_dy.root": "Events"})
p1 = events.Electron_pt[:20]
p2 = events.Electron_pt[20:]
dask.compute(p1, p2, scheduler="processes") fails with FAILED tests/test_tnp.py::test_local_compute[False-processes] - TypeError: internal_error: the _UnknownLength class should never be directly instantiated
FAILED tests/test_tnp.py::test_local_compute[True-processes] - TypeError: internal_error: the _UnknownLength class should never be directly instantiated
FAILED tests/test_tnp.py::test_distributed_compute[False] - TypeError: cannot order unknown lengths
FAILED tests/test_tnp.py::test_distributed_compute[True] - TypeError: cannot order unknown lengths import dask
import uproot
import dask_awkward as dak
from distributed import Client
events = uproot.dask({"/path/to/coffea/tests/samples/nano_dy.root": "Events"})
p1 = events.Electron_pt[:20]
p2 = events.Electron_pt[20:]
with Client():
dask.compute(p1, p2) This snippet wil give you this: |
@iasonkrom you need the latest Awkward for this (HEAD) |
Yeah my bad, however, first snippet now gives: TypeError: PlaceholderArray supports only trivial slices, not int
This error occurred while calling
ak.from_buffers(
{'class': 'RecordArray', 'fields': ['run', 'luminosityBlock', 'event'...
40
{'node1-data': <awkward._nplikes.placeholder.PlaceholderArray object ...
highlevel = False
buffer_key = '{form_key}-{attribute}'
byteorder = '<'
) and the second snippet still the same error: Traceback (most recent call last):
File "/Users/iason/fun/egamma_dev/egamma-tnp/test2.py", line 10, in <module>
dask.compute(p1, p2)
File "/Users/iason/fun/egamma_dev/venv/lib/python3.10/site-packages/dask/base.py", line 628, in compute
results = schedule(dsk, keys, **kwargs)
File "/Users/iason/fun/egamma_dev/venv/lib/python3.10/site-packages/distributed/spill.py", line 181, in __setitem__
super().__setitem__(key, value)
File "/Users/iason/fun/egamma_dev/venv/lib/python3.10/site-packages/zict/buffer.py", line 193, in __setitem__
self.fast[key] = value
File "/Users/iason/fun/egamma_dev/venv/lib/python3.10/site-packages/zict/lru.py", line 136, in __setitem__
self.set_noevict(key, value)
File "/Users/iason/fun/egamma_dev/venv/lib/python3.10/site-packages/zict/common.py", line 127, in wrapper
return func(*args, **kwargs)
File "/Users/iason/fun/egamma_dev/venv/lib/python3.10/site-packages/zict/lru.py", line 163, in set_noevict
if weight > self.n:
File "/Users/iason/fun/egamma_dev/venv/lib/python3.10/site-packages/awkward/_nplikes/shape.py", line 67, in __gt__
raise TypeError("cannot order unknown lengths")
TypeError: cannot order unknown lengths |
@iasonkrom could you use scikit-hep/awkward#2714 instead of HEAD for awkward? I might have misremembered with PR you should use :) |
Yup, the branch works fine. Both snippets do not raise an error and pytests pass |
I'll put out a release with this fix in 👍 |
@douglasdavis this PR refactors some code to aid understanding of the logic, and ensures that we don't try to pickle non-buffers using pickle protocol 5.