Filling histograms twice fails in 2023.9.1 only #363

Closed
ikrommyd opened this issue Sep 13, 2023 · 4 comments

Comments

@ikrommyd
Contributor

import uproot
import hist.dask as hda
import dask_awkward as dak
from distributed import Client

events = uproot.dask({"../../coffea_dev/coffea/tests/samples/nano_dy.root": "Events"})

h = hda.Hist.new.Variable([0, 10, 20, 30, 40, 80, 120, 200], name="x").Double()

p1 = events.Electron_pt[:20]
p2 = events.Electron_pt[20:]

h.fill(dak.flatten(p1))
h.fill(dak.flatten(p2))

h.compute(scheduler="processes")

This fails with both a distributed Client() and the "processes" scheduler.
With "processes" it gives:

File ~/fun/egamma_dev/venv/lib/python3.10/site-packages/dask_awkward/pickle.py:46, in <dictcomp>()
     44 # For pickle >= 5, we can avoid copying the buffers
     45 if protocol >= 5:
---> 46     container = {k: pickle.PickleBuffer(v) for k, v in container.items()}
     48 if array.behavior is ak.behavior:
     49     behavior = None

TypeError: a bytes-like object is required, not 'PlaceholderArray'

while with Client() it gives:

File ~/fun/egamma_dev/venv/lib/python3.10/site-packages/awkward/_nplikes/shape.py:77, in __gt__()
     76 def __gt__(self, other):
---> 77     raise TypeError("cannot order unknown lengths")

TypeError: cannot order unknown lengths
@douglasdavis
Collaborator

@agoose77 since you worked on the pickle stuff I'm going ahead and pinging you here :) I'll still try to investigate this at some point today.

@ikrommyd
Contributor Author

@douglasdavis @agoose77 I'd also like to note that I've been seeing some strange behavior in 2023.9.1 in general.
For instance, this:

import dask
import dask_awkward as dak
from coffea.nanoevents import NanoEventsFactory
from distributed import Client

events = NanoEventsFactory.from_root(
    {"root_files/Egamma1.root": "Events"},
    permit_dask=True,
    chunks_per_file=10,
).events()

def filter_events(events, pt):
    # Require at least two electrons (overwritten by the tag selection below)
    good_events = events[dak.num(events.Electron) >= 2]
    abs_eta = abs(events.Electron.eta)
    # Exclude the barrel-endcap gap
    pass_eta_ebeegap = (abs_eta < 1.4442) | (abs_eta > 1.566)
    pass_tight_id = events.Electron.cutBased == 4
    pass_pt = events.Electron.pt > pt
    pass_eta = abs_eta <= 2.5
    pass_selection = pass_pt & pass_eta & pass_eta_ebeegap & pass_tight_id
    # Keep events with at least two electrons passing the full selection
    n_of_tags = dak.sum(pass_selection, axis=1)
    good_events = events[n_of_tags >= 2]
    good_locations = pass_selection[n_of_tags >= 2]

    return good_events

client = Client()
x = filter_events(events, 31)
dask.compute(x)

runs into this odd error:

File ~/fun/egamma_dev/venv/lib/python3.10/site-packages/tornado/iostream.py:1116, in IOStream.read_from_fd(***failed resolving arguments***)
   1114 def read_from_fd(self, buf: Union[bytearray, memoryview]) -> Optional[int]:
   1115     try:
-> 1116         return self.socket.recv_into(buf, len(buf))
   1117     except BlockingIOError:
   1118         return None

OSError: [Errno 22] Invalid argument

This happens only with some ROOT files and not with others. All of this disappears in 2023.9.0.

@douglasdavis
Collaborator

douglasdavis commented Sep 13, 2023

At the highest level, it looks like the pickle changes (which made it into 2023.9.0) combined with re-enabling form rehydration (which made it into 2023.9.1) lead to an error when attempting to serialize a PlaceholderArray. I was able to reproduce this without histogramming:

import dask
import uproot
import dask_awkward as dak
events = uproot.dask({"/path/to/coffea/tests/samples/nano_dy.root": "Events"})
p1 = events.Electron_pt[:20]
p2 = events.Electron_pt[20:]
# No histogram is involved, so compute the two slices directly
dask.compute(p1, p2, scheduler="processes")
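For illustration, the TypeError in the "processes" traceback can be reproduced with the standard library alone. This is only a minimal sketch: the PlaceholderArray class below is a hypothetical stand-in (not awkward's real class), and to_pickle_buffers is a made-up helper that mirrors the dict comprehension shown in the traceback. It assumes Python 3.8+, where pickle.PickleBuffer is available.

```python
import pickle


class PlaceholderArray:
    """Hypothetical stand-in: an object that does not expose the buffer protocol."""


def to_pickle_buffers(container):
    # Mirrors the dict comprehension in the traceback: wrap every value
    # in a PickleBuffer so protocol 5 can avoid copying the data.
    return {k: pickle.PickleBuffer(v) for k, v in container.items()}


# Values that support the buffer protocol (bytes, bytearray, ...) wrap fine.
ok = to_pickle_buffers({"node0-data": bytearray(b"\x00\x01\x02")})

# A value without the buffer protocol raises TypeError, as in the report.
message = None
try:
    to_pickle_buffers({"node0-data": PlaceholderArray()})
except TypeError as err:
    message = str(err)

print(message)
```

The point of the sketch is that PickleBuffer only accepts objects with real backing memory, so any container that still holds an unmaterialized placeholder would fail at this step regardless of the histogram code.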

@ikrommyd
Contributor Author

Closing this one as fixed by #366 and scikit-hep/awkward#2714
