-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support returning empty arrays in from_map
IO function calls based on user declared exceptions
#400
feat: support returning empty arrays in from_map
IO function calls based on user declared exceptions
#400
Conversation
This currently lacks the ability to keep track of what arguments fail. |
src/dask_awkward/lib/io/io.py
Outdated
result = fn(*args, **kwargs) | ||
return result | ||
except allowed_exceptions: | ||
return ak.Array(fn.form.length_zero_array(highlevel=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have the existing .mock
interface: https://github.com/douglasdavis/dask-awkward/blob/a368106dd57c4a355022db42ab9f97f853e2f566/src/dask_awkward/layers/layers.py#L54-L56
I wonder if either we should extend this with another method mock_empty()
, or whether we should require only mock
and take the form from the mocked array: fn.mock().layout.form.length_zero_array()
One thing to be mindful of is that we should allow the implementer to choose the backend. Perhaps mock
should take a backend
argument instead...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I like the idea of reusing mock for this task, thanks for peeking at this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since mock
has been around for a while and has been associated with typetracer-only parts of the code, perhaps it is best to go with a new mock_empty(backend: str)
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning the backend configurability: we should probably forbid the implementer to inject the typetracer backend as for this purpose it just doesn't make any sense?
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. @@ Coverage Diff @@
## main #400 +/- ##
==========================================
- Coverage 94.03% 93.94% -0.09%
==========================================
Files 23 23
Lines 3066 3107 +41
==========================================
+ Hits 2883 2919 +36
- Misses 183 188 +5
... and 1 file with indirect coverage changes 📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
Alright I've reworked the PR to use a new |
We also need a way to communicate the backend to |
def mock_empty(self, backend: BackendT = "cpu") -> AwkwardArray: | ||
import awkward as ak | ||
|
||
if backend not in ("cpu", "jax", "cuda"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, despite this being a bit of an abuse of a private interface you could make a set out of the keys of https://github.com/scikit-hep/awkward/blob/main/src/awkward/_backends/dispatch.py#L24
and subtract "typetracer" from the set and use that here. Should remove and problems coming from more backends over time.
@agoose77 may have issue with this though given it being hacky.
64ad900
to
583fc94
Compare
src/dask_awkward/lib/io/io.py
Outdated
@functools.wraps(fn) | ||
def wrapped(*args, **kwargs): | ||
try: | ||
return fn(*args, **kwargs) | ||
except allowed_exceptions: | ||
return fn.mock_empty(backend) | ||
|
||
return wrapped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed adding a report mechanism here, so that the partitions can return a Report
object (that perhaps the IO func can populate).
This would mean at a high level that we don't build an dak.Array
from the io_func
applications; rather, we need an unpacking step so that we build two subgraphs that depend upon the io_func
result.
for more information, see https://pre-commit.ci
d730569
to
a1b8259
Compare
from_map
IO function raises certain exceptionsfrom_map
IO function calls based on user declared exceptions
@lgray with this PR merged we get graceful skips without well defined reporting. Two steps for enabling the feature are written out in the PR description. I did add a log message such that import logging
logging.basicConfig()
logging.getLogger("dask_awkward.lib.io.io").setLevel(logging.INFO) will give a log message with some details about the calls that return an empty array. I consider this temporary and we can followup this PR with a better developed reporting mechanism |
Great! This gives enough to prototype something that's a good analogue of what we have presently in coffea. |
Discussed on Slack, this is a way to give
from_map
callers the ability to provide a list of "allowed" exceptions that should just return an empty array. It requires theio_func
to have aform
attribute such that a correctly formed empty array can be instantiated. cc @lgrayTo use this new feature there are two requirements:
mock_empty
method (that takes a single argument that should be passed toak.to_backend
) in the class function passed tofrom_map
. (TheColumnarProjectionMixin
class in thelib.io.columnar
module provides an example implementation).empty_on_raise
andempty_backend
arguments offrom_map
.tests/test_io.py
has examplefrom_map
calls that includes values for these arguments