feat: support returning empty arrays in `from_map` IO function calls based on user declared exceptions #400

douglasdavis · 2023-10-31T00:42:27Z

As discussed on Slack, this gives from_map callers a way to provide a list of "allowed" exceptions: when the IO function raises one of them, an empty array is returned instead. It requires the io_func to have a form attribute such that a correctly formed empty array can be instantiated. cc @lgray

To use this new feature there are two requirements:

1. Write a mock_empty method (taking a single argument that should be passed to ak.to_backend) on the class of the callable passed to from_map. (The ColumnarProjectionMixin class in the lib.io.columnar module provides an example implementation.)
2. Pass values to the empty_on_raise and empty_backend arguments of from_map. tests/test_io.py has example from_map calls that include values for these arguments.
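
The two requirements can be sketched without dask-awkward installed. In this illustration a plain Python list stands in for an Awkward Array, and `ReadLines` and `from_map_sketch` are hypothetical names; only `mock_empty`, `empty_on_raise`, and `empty_backend` come from this PR. The real from_map builds a lazy task graph, which this toy does not attempt.

```python
from __future__ import annotations


class ReadLines:
    """Hypothetical IO callable; a list stands in for an Awkward Array."""

    def __call__(self, source: str) -> list[str]:
        if source.startswith("missing"):
            raise FileNotFoundError(source)
        return [f"{source}:row{i}" for i in range(2)]

    def mock_empty(self, backend: str = "cpu") -> list[str]:
        # Requirement 1: build a correctly formed, length-zero result.
        # A real implementation would pass `backend` to ak.to_backend.
        if backend not in ("cpu", "jax", "cuda"):
            raise ValueError(f"unsupported backend: {backend!r}")
        return []


def from_map_sketch(fn, sources, empty_on_raise=None, empty_backend="cpu"):
    """Toy, eager stand-in for from_map: one partition per source."""
    # Requirement 2: the caller declares which exceptions are tolerated.
    allowed = tuple(empty_on_raise or ())
    partitions = []
    for source in sources:
        try:
            partitions.append(fn(source))
        except allowed:
            partitions.append(fn.mock_empty(empty_backend))
    return partitions


partitions = from_map_sketch(
    ReadLines(),
    ["a", "missing-b"],
    empty_on_raise=(FileNotFoundError,),
    empty_backend="cpu",
)
```

Exceptions that are not listed in `empty_on_raise` still propagate, so only failures the caller has explicitly declared are turned into empty partitions.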

douglasdavis · 2023-10-31T00:44:57Z

This currently lacks the ability to keep track of what arguments fail.

agoose77 · 2023-10-31T01:33:47Z

src/dask_awkward/lib/io/io.py

+            result = fn(*args, **kwargs)
+            return result
+        except allowed_exceptions:
+            return ak.Array(fn.form.length_zero_array(highlevel=False))


We also have the existing .mock interface: https://github.com/douglasdavis/dask-awkward/blob/a368106dd57c4a355022db42ab9f97f853e2f566/src/dask_awkward/layers/layers.py#L54-L56

I wonder if either we should extend this with another method mock_empty(), or whether we should require only mock and take the form from the mocked array: fn.mock().layout.form.length_zero_array()

One thing to be mindful of is that we should allow the implementer to choose the backend. Perhaps mock should take a backend argument instead...

Ah yes I like the idea of reusing mock for this task, thanks for peeking at this

Since mock has been around for a while and has been associated with typetracer-only parts of the code, perhaps it is best to go with a new mock_empty(backend: str) method

Concerning the backend configurability: we should probably forbid the implementer from injecting the typetracer backend, since for this purpose it just doesn't make any sense?

codecov-commenter · 2023-10-31T16:16:00Z

Codecov Report

Merging #400 (6e46fdf) into main (bde300c) will decrease coverage by 0.09%.
Report is 1 commits behind head on main.
The diff coverage is 91.07%.

@@            Coverage Diff             @@
##             main     #400      +/-   ##
==========================================
- Coverage   94.03%   93.94%   -0.09%     
==========================================
  Files          23       23              
  Lines        3066     3107      +41     
==========================================
+ Hits         2883     2919      +36     
- Misses        183      188       +5

Files	Coverage Δ
src/dask_awkward/lib/io/io.py	`96.64% <100.00%> (+0.64%)`	⬆️
src/dask_awkward/lib/testutils.py	`100.00% <100.00%> (ø)`
src/dask_awkward/lib/io/columnar.py	`98.38% <66.66%> (-1.62%)`	⬇️
src/dask_awkward/layers/layers.py	`93.07% <60.00%> (-2.83%)`	⬇️

... and 1 file with indirect coverage changes

douglasdavis · 2023-10-31T16:23:51Z

Alright I've reworked the PR to use a new mock_empty method which is designed to be restricted to one of ("cpu", "jax", "cuda")

douglasdavis · 2023-10-31T16:25:55Z

We also need a way to communicate the backend to return_empty_on_raise

lgray · 2023-10-31T16:37:42Z

src/dask_awkward/layers/layers.py

+    def mock_empty(self, backend: BackendT = "cpu") -> AwkwardArray:
+        import awkward as ak
+
+        if backend not in ("cpu", "jax", "cuda"):


FWIW, despite this being a bit of an abuse of a private interface, you could make a set out of the keys of https://github.com/scikit-hep/awkward/blob/main/src/awkward/_backends/dispatch.py#L24, subtract "typetracer" from the set, and use that here. That should remove any problems coming from more backends being added over time.

@agoose77 may take issue with this, though, given how hacky it is.

agoose77 · 2023-11-03T14:56:54Z

src/dask_awkward/lib/io/io.py

+    @functools.wraps(fn)
+    def wrapped(*args, **kwargs):
+        try:
+            return fn(*args, **kwargs)
+        except allowed_exceptions:
+            return fn.mock_empty(backend)
+
+    return wrapped


We discussed adding a report mechanism here, so that the partitions can return a Report object (that perhaps the IO func can populate).

This would mean, at a high level, that we don't build a dak.Array directly from the io_func applications; rather, we need an unpacking step so that we build two subgraphs that depend upon the io_func result.
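
One way to picture that unpacking step, as a plain-Python sketch rather than a task-graph implementation: each partition yields a `(data, report)` pair, and a second pass splits the pairs into two collections. `Report`, `with_report`, and `unpack` are hypothetical names, and lists stand in for arrays.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Report:
    """Hypothetical per-partition record of what happened during the read."""

    args: tuple
    kwargs: dict = field(default_factory=dict)
    exception: Optional[str] = None  # exception class name, None on success


def with_report(fn, allowed_exceptions, make_empty):
    """Wrap `fn` so that every call returns a (data, Report) pair."""

    def wrapped(*args, **kwargs):
        try:
            return fn(*args, **kwargs), Report(args, kwargs)
        except allowed_exceptions as err:
            return make_empty(), Report(args, kwargs, type(err).__name__)

    return wrapped


def unpack(results):
    """Split (data, report) pairs into two parallel lists."""
    data = [pair[0] for pair in results]
    reports = [pair[1] for pair in results]
    return data, reports


def read(path):
    if path == "bad":
        raise OSError(path)
    return [path]


safe_read = with_report(read, (OSError,), list)
data, reports = unpack([safe_read(p) for p in ["ok", "bad"]])
failed_args = [r.args for r in reports if r.exception is not None]
```

In a lazy setting the two lists would become two subgraphs that both depend on the wrapped IO call, which is what lets the report record which arguments failed.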

douglasdavis · 2023-11-10T16:59:22Z

@lgray with this PR merged we get graceful skips, but without well-defined reporting. The two steps for enabling the feature are written out in the PR description.

I did add a log message such that

import logging
logging.basicConfig()
logging.getLogger("dask_awkward.lib.io.io").setLevel(logging.INFO)

will give a log message with some details about the calls that return an empty array. I consider this temporary, and we can follow up this PR with a better-developed reporting mechanism.
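
For illustration, a minimal sketch of how such a log message behaves, assuming the wrapper logs at INFO level. The logger name `example.io`, the message text, and the `ListHandler` helper are hypothetical and are not what dask-awkward emits.

```python
import functools
import logging

logger = logging.getLogger("example.io")  # hypothetical logger name


def return_empty_on_raise(fn, allowed_exceptions, make_empty):
    """Call `fn`; on an allowed exception, log it and return an empty result."""

    @functools.wraps(fn)
    def wrapped(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except allowed_exceptions as err:
            logger.info(
                "%s raised %r for args=%r kwargs=%r; returning empty result",
                fn.__name__, err, args, kwargs,
            )
            return make_empty()

    return wrapped


class ListHandler(logging.Handler):
    """Collects formatted messages so they can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def read(path):
    raise OSError(path)


safe_read = return_empty_on_raise(read, (OSError,), list)
```

INFO records are dropped while the logger's effective level is WARNING or higher, which is why the level has to be lowered explicitly before the messages appear.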

lgray · 2023-11-10T17:10:28Z

Great! This gives enough to prototype something that's a good analogue of what we have presently in coffea.

agoose77 reviewed Oct 31, 2023

lgray reviewed Oct 31, 2023

douglasdavis commented Nov 1, 2023

douglasdavis force-pushed the allow-io-functions-to-raise branch 2 times, most recently from 64ad900 to 583fc94 Compare November 2, 2023 20:41

douglasdavis marked this pull request as ready for review November 2, 2023 21:57

agoose77 reviewed Nov 3, 2023

douglasdavis and others added 19 commits November 7, 2023 10:44

allow fn passed to from_map raise exceptions provided by caller

e1536c5

[pre-commit.ci] auto fixes from pre-commit.com hooks

1034e9f

for more information, see https://pre-commit.ci

unused import

5696544

add test

6249542

mypy

ddde384

make mock_empty method for IO functions

8379af7

typetracer from form does return highlevel obj

85ee182

whoops

5887ee8

typing

de89b5b

unnecessary typing in test

33f9940

typing

1949861

default arg

799035b

rm unused

344c6eb

remove behavior= support in from_lists

35707c2

keep elif meta is not None

d25c96b

bring back

2f27bf4

pre-commit

5fd069c

add support for backend definition

f317280

unused

d4ddfd7

reuse other class method

a1b8259

douglasdavis force-pushed the allow-io-functions-to-raise branch from d730569 to a1b8259 Compare November 7, 2023 16:44

douglasdavis added 5 commits November 9, 2023 12:50

use a Queue for failed read 'report'

1e23c84

keep trying

b54bbe8

fix

2e4b680

pre-commit is weird

67655fe

just pring a log msg for now

6e46fdf

douglasdavis changed the title ~~feat: return empty array if from_map IO function raises certain exceptions~~ feat: support returning empty arrays in from_map IO function calls based on user declared exceptions Nov 10, 2023

douglasdavis merged commit f447df4 into dask-contrib:main Nov 10, 2023
22 of 23 checks passed

douglasdavis deleted the allow-io-functions-to-raise branch November 10, 2023 17:01

douglasdavis mentioned this pull request Nov 10, 2023

fix: don't require mock and mock_empty together; use behaviors in empty array creation #408

Merged
