Hi,

I'm looking for a vectorized operation that checks, for each element of an awkward array, whether it is contained in a list of elements, e.g.

```python
>>> ak_isin(
...     ak.Array([['a', 'b', 'c'], ['b', 'c']]),
...     ['b', 'x']
... )
[[False, True, False],
 [True, False]]
----------------------
type: 2 * var * bool
```

I came up with two solutions (see below), but I was wondering if there's something built in that I'm overlooking, or any other more efficient or more elegant approach.

Solution 1: equality + reduce

Likely slow when [...]

```python
from functools import reduce
import operator


def ak_isin_reduce(arr, haystack):
    # OR together one elementwise comparison per haystack element
    return reduce(operator.or_, (arr == el for el in haystack))
```

Solution 2: numba

```python
import numba as nb
import awkward as ak


@nb.njit()
def _ak_isin_numba_inner(arr, haystack, ab):
    # Walk the rows and rebuild the same nesting with boolean leaves
    for row in arr:
        ab.begin_list()
        for v in row:
            ab.append(v in haystack)
        ab.end_list()
    return ab


def ak_isin_numba(arr, haystack):
    # Pass the haystack to the jitted function as a tuple
    haystack = tuple(haystack)
    return _ak_isin_numba_inner(arr, haystack, ak.ArrayBuilder()).snapshot()
```

Mini-benchmark

```python
>>> import random
>>> def make_random_array(n, max_el=5):
...     ab = ak.ArrayBuilder()
...     for _ in range(n):
...         ab.begin_list()
...         for i in range(random.randint(0, max_el)):
...             ab.append(chr(i + 65) * 5)
...         ab.end_list()
...     return ab.snapshot()
>>> arr = make_random_array(3000)
>>> arr
[[],
 ['AAAAA'],
 ['AAAAA', 'BBBBB'],
 [],
 ...
 ['AAAAA', 'BBBBB', 'CCCCC', 'DDDDD']]
-----------------------------------------------
type: 3000 * var * string
>>> %%timeit
>>> ak_isin_reduce(arr, ["AAAAA", "CCCCC", "XXXXX"])
5.01 ms ± 47.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %%timeit
>>> ak_isin_numba(arr, ["AAAAA", "CCCCC", "XXXXX"])
1.58 ms ± 8.97 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
Replies: 3 comments 1 reply

-

This is now something that `ak.str.is_in` will provide (soon to be released)!

-

Wait, although [...]

Also, let's not close Discussions. This is how they're different from Issues: Issues represent work to be done, and we want to close them so that we can easily see what still needs to be done. Discussions remain useful to the community after the question is answered, since someone else might have the same question.

If we view this as a feature request, then it should be converted into an Issue so that it can be closed when the feature is provided. But since @grst provided nicely worked-out examples of how to implement this without a built-in function, I think those examples would be useful to other people, even if what they need is not exactly [...]

-

I was looking for [...]