tensor.zeros to use async memset #1806

oleksandr-pavlyk · 2024-08-21T23:24:33Z

Addressed outstanding FIXME note to use async call to populate array with zeros.

Added optimization for _full_usm_ndarray to use handler::memset instead of handler::fill for 1-byte wide types and for other types when fill value is bitwise zero.
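The dispatch rule described above can be sketched on the host side roughly as follows. This is only an illustration of the decision, not dpctl's C++ implementation: `choose_fill_strategy` is a hypothetical helper, and Python's `struct` format codes are assumed here as stand-ins for array data types.

```python
import struct


def choose_fill_strategy(fill_value, fmt):
    """Hypothetical helper: pick "memset" or "fill" for a struct format code.

    Returns ("memset", byte) when a byte-wise memset reproduces the value,
    otherwise ("fill", fill_value).
    """
    raw = struct.pack(fmt, fill_value)
    if len(raw) == 1:
        # 1-byte wide type: each element is exactly one byte,
        # so memset with that byte is equivalent to fill.
        return ("memset", raw[0])
    if not any(raw):
        # Every byte of the value is zero: memset(0) writes the same bits.
        return ("memset", 0)
    # Any other bit pattern needs an element-wise fill.
    return ("fill", fill_value)


print(choose_fill_strategy(0, "<q"))     # 8-byte integer zero
print(choose_fill_strategy(1, "<q"))     # 8-byte integer one
print(choose_fill_strategy(-1, "<b"))    # 1-byte signed integer
print(choose_fill_strategy(-0.0, "<d"))  # negative zero has the sign bit set
```

Under this rule a value such as -0.0 would not count as bitwise zero, because its sign bit is set, so it would still go through the fill path.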

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
If this PR is a work in progress, are you opening the PR as a draft?

The new ti._zeros_usm_ndarray is akin to _full_usm_ndarray, but does not take fill_value, hence does not require casting. It dispatches straight to handler::memset.
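To see why a zeros routine needs no fill value or cast, here is a small host-side sketch. It assumes a `bytearray` as a stand-in for device memory and `struct` format codes as stand-ins for data types; `zeros_via_memset` is a hypothetical name and is not part of dpctl.

```python
import struct


def zeros_via_memset(n, fmt):
    """Hypothetical host analogue: zero n elements by clearing raw bytes."""
    itemsize = struct.calcsize(fmt)
    # Pretend the buffer holds leftover garbage, like uninitialized memory.
    buf = bytearray(b"\xff" * (n * itemsize))
    # Analogue of memset(ptr, 0, n * itemsize): no element type is involved.
    buf[:] = bytes(len(buf))
    # Reinterpret the cleared bytes as elements of the requested type.
    return [item[0] for item in struct.iter_unpack(fmt, buf)]


print(zeros_via_memset(3, "<q"))  # 8-byte integers
print(zeros_via_memset(3, "<d"))  # 8-byte floats
```

The same byte-clearing step serves both the integer and the floating-point case, which is why only the element count and item size matter.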

github-actions · 2024-08-22T00:00:01Z

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

github-actions · 2024-08-22T00:05:37Z

Array API standard conformance tests for dpctl=0.18.0dev0=py310hdf72452_344 ran successfully.
Passed: 892
Failed: 3
Skipped: 119

coveralls · 2024-08-22T00:06:51Z

coverage: 87.903% (+0.002%) from 87.901%
when pulling 640e706 on fixme-async-memset
into 4297fef on master.

ndgrigorian · 2024-08-22T04:32:01Z

@oleksandr-pavlyk
It seems a few array API tests for full regressed with this change. We should check whether that is a result of the changes or independent of them.

Bitwise zero values, and 1-byte wide types now use memset, instead of using fill.

```
In [1]: import dpctl.tensor as dpt, dpctl.tensor._tensor_impl as ti

In [2]: res = dpt.empty(10**6, dtype="i8")

In [3]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
243 µs ± 22.6 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [4]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
229 µs ± 14 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [5]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
227 µs ± 23 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [6]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
233 µs ± 25.9 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [7]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
301 µs ± 54.1 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [8]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
236 µs ± 17.2 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [9]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
240 µs ± 35.2 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [10]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(1, dst=res, sycl_queue=res.sycl_queue)[0].wait()
243 µs ± 17.6 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [11]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(1, dst=res, sycl_queue=res.sycl_queue)[0].wait()
263 µs ± 39.9 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [12]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
239 µs ± 26.4 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)

In [13]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
224 µs ± 18.1 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
```

github-actions · 2024-08-22T12:34:58Z

Array API standard conformance tests for dpctl=0.18.0dev0=py310hdf72452_345 ran successfully.
Passed: 895
Failed: 0
Skipped: 119

ndgrigorian

Approved, the change looks good to me

github-actions · 2024-08-22T18:44:16Z

Array API standard conformance tests for dpctl=0.18.0dev0=py310hdf72452_346 ran successfully.
Passed: 895
Failed: 0
Skipped: 119

oleksandr-pavlyk added 2 commits August 21, 2024 17:51

Introduce ti._zeros_usm_ndarray(dst, sycl_queue)

3f9a5bf

This is akin to _full_usm_ndarray, but does not take fill_value, hence does not require castings. It dispatches straight to handler::memset.

Use ti._zeros_usm_ndarray in dpctl.tensor.zeros

bec95f9

oleksandr-pavlyk requested a review from ndgrigorian as a code owner August 21, 2024 23:24

oleksandr-pavlyk added 2 commits August 22, 2024 06:16

Add test_full_cmplx128

f0d926a

oleksandr-pavlyk force-pushed the fixme-async-memset branch from da954f9 to f0d926a Compare August 22, 2024 11:52

ndgrigorian reviewed Aug 22, 2024

dpctl/tensor/libtensor/source/full_ctor.cpp (resolved)

ndgrigorian reviewed Aug 22, 2024

dpctl/tensor/libtensor/source/zeros_ctor.cpp (outdated, resolved)

Update dpctl/tensor/libtensor/source/zeros_ctor.cpp

640e706

Co-authored-by: ndgrigorian <46709016+ndgrigorian@users.noreply.github.com>

ndgrigorian approved these changes Aug 22, 2024

oleksandr-pavlyk merged commit cfba263 into master Aug 22, 2024
45 of 52 checks passed

oleksandr-pavlyk deleted the fixme-async-memset branch August 22, 2024 22:39
