
getindex overhead #166

Open
johnnychen94 opened this issue Oct 17, 2020 · 3 comments

@johnnychen94
Member

johnnychen94 commented Oct 17, 2020

I've recently been building some cache arrays with OffsetArrays and realized that the performance bottleneck is getindex(::OffsetArray, I).

The benchmark results look interesting; I'm unsure why arr_sum runs faster on an OffsetArray 🤔 Any ideas?

using OffsetArrays
using BenchmarkTools  # provides @btime

X = rand(4, 4, 4, 4, 4, 4);
XO = OffsetArray(X, -1, -2, -3, 1, 2, 3);

function arr_sum(X)
    val = zero(eltype(X))
    R = CartesianIndices(X)
    for i in R
        @inbounds val += X[i]
    end
    val
end

@btime arr_sum($X) # 5.215 μs (0 allocations: 0 bytes)
@btime arr_sum($XO) # 3.730 μs (0 allocations: 0 bytes)
@btime getindex($X, 1, 1, 1, 1, 1, 1) # 1.983 ns (0 allocations: 0 bytes)
@btime getindex($XO, 3, 2, 1, 2, 3, 4) # 5.855 ns (0 allocations: 0 bytes)

getindex_inbounds(X, inds...) = @inbounds X[inds...]
@btime getindex_inbounds($X, 1, 1, 1, 1, 1, 1) # 1.430 ns (0 allocations: 0 bytes)
@btime getindex_inbounds($XO, 3, 2, 1, 2, 3, 4) # 2.323 ns (0 allocations: 0 bytes)

The default checkbounds implementation definitely takes too long here. I believe the additional time is spent constructing the IdOffsetRange axes and in their generic, and thus slower, getindex; a sketch of that path follows the benchmarks below.

julia> @btime axes($X);
  1.431 ns (0 allocations: 0 bytes)

julia> @btime axes($XO);
  4.763 ns (0 allocations: 0 bytes)

These might be benchmark artifacts, though.
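
To make the suspected path concrete, here is a rough sketch of what bounds checking does (not the actual Base code; manual_checkbounds is a made-up helper that only approximates the real checkindex recursion):

using OffsetArrays

XO2 = OffsetArray(rand(4, 4), -1, -2);

axes(XO2)  # two IdOffsetRanges; constructing these is part of the cost

# Roughly one membership test per axis; for an IdOffsetRange the test
# goes through a more generic path than for a Base.OneTo.
manual_checkbounds(A, I...) = all(map(in, I, axes(A)))

manual_checkbounds(XO2, 0, 1)  # true: both indices lie in the shifted axes
manual_checkbounds(XO2, 1, 4)  # false: the second axis is -1:2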

@jishnub
Member

jishnub commented Oct 17, 2020

For the record, I obtain the same run-times:

julia> @btime arr_sum($X);
  6.528 μs (0 allocations: 0 bytes)

julia> @btime arr_sum($XO);
  6.535 μs (0 allocations: 0 bytes)

This is odd, as indexing is definitely faster with plain Arrays:

julia> @btime getindex($X, 1, 1, 1, 1, 1, 1);
  3.768 ns (0 allocations: 0 bytes)

julia> @btime getindex($XO, 3, 2, 1, 2, 3, 4);
  14.857 ns (0 allocations: 0 bytes)

julia> getindex_inbounds(X, inds...) = @inbounds X[inds...];

julia> @btime getindex_inbounds($X, 1, 1, 1, 1, 1, 1);
  2.694 ns (0 allocations: 0 bytes)

julia> @btime getindex_inbounds($XO, 3, 2, 1, 2, 3, 4);
  4.960 ns (0 allocations: 0 bytes)

@johnnychen94
Member Author

> For the record, I obtain the same run-times:

Sorry, I wasn't aware there's a performance regression in Julia 1.6-dev; see also JuliaLang/julia#38073

@timholy
Member

timholy commented Oct 18, 2020

Is the "raw" indexing time relevant? Any performance-sensitive operation is presumably in a loop, and in a loop won't the compiler hoist all the slow parts out of the loop?

As evidence, I define arr_sum_bc identically to arr_sum but without the @inbounds.
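
Its definition isn't reproduced here; per that description it would be:

function arr_sum_bc(X)
    val = zero(eltype(X))
    R = CartesianIndices(X)
    for i in R
        val += X[i]  # no @inbounds, so the bounds check runs every iteration
    end
    val
end

Here's what I get: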

julia> @btime arr_sum($X)
  5.992 μs (0 allocations: 0 bytes)
2069.986834289989

julia> @btime arr_sum($XO)
  4.093 μs (0 allocations: 0 bytes)
2069.986834289989

julia> @btime arr_sum_bc($X)
  7.791 μs (0 allocations: 0 bytes)
2069.986834289989

julia> @btime arr_sum_bc($XO)
  5.022 μs (0 allocations: 0 bytes)
2069.986834289989

Despite the fact that an isolated getindex is slower, in the context of a loop the slow parts are done just once, so performance does not suffer.
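
One way to check the hoisting directly (a hypothetical diagnostic, not something run in this thread; it assumes arr_sum and XO from the first comment are in scope):

using InteractiveUtils  # provides @code_llvm

# If the axes/offset computation is hoisted, the emitted IR for the inner
# loop should be plain induction-variable and load arithmetic, with no
# per-iteration IdOffsetRange construction or bounds branches.
@code_llvm debuginfo=:none arr_sum(XO)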

Bad inlining could nix this, but since we've annotated with @inline I doubt that will be a problem for most uses.
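
For context, the scalar getindex method in OffsetArrays looks roughly like this (a paraphrase of the package source, not the exact code):

# @inline lets a caller's @inbounds elide the @boundscheck block, and lets
# the compiler hoist the offset arithmetic out of hot loops.
@inline function Base.getindex(A::OffsetArray{T,N}, I::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(A, I...)
    # shift each index back into the parent's 1-based axes
    @inbounds parent(A)[map(-, I, A.offsets)...]
end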

I don't understand why OffsetArrays are faster than Arrays, though.
