
Unable to calculate gradients on GPU with Embedding + OneHot Tensors (dim >= 3) #31

Open
reachtarunhere opened this issue Jan 11, 2023 · 4 comments
Labels: bug, good first issue, help wanted

@reachtarunhere

Simple Example to Replicate:

using Flux
using Flux: onehotbatch


model = Embedding(26, 5) |> gpu
inputs = rand('a':'z', (2, 5))  # NOTE THIS IS 2D SO AFTER CONVERTING TO OH IT WOULD BE 3D

loss(y) = sum(y)

inputs_oh = onehotbatch(inputs, 'a':'z') |> gpu

model(inputs_oh)
loss(model(inputs_oh))

opt = Flux.Optimise.Descent(0.1)
opt_state = Flux.setup(opt, model)



l, grads = Flux.withgradient(m -> loss(m(inputs_oh)), model) # ERROR HAPPENS HERE
Flux.update!(opt_state, model, grads[1])
loss(model(inputs_oh))

Error Log:

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:87
  [3] getindex(::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::Int64, ::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:9
  [4] _generic_matmatmul!(C::Matrix{Float32}, tA::Char, tB::Char, A::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::Base.ReshapedArray{Bool, 2, OneHotArrays.OneHotArray{UInt32, 2, 3, CUDA.CuArray{UInt32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ LinearAlgebra /opt/julias/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:830
  [5] generic_matmatmul!(C::Matrix{Float32}, tA::Char, tB::Char, A::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::Base.ReshapedArray{Bool, 2, OneHotArrays.OneHotArray{UInt32, 2, 3, CUDA.CuArray{UInt32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ LinearAlgebra /opt/julias/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:798
  [6] mul!
    @ /opt/julias/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:478 [inlined]
  [7] mul!
    @ /opt/julias/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:275 [inlined]
  [8] *
    @ /opt/julias/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:153 [inlined]
  [9] #1471
    @ ~/.julia/packages/ChainRules/RZYEu/src/rulesets/Base/arraymath.jl:36 [inlined]
 [10] unthunk
    @ ~/.julia/packages/ChainRulesCore/C73ay/src/tangent_types/thunks.jl:204 [inlined]
 [11] wrap_chainrules_output
    @ ~/.julia/packages/Zygote/AS0Go/src/compiler/chainrules.jl:105 [inlined]
 [12] map
    @ ./tuple.jl:223 [inlined]
 [13] wrap_chainrules_output
    @ ~/.julia/packages/Zygote/AS0Go/src/compiler/chainrules.jl:106 [inlined]
 [14] ZBack
    @ ~/.julia/packages/Zygote/AS0Go/src/compiler/chainrules.jl:206 [inlined]
 [15] Pullback
    @ ~/.julia/packages/Flux/v79Am/src/layers/basic.jl:701 [inlined]
 [16] Pullback
    @ ~/.julia/packages/Flux/v79Am/src/layers/basic.jl:702 [inlined]
 [17] (::typeof(∂(λ)))(Δ::CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/AS0Go/src/compiler/interface2.jl:0
 [18] Pullback
    @ ~/Projects/MakeMore/src/bugrep.jl:23 [inlined]
 [19] (::Zygote.var"#60#61"{typeof(∂(#1))})(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/AS0Go/src/compiler/interface.jl:45
 [20] withgradient(f::Function, args::Flux.Embedding{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}})
    @ Zygote ~/.julia/packages/Zygote/AS0Go/src/compiler/interface.jl:133
 [21] top-level scope
    @ ~/Projects/MakeMore/src/bugrep.jl:23
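
For comparison (not part of the original report): the error only concerns the GPU path. Keeping everything on the CPU, the same gradient appears to go through LinearAlgebra's generic matmul and complete, just slowly, which matches the "very slowly on the CPU" remark in the error message. A minimal sketch, reusing the model/inputs/loss definitions above but with hypothetical *_cpu names:

# Sketch: same computation without |> gpu; expected to run via the generic (slow) matmul.
model_cpu = Embedding(26, 5)
inputs_oh_cpu = onehotbatch(inputs, 'a':'z')
l_cpu, grads_cpu = Flux.withgradient(m -> loss(m(inputs_oh_cpu)), model_cpu)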

Workaround:

Replacing inputs_oh above with an integer tensor, where the characters have been mapped to 1:26, makes everything run on the GPU with no errors. To test, try the following (a sketch of an explicit mapping follows the snippet):

# inputs_oh = onehotbatch(inputs, 'a':'z') |> gpu
inputs_oh = rand(1:26, (2, 5)) |> gpu
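
For the character inputs in the example above, that mapping to 1:26 could also be made explicit rather than sampled at random; a minimal sketch (inputs_int is a name introduced here purely for illustration):

# Sketch: map 'a':'z' to integer indices 1:26, which Embedding accepts directly.
inputs_int = map(c -> c - 'a' + 1, inputs) |> gpu
model(inputs_int)   # same 5×2×5 output shape as the one-hot path, no OneHotArray involved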

Useful Background Information:

  • Flux v0.13.11
  • OneHotArrays v0.2.3
  • Julia Info
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_ERROR_COLOR = red
  • CUDA Info: NVIDIA-SMI 515.86.01, Driver Version 515.86.01, CUDA Version 11.7
@mcabbott added the bug label on Jan 12, 2023
@mcabbott (Member) commented Jan 12, 2023

The file ~/.julia/packages/ChainRules/RZYEu/src/rulesets/Base/arraymath.jl:36 [inlined] is this:
https://github.com/JuliaDiff/ChainRules.jl/blob/13ccc862899d8a3d98b09bd68edd9be8ca28197e/src/rulesets/Base/arraymath.jl#L36
That rule is the pullback of *, which forms the gradient for the dense matrix as Ȳ * B'; here B is the reshaped one-hot array, so this dense times adjoint-of-reshaped-one-hot product falls through to the generic matmul, which is what the demonstration below reproduces directly.

Here's a smaller demonstration of the problem:

julia> using CUDA, OneHotArrays, NNlibCUDA

julia> CUDA.allowscalar(false)

julia> x = cu(onehotbatch([3, 4],1:5))
5×2 OneHotMatrix(::CuArray{UInt32, 1, CUDA.Mem.DeviceBuffer}) with eltype Bool:
 ⋅  ⋅
 ⋅  ⋅
 1  ⋅
 ⋅  1
 ⋅  ⋅

julia> cu(ones(3,5)) * x
3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 1.0  1.0
 1.0  1.0
 1.0  1.0

julia> cu(ones(3,2)) * x'
3×5 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.0  0.0  1.0  1.0  0.0
 0.0  0.0  1.0  1.0  0.0
 0.0  0.0  1.0  1.0  0.0

julia> @which cu(ones(3,2)) * x'
*(A::AbstractMatrix, B::LinearAlgebra.Adjoint{Bool, <:OneHotMatrix})
     @ OneHotArrays ~/.julia/packages/OneHotArrays/T3yiq/src/linalg.jl:13

julia> y = reshape(cu(onehotbatch([3 4],1:5)), 5, 2)
5×2 reshape(OneHotArray(::CuArray{UInt32, 2, CUDA.Mem.DeviceBuffer}), 5, 2) with eltype Bool:
 ⋅  ⋅
 ⋅  ⋅
 1  ⋅
 ⋅  1
 ⋅  ⋅

julia> y isa OneHotLike
true

julia> cu(ones(3,5)) * y
3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 1.0  1.0
 1.0  1.0
 1.0  1.0

julia> cu(ones(3,2)) * y'
ERROR: Scalar indexing is disallowed.
...
  [4] _generic_matmatmul!(C::Matrix{Float32}, tA::Char, tB::Char, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::Base.ReshapedArray{Bool, 2, OneHotArrays.OneHotArray{UInt32, 2, 3, CuArray{UInt32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})

julia> @which cu(ones(3,2)) * y'
*(A::AbstractMatrix, B::AbstractMatrix)
     @ LinearAlgebra ~/julia-9ded051e9f/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:139

julia> typeof(y')
LinearAlgebra.Adjoint{Bool, Base.ReshapedArray{Bool, 2, OneHotArray{UInt32, 2, 3, CuArray{UInt32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}}

Xref recent #30 about Flux.OneHotLike.

@mcabbott transferred this issue from FluxML/Flux.jl on Jan 12, 2023
@mcabbott (Member) commented:
This method:

function Base.:(*)(A::AbstractMatrix, B::Adjoint{Bool, <:OneHotMatrix})
    B_dim = length(_indices(parent(B)))
    size(A, 2) == B_dim || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(A, 2)) != $B_dim"))
    return NNlib.scatter(+, A, _indices(parent(B)), dstsize=(size(A,1), size(B,2)))
end

ought to allow B::Adjoint{Bool, <:OneHotLike} (and B::Transpose too), by checking _isonehot(B) etc., as the methods above it do.
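
A minimal sketch of what that might look like (not the actual patch; it is written against OneHotArrays' internal helpers, assumes _indices accepts the reshaped one-hot wrapper, and falls back to the generic method when the reshape is no longer one-hot along the first dimension):

function Base.:(*)(A::AbstractMatrix, B::Adjoint{Bool, <:OneHotLike})
    B_parent = parent(B)
    # Only the genuinely one-hot case can take the scatter fast path.
    _isonehot(B_parent) || return invoke(*, Tuple{AbstractMatrix, AbstractMatrix}, A, B)
    B_dim = length(_indices(B_parent))
    size(A, 2) == B_dim || throw(DimensionMismatch("Matrix column must correspond with OneHot size: $(size(A, 2)) != $B_dim"))
    return NNlib.scatter(+, A, _indices(B_parent), dstsize=(size(A, 1), size(B, 2)))
end

A matching method for B::Transpose would presumably follow the same pattern.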

@mcabbott added the good first issue and help wanted labels on Jan 14, 2023
@reachtarunhere (Author) commented Jan 14, 2023

@mcabbott I would like to take a stab at this if it's okay

@mcabbott (Member) commented:
That would be great. There's no formal process of assigning things, although we try a little to avoid PR races.
