
Option for deterministic sparse matrix vector multiplication #1607

Open
renatobellotti opened this issue Sep 30, 2022 · 2 comments
Labels: cuda libraries (Stuff about CUDA library wrappers), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

renatobellotti commented Sep 30, 2022

I would like to request deterministic sparse matrix vector multiplications. Essentially, the following code should always return the same result:

using CUDA
using Random
using SparseArrays  # provides sprand

Random.seed!(546784)

# Note: cu converts Float64 element types to Float32 on the device by default.
A = cu(sprand(Float64, 1000, 1000, 0.6))
x = cu(rand(Float64, 1000))

y1 = A * x
y2 = A * x

maximum(abs.(y2 - y1))

A similar issue is #938.
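
For context: the nondeterminism presumably comes from the reduction order. The default SpMV algorithm parallelizes the per-row dot products, and floating-point addition is not associative, so different summation orders can round differently. A minimal CPU illustration:

# Floating-point addition is not associative, so summation order matters:
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)  # false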

What I found out so far

We need to use CUSPARSE_SPMV_COO_ALG2 or CUSPARSE_SPMV_CSR_ALG2. This probably means changing the algo argument of mv!():

function mv!(transa::SparseChar, alpha::Number, A::Union{CuSparseMatrixBSR{TA},CuSparseMatrixCSR{TA}},
             X::DenseCuVector{T}, beta::Number, Y::DenseCuVector{T}, index::SparseChar,
             algo::cusparseSpMVAlg_t=CUSPARSE_MV_ALG_DEFAULT) where {TA, T}

Unfortunately, the algorithm information seems to get lost in the wrapper function:

function mv_wrapper(transa::SparseChar, alpha::Number, A::CuSparseMatrix, X::DenseCuVector{T},
                    beta::Number, Y::CuVector{T}) where {T}
    mv!(transa, alpha, A, X, beta, Y, 'O')
end

The wrapper is used to implement the LinearAlgebra.mul! operators (this excerpt comes from inside a loop over the tag/op wrappers, which is where $TypeA, $transa, untaga, and unwrapa are defined, hence the trailing end):

@eval begin
    function LinearAlgebra.mul!(C::CuVector{T}, A::$TypeA, B::DenseCuVector{T},
                                alpha::Number, beta::Number) where {T <: Union{Float16, ComplexF16, BlasFloat}}
        mv_wrapper($transa(T), alpha, $(untaga(unwrapa(:A))), B, beta, C)
    end
    function LinearAlgebra.mul!(C::CuVector{Complex{T}}, A::$TypeA, B::DenseCuVector{Complex{T}},
                                alpha::Number, beta::Number) where {T <: Union{Float16, BlasFloat}}
        mv_wrapper($transa(T), alpha, $(untaga(unwrapa(:A))), B, beta, C)
    end
end
for (tagb, untagb) in tag_wrappers, (wrapb, transb, unwrapb) in op_wrappers
    TypeB = wrapb(tagb(:(DenseCuMatrix{T})))
    @eval begin
        function LinearAlgebra.mul!(C::CuMatrix{T}, A::$TypeA, B::$TypeB,
                                    alpha::Number, beta::Number) where {T <: Union{Float16, ComplexF16, BlasFloat}}
            mm_wrapper($transa(T), $transb(T), alpha, $(untaga(unwrapa(:A))), $(untagb(unwrapb(:B))), beta, C)
        end
    end
end
end

The remaining question for the maintainers is now: Where do we pass in the algorithm?
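
For illustration, one possible approach (just a sketch, not an actual patch) would be to give mv_wrapper an algo keyword argument with the existing default and forward it to mv!:

function mv_wrapper(transa::SparseChar, alpha::Number, A::CuSparseMatrix, X::DenseCuVector{T},
                    beta::Number, Y::CuVector{T};
                    algo::cusparseSpMVAlg_t=CUSPARSE_MV_ALG_DEFAULT) where {T}
    # Forward the algorithm choice to mv! instead of always using its default.
    mv!(transa, alpha, A, X, beta, Y, 'O', algo)
end

The generated mul! methods could then keep calling mv_wrapper unchanged, with the algorithm selected through the keyword's default value.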

renatobellotti added the bug label Sep 30, 2022
maleadt (Member) commented Oct 3, 2022

You need to make the alg choice depend on the math mode, e.g.,

algo::cudnnSoftmaxAlgorithm_t = (CUDA.math_mode()===CUDA.FAST_MATH ? CUDNN_SOFTMAX_FAST : CUDNN_SOFTMAX_ACCURATE),

Although I'd argue that here the deterministic algorithm only needs to be used when the math mode is PEDANTIC_MATH.
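
Translated to the mv! signature above, such a default might look like this (a sketch, assuming CUSPARSE_SPMV_CSR_ALG2 is the deterministic choice as noted earlier):

algo::cusparseSpMVAlg_t = (CUDA.math_mode() === CUDA.PEDANTIC_MATH ?
                           CUSPARSE_SPMV_CSR_ALG2 : CUSPARSE_MV_ALG_DEFAULT)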

maleadt added the enhancement, good first issue, and cuda libraries labels and removed the bug label Oct 3, 2022
renatobellotti (Author) commented Jul 11, 2023

I have managed to write a function that seems to perform the sparse matrix-vector product on the GPU in a bitwise reproducible way. I'm leaving this snippet here in case somebody is interested.

using CUDA
using Random
using SparseArrays

Random.seed!(546784)

function reproducible_mv(A, x)
    @assert eltype(A) == eltype(x)
    # Allocate the output directly on the GPU, preserving the element type
    # (cu(zeros(...)) would silently convert Float64 to Float32).
    Y = CUDA.zeros(eltype(A), size(A, 1))
    # CUSPARSE_SPMV_CSR_ALG2 selects a deterministic SpMV algorithm.
    CUDA.CUSPARSE.mv!('N', 1, A, x, 0, Y, 'O', CUDA.CUSPARSE.CUSPARSE_SPMV_CSR_ALG2)
    return Y
end


A = sprand(Float32, 1000, 1000, 0.6)
x = rand(Float32, 1000)

# Default calculation: NOT reproducible.
d_A = cu(A)
d_x = cu(x)

y1 = collect(d_A * d_x)
y2 = collect(d_A * d_x)

println(maximum(abs.(y2 - y1)))

# New calculation: Bitwise reproducible.
d_A = CUDA.CUSPARSE.CuSparseMatrixCSR(A)

y1 = collect(reproducible_mv(d_A, d_x))
y2 = collect(reproducible_mv(d_A, d_x))

println(y1 == y2)

Note: this seems to work only if the matrix is in CSR format!
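
If the matrix ends up in CSC format (which is what cu produces from a SparseMatrixCSC), one workaround might be converting it to CSR first; a sketch, assuming the CuSparseMatrixCSR constructor accepts a CuSparseMatrixCSC:

# Hypothetical workaround for a matrix stored in CSC format on the device:
d_A_csc = CUDA.CUSPARSE.CuSparseMatrixCSC(A)
d_A_csr = CUDA.CUSPARSE.CuSparseMatrixCSR(d_A_csc)  # convert CSC -> CSR on the GPU
y = collect(reproducible_mv(d_A_csr, d_x))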
