
Option for deterministic sparse matrix vector multiplication #1607

Open
renatobellotti opened this issue Sep 30, 2022 · 2 comments
Labels: cuda libraries (Stuff about CUDA library wrappers), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

renatobellotti commented Sep 30, 2022

I would like to request deterministic sparse matrix vector multiplications. Essentially, the following code should always return the same result:

using CUDA
using Random
using SparseArrays  # provides sprand

Random.seed!(546784)

# Note: cu converts Float64 element types to Float32 on the device by default.
A = cu(sprand(Float64, 1000, 1000, 0.6))
x = cu(rand(Float64, 1000))

y1 = A * x
y2 = A * x

maximum(abs.(y2 - y1))

A similar issue is #938.
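
For context: the nondeterminism presumably comes from the reduction order. The default SpMV algorithm parallelizes the per-row dot products, and floating-point addition is not associative, so different summation orders can round differently. A minimal CPU illustration:

# Floating-point addition is not associative, so summation order matters:
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)  # false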

What I found out so far

We need to use CUSPARSE_SPMV_COO_ALG2 or CUSPARSE_SPMV_CSR_ALG2. This probably means changing the algo argument of mv!():

function mv!(transa::SparseChar, alpha::Number, A::Union{CuSparseMatrixBSR{TA},CuSparseMatrixCSR{TA}},
             X::DenseCuVector{T}, beta::Number, Y::DenseCuVector{T}, index::SparseChar,
             algo::cusparseSpMVAlg_t=CUSPARSE_MV_ALG_DEFAULT) where {TA, T}

Unfortunately, the algorithm information seems to get lost in the wrapper function:

function mv_wrapper(transa::SparseChar, alpha::Number, A::CuSparseMatrix, X::DenseCuVector{T},
                    beta::Number, Y::CuVector{T}) where {T}
    mv!(transa, alpha, A, X, beta, Y, 'O')
end

The wrapper is used to implement the LinearAlgebra.mul! operators (this excerpt comes from inside a loop over the tag/op wrappers, which is where $TypeA, $transa, untaga, and unwrapa are defined, hence the trailing end):

@eval begin
    function LinearAlgebra.mul!(C::CuVector{T}, A::$TypeA, B::DenseCuVector{T},
                                alpha::Number, beta::Number) where {T <: Union{Float16, ComplexF16, BlasFloat}}
        mv_wrapper($transa(T), alpha, $(untaga(unwrapa(:A))), B, beta, C)
    end
    function LinearAlgebra.mul!(C::CuVector{Complex{T}}, A::$TypeA, B::DenseCuVector{Complex{T}},
                                alpha::Number, beta::Number) where {T <: Union{Float16, BlasFloat}}
        mv_wrapper($transa(T), alpha, $(untaga(unwrapa(:A))), B, beta, C)
    end
end
for (tagb, untagb) in tag_wrappers, (wrapb, transb, unwrapb) in op_wrappers
    TypeB = wrapb(tagb(:(DenseCuMatrix{T})))
    @eval begin
        function LinearAlgebra.mul!(C::CuMatrix{T}, A::$TypeA, B::$TypeB,
                                    alpha::Number, beta::Number) where {T <: Union{Float16, ComplexF16, BlasFloat}}
            mm_wrapper($transa(T), $transb(T), alpha, $(untaga(unwrapa(:A))), $(untagb(unwrapb(:B))), beta, C)
        end
    end
end
end

The remaining question for the maintainers is now: Where do we pass in the algorithm?
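
For illustration, one possible approach (just a sketch, not an actual patch) would be to give mv_wrapper an algo keyword argument with the existing default and forward it to mv!:

function mv_wrapper(transa::SparseChar, alpha::Number, A::CuSparseMatrix, X::DenseCuVector{T},
                    beta::Number, Y::CuVector{T};
                    algo::cusparseSpMVAlg_t=CUSPARSE_MV_ALG_DEFAULT) where {T}
    # Forward the algorithm choice to mv! instead of always using its default.
    mv!(transa, alpha, A, X, beta, Y, 'O', algo)
end

The generated mul! methods could then keep calling mv_wrapper unchanged, with the algorithm selected through the keyword's default value.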

renatobellotti added the bug label Sep 30, 2022
maleadt (Member) commented Oct 3, 2022

You need to make the alg choice depend on the math mode, e.g.,

algo::cudnnSoftmaxAlgorithm_t = (CUDA.math_mode()===CUDA.FAST_MATH ? CUDNN_SOFTMAX_FAST : CUDNN_SOFTMAX_ACCURATE),

Although I'd argue that here the deterministic algorithm only needs to be used when the math mode is PEDANTIC_MATH.
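
Translated to the mv! signature above, such a default might look like this (a sketch, assuming CUSPARSE_SPMV_CSR_ALG2 is the deterministic choice as noted earlier):

algo::cusparseSpMVAlg_t = (CUDA.math_mode() === CUDA.PEDANTIC_MATH ?
                           CUSPARSE_SPMV_CSR_ALG2 : CUSPARSE_MV_ALG_DEFAULT)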

maleadt added the enhancement, good first issue, and cuda libraries labels and removed the bug label Oct 3, 2022
renatobellotti (Author) commented Jul 11, 2023

I have managed to write a function that seems to perform the sparse matrix-vector product on the GPU in a bitwise reproducible way. I'm leaving this snippet here in case somebody is interested.

using CUDA
using Random
using SparseArrays

Random.seed!(546784)

function reproducible_mv(A, x)
    @assert eltype(A) == eltype(x)
    # Allocate the output directly on the GPU, preserving the element type
    # (cu(zeros(...)) would silently convert Float64 to Float32).
    Y = CUDA.zeros(eltype(A), size(A, 1))
    # CUSPARSE_SPMV_CSR_ALG2 selects a deterministic SpMV algorithm.
    CUDA.CUSPARSE.mv!('N', 1, A, x, 0, Y, 'O', CUDA.CUSPARSE.CUSPARSE_SPMV_CSR_ALG2)
    return Y
end


A = sprand(Float32, 1000, 1000, 0.6)
x = rand(Float32, 1000)

# Default calculation: NOT reproducible.
d_A = cu(A)
d_x = cu(x)

y1 = collect(d_A * d_x)
y2 = collect(d_A * d_x)

println(maximum(abs.(y2 - y1)))

# New calculation: Bitwise reproducible.
d_A = CUDA.CUSPARSE.CuSparseMatrixCSR(A)

y1 = collect(reproducible_mv(d_A, d_x))
y2 = collect(reproducible_mv(d_A, d_x))

println(y1 == y2)

Note: this seems to work only if the matrix is in CSR format!
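
If the matrix ends up in CSC format (which is what cu produces from a SparseMatrixCSC), one workaround might be converting it to CSR first; a sketch, assuming the CuSparseMatrixCSR constructor accepts a CuSparseMatrixCSC:

# Hypothetical workaround for a matrix stored in CSC format on the device:
d_A_csc = CUDA.CUSPARSE.CuSparseMatrixCSC(A)
d_A_csr = CUDA.CUSPARSE.CuSparseMatrixCSR(d_A_csc)  # convert CSC -> CSR on the GPU
y = collect(reproducible_mv(d_A_csr, d_x))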
