We get (1, 1, 1) on the first print but (1, 2, 3) on the second. Outside a kernel, the matrix `a` is constructed with the vectors as its rows, but inside a kernel it is constructed with the same vectors as its columns. It would be better if both cases followed the same rule.
Warp 1.3.0 initialized:
CUDA Toolkit 12.5, Driver 12.2
Devices:
"cpu" : "Intel64 Family 6 Model 151 Stepping 2, GenuineIntel"
"cuda:0" : "NVIDIA GeForce RTX 3060" (12 GiB, sm_86, mempool enabled)
[1.0, 1.0, 1.0]
Module __main__ c767a68 load on device 'cuda:0' took 0.96 ms (cached)
1 2 3
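The two printed results above are what you would expect if the same three vectors were used as rows in one case and as columns in the other, since the two matrices are then transposes of each other. Here is a minimal plain-Python sketch of that reading. It does not call Warp at all: `from_rows` and `from_cols` are hypothetical helpers written only for this illustration, and the vector values are an assumption chosen to match the printed output.

```python
# Hypothetical helpers for illustration only; these are NOT Warp functions.
def from_rows(*vecs):
    """Build a matrix (list of rows) where each input vector is one row."""
    return [list(v) for v in vecs]


def from_cols(*vecs):
    """Build a matrix (list of rows) where each input vector is one column."""
    return [list(row) for row in zip(*vecs)]


# Assumed input vectors; the original snippet is not shown in this report.
v1 = (1.0, 1.0, 1.0)
v2 = (2.0, 2.0, 2.0)
v3 = (3.0, 3.0, 3.0)

built_from_rows = from_rows(v1, v2, v3)
built_from_cols = from_cols(v1, v2, v3)

# The first row differs depending on the convention.
print(built_from_rows[0])  # [1.0, 1.0, 1.0], like the first print
print(built_from_cols[0])  # [1.0, 2.0, 3.0], like the print inside the kernel
```

If this reading is right, transposing the matrix in one of the two scopes would make the results agree until the constructors follow a single convention.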
System Information
No response
Bug Description
It seems that constructing a matrix from vectors gives a different matrix depending on whether the code runs inside a kernel.