Seems OpenCL is faster than CUDA #239
@daylight0-0 Thank you and yes, OpenCL is about 5-15% faster in our own testing on the same hardware (RTX A5000). Newer versions should narrow the gap a little bit (due to requesting a smaller chunk of memory in Cuda, similar to OpenCL, based on the actual memory needed and not the maximums) - so if this isn't the current develop branch it may be worthwhile to test again. I suspect the remaining difference may be caused by pre-allocated memory at compile time (OpenCL) vs. dynamically allocated memory at runtime (Cuda) for variables in shared memory - as other than this, the Cuda and OpenCL paths use exactly the same algorithms and, as much as possible, even the same implementations ... Since OpenCL does exist on Nvidia and many more devices (all the way to Android), that's ultimately good news though :-)
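The compile-time vs. runtime shared-memory distinction mentioned above can be sketched as follows. This is an illustrative sketch only, not AutoDock-GPU's actual kernels; the kernel names and sizes are hypothetical:

```cuda
// Static shared memory (what the OpenCL path effectively gets):
// the size is baked into the binary, so the compiler/driver knows
// the exact shared-memory footprint when scheduling blocks.
__global__ void kernel_static(float *out) {
    __shared__ float scratch[256];       // size fixed at compile time
    scratch[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = scratch[threadIdx.x];
}

// Dynamic shared memory (common on the Cuda path): the size is
// chosen at launch time, which is flexible but opaque to
// compile-time occupancy decisions.
extern __shared__ float scratch_dyn[];
__global__ void kernel_dynamic(float *out) {
    scratch_dyn[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = scratch_dyn[threadIdx.x];
}

// Launch: the third <<<>>> parameter sets the dynamic size in bytes.
// kernel_dynamic<<<1, 256, 256 * sizeof(float)>>>(d_out);
```

The practical difference is that with `extern __shared__` the driver only learns the shared-memory requirement at launch, whereas a static array lets the toolchain account for it when the kernel is compiled.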
Found the culprit: it looks like I wrote that Cuda was faster about three years ago in our README.md. It was probably true at the time, before I merged the integer gradient from Cuda into OpenCL as well. So I'll fix README.md by taking this sentence out.
Thank you for your answer. I did use the develop branch though. |
For Cuda this should only be an issue if you were to compile with the wrong architecture(s) - for 3090/A5000/A6000 you want to compile with the matching targets. One more thing: I would only compare overall runtimes on the same machine, as the kernel runtime performance timers, while placed at the same location, may still cover different tasks depending on what Cuda and OpenCL do at kernel cleanup time.
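For reference, the RTX 3090, A5000, and A6000 are all Ampere GA102 parts with compute capability 8.6, so a matching build would look something like the sketch below. The Makefile variable names are my assumption from typical AutoDock-GPU builds and may differ per version, so check the repository's README:

```shell
# Plain nvcc: generate code for compute capability 8.6 (3090/A5000/A6000).
nvcc -gencode arch=compute_86,code=sm_86 -o myprog myprog.cu

# AutoDock-GPU-style Makefile build (variable names are an assumption):
make DEVICE=CUDA TARGETS="86"
```

Compiling for the wrong architecture typically still runs via JIT from embedded PTX (when present) but can add startup overhead and miss architecture-specific tuning.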
I just realized, PR #233 should close the Cuda performance gap a bit more as it contains the code to allocate the same amount of memory as OpenCL ... |
I compared the time required per ligand and the time required by the whole job, using AutoDock-GPU built with CUDA and with OpenCL respectively. CUDA consistently took about 30-40% more time than OpenCL. However, the description in the repository says CUDA is faster than OpenCL, contrary to my results. I have obtained similar results under different conditions on the same system, but have not tried other systems. So I hope others can check whether CUDA is really faster than OpenCL.
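Following the suggestion above to compare overall runtimes rather than kernel timers, one simple way is to time the whole job from the shell for both builds on the same input. The binary names, paths, and input files below are placeholders, not the actual setup from this report:

```shell
# Compare end-to-end wall-clock time of both builds on identical inputs.
# (Binary and file names are hypothetical; adjust to your build output.)
time ./bin/autodock_gpu_cuda   -ffile protein.maps.fld -lfile ligand.pdbqt -nrun 20
time ./bin/autodock_gpu_opencl -ffile protein.maps.fld -lfile ligand.pdbqt -nrun 20
```

Running each several times and discarding the first run helps exclude one-off costs such as JIT compilation or disk caching from the comparison.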
System:
Docking: