diff --git a/CHANGELOG.md b/CHANGELOG.md
index c815b65..e816382 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,7 +2,7 @@
 # [0.4.0] 10/02/2020
 ### Changed
- - Split the memory managment (`CudaMatrix`) from the cublas invocation (`CudaPipeline`)
+ - Split the memory management (`CudaMatrix`) from the cuBLAS invocation (`CudaPipeline`)
 - Moved all the allocation to the smart pointers inside `CudaMatrix`
 - Removed unused headers
@@ -21,12 +21,12 @@
 # [0.2.0] 27/08/2019
 ### Added
- - Tensor matrix multiplacation using [gemmbatched](https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemmbatched).
+ - Tensor matrix multiplication using [gemmBatched](https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemmbatched).
 - [Async calls](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g85073372f776b4c4d5f89f7124b7bf79) to memory copies.
 - Properly free memory after the tensor operation is done.

 # [0.1.0]
 ### New
- - Use a template function to perform matrix matrix multiplacation using [cublas](https://docs.nvidia.com/cuda/cublas/index.html).
+ - Use a template function to perform matrix-matrix multiplication using [cuBLAS](https://docs.nvidia.com/cuda/cublas/index.html).
 - Use either *pinned* (**default**) or *pageable* memory, see [cuda optimizations](https://devblogs.nvidia.com/how-optimize-data-transfers-cuda-cc/).
diff --git a/README.md b/README.md
index 3401e3e..9627522 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # EigenCuda
-Offload the [Eigen3](http://eigen.tuxfamily.org/index.php?title=Main_Page) matrix matrix multiplacation to an Nvidia GPU
+Offload the [Eigen3](http://eigen.tuxfamily.org/index.php?title=Main_Page) matrix-matrix multiplication to an Nvidia GPU
 using [cublas](https://docs.nvidia.com/cuda/cublas/index.html).

 ## CMake Installation
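The changelog entries above describe the pattern EigenCuda implements: stage matrices in *pinned* host memory, copy them to the device asynchronously, and run the multiplication through cuBLAS. A minimal standalone sketch of that pattern, using plain CUDA/cuBLAS calls rather than the library's own `CudaMatrix`/`CudaPipeline` API, might look like:

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
  const int n = 2;  // multiply two n x n matrices, C = A * B
  const size_t bytes = n * n * sizeof(double);

  // Pinned (page-locked) host buffers: required for truly async copies.
  double *hA, *hB, *hC;
  cudaMallocHost(&hA, bytes);
  cudaMallocHost(&hB, bytes);
  cudaMallocHost(&hC, bytes);
  for (int i = 0; i < n * n; ++i) { hA[i] = 1.0; hB[i] = 2.0; }

  double *dA, *dB, *dC;
  cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);

  cudaStream_t stream;
  cudaStreamCreate(&stream);
  cublasHandle_t handle;
  cublasCreate(&handle);
  cublasSetStream(handle, stream);

  // Asynchronous host-to-device copies, queued on the stream.
  cudaMemcpyAsync(dA, hA, bytes, cudaMemcpyHostToDevice, stream);
  cudaMemcpyAsync(dB, hB, bytes, cudaMemcpyHostToDevice, stream);

  // Column-major GEMM, matching Eigen's default storage order.
  const double alpha = 1.0, beta = 0.0;
  cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
              &alpha, dA, n, dB, n, &beta, dC, n);

  cudaMemcpyAsync(hC, dC, bytes, cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);
  printf("C(0,0) = %f\n", hC[0]);  // 1*2 + 1*2 = 4 for these inputs

  // Free device memory and the pinned host buffers.
  cudaFree(dA); cudaFree(dB); cudaFree(dC);
  cudaFreeHost(hA); cudaFreeHost(hB); cudaFreeHost(hC);
  cublasDestroy(handle);
  cudaStreamDestroy(stream);
  return 0;
}
```

The `gemmBatched` call mentioned for the tensor operation generalizes this to many small multiplications submitted in a single cuBLAS call.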