Offload Eigen3 matrix-matrix multiplication to an NVIDIA GPU using CUBLAS.
Changed
- Split the memory management (`CudaMatrix`) from the CUBLAS invocation (`CudaPipeline`)
- Moved all the allocation to the smart pointers inside `CudaMatrix`
- Removed unused headers
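
For reviewers, the split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: only the class names `CudaMatrix` and `CudaPipeline` come from the change list, while `device_alloc`, `DeviceDeleter`, `upload`, `download`, and `multiply` are hypothetical names. To keep the sketch buildable without the CUDA toolkit, `malloc`/`free` stand in for `cudaMalloc`/`cudaFree`, and a naive loop stands in for the CUBLAS GEMM call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <vector>

// ASSUMPTION: host stand-ins so the sketch compiles without CUDA.
// Real code would call cudaMalloc here and cudaFree in the deleter.
inline double* device_alloc(std::size_t n) {
  return static_cast<double*>(std::malloc(n * sizeof(double)));
}
struct DeviceDeleter {
  void operator()(double* p) const noexcept { std::free(p); }
};

// Memory management only: owns one column-major buffer through a smart
// pointer, so the buffer is released automatically (no manual free calls).
class CudaMatrix {
public:
  CudaMatrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(device_alloc(rows * cols)) {
    if (!data_ && size() > 0) throw std::bad_alloc{};
  }

  // Stand-in for a host-to-device copy (cudaMemcpy in real code).
  void upload(const std::vector<double>& host) {
    assert(host.size() == size());
    std::memcpy(data_.get(), host.data(), size() * sizeof(double));
  }

  // Stand-in for a device-to-host copy.
  std::vector<double> download() const {
    return std::vector<double>(data_.get(), data_.get() + size());
  }

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  std::size_t size() const { return rows_ * cols_; }
  double* data() { return data_.get(); }
  const double* data() const { return data_.get(); }

private:
  std::size_t rows_;
  std::size_t cols_;
  std::unique_ptr<double, DeviceDeleter> data_;  // sole owner of the buffer
};

// Invocation only: never allocates or frees raw memory itself.
class CudaPipeline {
public:
  // Computes C = A * B on column-major data. The triple loop is a
  // placeholder for the CUBLAS GEMM call in the real pipeline.
  CudaMatrix multiply(const CudaMatrix& a, const CudaMatrix& b) const {
    assert(a.cols() == b.rows());
    CudaMatrix c(a.rows(), b.cols());
    for (std::size_t j = 0; j < b.cols(); ++j) {
      for (std::size_t i = 0; i < a.rows(); ++i) {
        double sum = 0.0;
        for (std::size_t k = 0; k < a.cols(); ++k) {
          sum += a.data()[i + k * a.rows()] * b.data()[k + j * b.rows()];
        }
        c.data()[i + j * c.rows()] = sum;
      }
    }
    return c;  // moved out; CudaMatrix is move-only because of unique_ptr
  }
};
```

Because `CudaMatrix` holds its buffer in a `std::unique_ptr` with a custom deleter, it is move-only and cannot be freed twice, and `CudaPipeline` needs no cleanup code of its own. The column-major layout matches both Eigen's default storage order and the layout CUBLAS expects.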