- Groups of up to 3 people
- Time allowed: one week
- The aim is to implement an efficient 2D-convolution algorithm in CUDA.
- The size of the mask should be parametric.
- Show the differences between the implementation with and without tiling.
- Analyze implementations with different tile sizes, and optimize performance for a specific Colab GPU.
- Submit a Google Colab notebook (.ipynb) that shows your findings.
- Submitting a file other than .ipynb is possible, but requires prior discussion with the professor.
- Provide a short report (max 2 pages) that presents your findings:
  - Experimental setup
  - Performance measurements
  - Explanation of design choices
- No screenshots of the code!
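
For orientation, the operation with a parametric mask size can be sketched sequentially on the CPU. This is only a minimal reference that may help when checking the output of the CUDA kernels; it is not the required implementation. The function name `conv2d_reference`, the row-major `float` layout, the odd square mask, and the zero-valued border handling are all assumptions, since the assignment does not specify them.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference for a 2D convolution with a square mask.
// Assumptions (not stated in the assignment): row-major float data,
// an odd mask width, and out-of-range input pixels treated as zero.
// The mask is applied without flipping, as is common in CUDA examples.
std::vector<float> conv2d_reference(const std::vector<float>& input,
                                    int width, int height,
                                    const std::vector<float>& mask,
                                    int mask_width) {
    const int radius = mask_width / 2;
    std::vector<float> output(static_cast<std::size_t>(width) * height, 0.0f);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            float acc = 0.0f;
            // Walk over the mask; mask_width is a runtime parameter.
            for (int i = 0; i < mask_width; ++i) {
                for (int j = 0; j < mask_width; ++j) {
                    const int r = row + i - radius;
                    const int c = col + j - radius;
                    // Skip neighbours that fall outside the image.
                    if (r >= 0 && r < height && c >= 0 && c < width) {
                        acc += input[r * width + c] * mask[i * mask_width + j];
                    }
                }
            }
            output[row * width + col] = acc;
        }
    }
    return output;
}
```

Comparing each kernel's output against a reference like this, within a small floating-point tolerance, is one way to confirm that the tiled and untiled versions compute the same result before measuring their performance.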