Challenge 2

Challenge rules

Groups of up to 3 people
One week to complete the challenge
The aim is to implement an efficient 2D-convolution algorithm in CUDA.
- The size of the mask should be parametric.
- Show the differences between the implementation with and without tiling.
- Analyze the implementations across different tile sizes, and optimize the performance for a specific Colab GPU.
- Submit a Google Colab notebook (.ipynb) in which you show your findings.
- Submitting a file other than .ipynb is possible, but it requires prior discussion with the professor.
- Provide a short report (max 2 pages) in which you present your findings:
  - Experimental setup
  - Performance measurements
  - Explanation of design choices
  - No screenshots of the code!
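
To make the comparison asked for above concrete, here is a minimal starting sketch, not a reference solution. It shows an untiled kernel, a shared-memory tiled kernel with the tile size as a template parameter, and a mask whose width `m` is set at run time. Everything named here is an assumption made for illustration: the identifiers `convNaive`, `convTiled`, `d_mask`, `MAX_MASK` and `TILE`, the 1024x1024 image, the 5x5 averaging mask, zero padding at the borders, and an odd mask width no larger than `MAX_MASK`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define MAX_MASK 15                              // assumed upper bound on the mask width
__constant__ float d_mask[MAX_MASK * MAX_MASK];  // mask kept in constant memory

// Untiled version: each thread reads its whole neighbourhood from global memory.
__global__ void convNaive(const float* in, float* out, int w, int h, int m) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int r = m / 2;
    float acc = 0.0f;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < m; ++j) {
            int yy = y + i - r, xx = x + j - r;
            if (yy >= 0 && yy < h && xx >= 0 && xx < w)  // zero padding outside the image
                acc += in[yy * w + xx] * d_mask[i * m + j];
        }
    out[y * w + x] = acc;
}

// Tiled version: the block stages its TILE x TILE outputs plus the halo in shared memory.
// Must be launched with a TILE x TILE thread block.
template <int TILE>
__global__ void convTiled(const float* in, float* out, int w, int h, int m) {
    extern __shared__ float tile[];
    int r = m / 2, side = TILE + 2 * r;
    int x0 = blockIdx.x * TILE - r, y0 = blockIdx.y * TILE - r;
    // Cooperative strided load: the block's threads share the side*side input elements.
    for (int idx = threadIdx.y * TILE + threadIdx.x; idx < side * side; idx += TILE * TILE) {
        int gy = y0 + idx / side, gx = x0 + idx % side;
        tile[idx] = (gy >= 0 && gy < h && gx >= 0 && gx < w) ? in[gy * w + gx] : 0.0f;
    }
    __syncthreads();
    int x = blockIdx.x * TILE + threadIdx.x, y = blockIdx.y * TILE + threadIdx.y;
    if (x >= w || y >= h) return;
    float acc = 0.0f;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < m; ++j)
            acc += tile[(threadIdx.y + i) * side + threadIdx.x + j] * d_mask[i * m + j];
    out[y * w + x] = acc;
}

int main() {
    const int w = 1024, h = 1024, m = 5;         // m is assumed odd and <= MAX_MASK
    constexpr int TILE = 16;                     // the tile size to vary in the analysis
    std::vector<float> hIn(w * h), hMask(m * m, 1.0f / (m * m)), hA(w * h), hB(w * h);
    for (auto& v : hIn) v = rand() / (float)RAND_MAX;

    float *dIn, *dOut;
    size_t bytes = (size_t)w * h * sizeof(float);
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, hMask.data(), m * m * sizeof(float));

    dim3 block(TILE, TILE), grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    size_t shmem = (size_t)(TILE + m - 1) * (TILE + m - 1) * sizeof(float);
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    float msNaive = 0.0f, msTiled = 0.0f;

    cudaEventRecord(t0);
    convNaive<<<grid, block>>>(dIn, dOut, w, h, m);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&msNaive, t0, t1);
    cudaMemcpy(hA.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaEventRecord(t0);
    convTiled<TILE><<<grid, block, shmem>>>(dIn, dOut, w, h, m);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&msTiled, t0, t1);
    cudaMemcpy(hB.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    // Cross-check the two kernels against each other before trusting any timing.
    float maxDiff = 0.0f;
    for (int i = 0; i < w * h; ++i) maxDiff = fmaxf(maxDiff, fabsf(hA[i] - hB[i]));
    printf("naive %.3f ms, tiled %.3f ms, max diff %g\n", msNaive, msTiled, maxDiff);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

The timings from a single launch like this are only indicative: the first kernel launch typically includes one-time startup cost, so for the report it is safer to add a warm-up launch and average over several repetitions. The sketch also omits error checking on the CUDA calls. Changing `TILE` changes both the block shape and the shared-memory footprint, which is the trade-off the tile-size analysis is meant to explore.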