Challenge 2

Challenge rules
- Groups of up to 3 people
- One week of time

The aim is to implement an efficient 2D convolution algorithm in CUDA.
- The size of the mask should be parametric.
- Show the differences between the implementations with and without tiling.
- Analyze different implementations with different tiling sizes: optimize the performance for a specific Colab GPU.

Submit a Google Colab file (.ipynb) where you show your findings.
- Submitting a file other than .ipynb is possible, but it requires prior discussion with the professor.

Provide a short report (max 2 pages) where you present your findings:
- Experimental setup
- Performance measurements
- Explanation of design choices
- No screenshots of the code!
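
As a possible starting point for the comparison with and without tiling, the sketch below shows one way the two kernels could be structured. It is not a reference solution. Everything named here is an assumption made for illustration: the kernel names `conv2d_naive` and `conv2d_tiled`, the helper `max_diff_naive_vs_tiled`, the `MAX_MASK_WIDTH` bound, zero padding at the image borders, an odd square mask kept in constant memory, and a square thread block whose side equals the tile size.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define MAX_MASK_WIDTH 15  // assumption: upper bound on the (odd) mask width
__constant__ float d_mask[MAX_MASK_WIDTH * MAX_MASK_WIDTH];

// Without tiling: every thread reads its whole neighbourhood from global memory.
__global__ void conv2d_naive(const float* in, float* out, int w, int h, int mw) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= w || row >= h) return;
    int r = mw / 2;  // mask radius
    float acc = 0.0f;
    for (int i = 0; i < mw; ++i)
        for (int j = 0; j < mw; ++j) {
            int y = row + i - r, x = col + j - r;
            if (y >= 0 && y < h && x >= 0 && x < w)  // zero padding outside the image
                acc += in[y * w + x] * d_mask[i * mw + j];
        }
    out[row * w + col] = acc;
}

// With tiling: the block first copies its input tile plus halo into shared memory.
__global__ void conv2d_tiled(const float* in, float* out, int w, int h, int mw) {
    extern __shared__ float tile[];          // (blockDim.y + mw - 1) x (blockDim.x + mw - 1)
    int r  = mw / 2;
    int sw = blockDim.x + mw - 1;            // shared tile width, halo included
    int sh = blockDim.y + mw - 1;            // shared tile height, halo included
    int x0 = blockIdx.x * blockDim.x - r;    // image coordinates of the tile origin
    int y0 = blockIdx.y * blockDim.y - r;
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    int nthreads = blockDim.x * blockDim.y;
    // Cooperative strided load: each thread fetches one or more tile elements.
    for (int idx = tid; idx < sw * sh; idx += nthreads) {
        int y = y0 + idx / sw, x = x0 + idx % sw;
        tile[idx] = (y >= 0 && y < h && x >= 0 && x < w) ? in[y * w + x] : 0.0f;
    }
    __syncthreads();                         // all threads reach this before any exits
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= w || row >= h) return;
    float acc = 0.0f;
    for (int i = 0; i < mw; ++i)
        for (int j = 0; j < mw; ++j)
            acc += tile[(threadIdx.y + i) * sw + threadIdx.x + j] * d_mask[i * mw + j];
    out[row * w + col] = acc;
}

// Hypothetical helper: runs both kernels and returns their largest absolute difference.
float max_diff_naive_vs_tiled(int w, int h, int mw, int tile_size) {
    std::vector<float> img(w * h), mask(mw * mw, 1.0f / (mw * mw)), a(w * h), b(w * h);
    for (int i = 0; i < w * h; ++i) img[i] = (float)(i % 17);  // synthetic test image
    size_t bytes = (size_t)w * h * sizeof(float);
    float *d_in, *d_a, *d_b;
    cudaMalloc(&d_in, bytes); cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, mask.data(), mw * mw * sizeof(float));
    dim3 block(tile_size, tile_size);
    dim3 grid((w + tile_size - 1) / tile_size, (h + tile_size - 1) / tile_size);
    size_t smem = (size_t)(tile_size + mw - 1) * (tile_size + mw - 1) * sizeof(float);
    conv2d_naive<<<grid, block>>>(d_in, d_a, w, h, mw);
    conv2d_tiled<<<grid, block, smem>>>(d_in, d_b, w, h, mw);
    cudaMemcpy(a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_a); cudaFree(d_b);
    float diff = 0.0f;
    for (int i = 0; i < w * h; ++i) diff = fmaxf(diff, fabsf(a[i] - b[i]));
    return diff;
}

int main() {
    for (int tile_size : {8, 16, 32})  // candidate tile sizes to compare
        printf("tile %2d: max |naive - tiled| = %g\n",
               tile_size, max_diff_naive_vs_tiled(1000, 700, 5, tile_size));
    return 0;
}
```

The sketch only checks that the two kernels agree; it does not measure time and omits CUDA error checking for brevity. For the performance measurements, one option is to bracket each kernel launch with `cudaEventRecord` and read the elapsed time with `cudaEventElapsedTime`, after a warm-up launch and averaged over several runs. Passing the mask width at run time keeps the mask size parametric, at the cost of loops the compiler cannot fully unroll.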