This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" course (Spring 2020).
Stanford's CS231n is one of the best ways to dive into Deep Learning in general and Computer Vision in particular. Even if you plan to specialize in another subfield of Deep Learning (say, Natural Language Processing or Reinforcement Learning), I still recommend starting with CS231n, because it builds intuition, fundamental understanding, and hands-on skills. Beware, the course is very challenging!
To motivate you to work hard, here are two actual applications you'll implement in Assignment 3: Style Transfer and Class Visualization.
For the one on the left, you take a base image and a style image and apply the "style" to the base image (reminds you of Prisma and Artisto, right?). The example on the right is a random image, gradually perturbed so that a neural network classifies it more and more confidently as a gorilla. DIY Deep Dream, isn't it? It's all math under the hood, and it's rewarding to figure out how it works. CS231n will get you to that understanding; it's a hard but exciting journey from a simple kNN implementation to these fascinating applications. If you find these two applications eye-catching, take another look at the picture above: a Convolutional Neural Network classifying images. That's the basis of how machines can "see" the world. The course teaches you both how to build such an algorithm from scratch and how to use modern tools to run state-of-the-art models for your own tasks.
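To give a taste of how class visualization works under the hood, here is a minimal sketch of the core idea, with a toy linear classifier standing in for the network (all dimensions and names below are made up for illustration): start from a random input and repeatedly ascend the gradient of the target class's score with respect to that input.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 64, 3                       # toy feature dimension and number of classes
W = rng.normal(size=(C, D))        # stand-in "network": a linear classifier
x = rng.normal(size=D)             # start from a random "image"
target = 1                         # the class we want the model to "see"

def confidence(x):
    """Softmax probability the classifier assigns to the target class."""
    s = W @ x
    p = np.exp(s - s.max())
    return (p / p.sum())[target]

before = confidence(x)
for _ in range(100):
    x += 0.1 * W[target]           # gradient of the target class score w.r.t. x
after = confidence(x)
# the classifier now labels the perturbed input as `target` with high confidence
```

In the assignment the linear map is replaced by a deep ConvNet and the gradient comes from backpropagation (plus regularization to keep the image natural-looking), but the ascent loop is the same idea.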
Find course notes and assignments here and be sure to check out the video lectures for Winter 2016 and Spring 2017!
Assignments have been completed using both TensorFlow and PyTorch.
Assignment #1: Image Classification, kNN, SVM, Softmax, Neural Network
Q1: k-Nearest Neighbor Classifier
- Test accuracy on CIFAR-10: 0.282
Q2: Training a Support Vector Machine
- Test accuracy on CIFAR-10: 0.376
Q3: Implement a Softmax classifier
- Test accuracy on CIFAR-10: 0.355
Q4: Two-Layer Neural Network
- Test accuracy on CIFAR-10: 0.501
Q5: Higher Level Representations: Image Features
- Test accuracy on CIFAR-10: 0.576
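The kNN classifier from Q1 boils down to a vectorized distance computation plus a majority vote. A minimal NumPy sketch of the idea (toy data, not the assignment's interface):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict labels by majority vote among the k nearest training points (L2)."""
    # pairwise squared L2 distances, fully vectorized (no explicit loops)
    d2 = (
        (X_test ** 2).sum(1)[:, None]
        - 2 * X_test @ X_train.T
        + (X_train ** 2).sum(1)[None, :]
    )
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest points
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])

# tiny sanity check: two well-separated clusters
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([[0.05, 0.1], [4.9, 5.2]]), k=3)
# → [0, 1]
```

The assignment walks you through three versions of the distance computation (two loops, one loop, no loops); the fully vectorized expansion of ||a - b||² used above is the "no loops" one.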
Assignment #2: Fully-Connected Nets, Batch Normalization, Dropout, Convolutional Nets
Q1: Fully-connected Neural Network
- Validation / test accuracy on CIFAR-10: 0.547 / 0.539
Q3: Dropout
Q5: PyTorch / TensorFlow v2 on CIFAR-10 / TensorFlow v1 (Tweaked TFv1 model)
- Training / validation / test accuracy of TF implementation on CIFAR-10: 0.928 / 0.801 / 0.822
- PyTorch implementation:
| Model | Training Accuracy (%) | Test Accuracy (%) |
|---|---|---|
| Base network | 92.86 | 88.90 |
| VGG-16 | 99.98 | 93.16 |
| VGG-19 | 99.98 | 93.24 |
| ResNet-18 | 99.99 | 93.73 |
| ResNet-101 | 99.99 | 93.76 |
Assignment #3: Image Captioning with Vanilla RNNs, Image Captioning with LSTMs, Network Visualization, Style Transfer, Generative Adversarial Networks
Q1: Image Captioning with Vanilla RNNs
Q2: Image Captioning with LSTMs
Q3: Network Visualization: Saliency maps, Class Visualization, and Fooling Images (PyTorch / TensorFlow v2 / TensorFlow v1)
Q4: Style Transfer (PyTorch / TensorFlow v2 / TensorFlow v1)
Q5: Generative Adversarial Networks (PyTorch / TensorFlow v2 / TensorFlow v1)
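The style loss at the heart of Q4 compares feature statistics through Gram matrices rather than raw pixels. A minimal NumPy sketch of that loss (toy shapes, unnormalized, not the assignment code):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: pairwise channel correlations."""
    C = features.shape[0]
    F = features.reshape(C, -1)    # flatten the spatial dimensions
    return F @ F.T                 # (C, C)

def style_loss(feats_gen, feats_style):
    """Squared Frobenius distance between the two Gram matrices (unnormalized)."""
    return ((gram_matrix(feats_gen) - gram_matrix(feats_style)) ** 2).sum()

rng = np.random.default_rng(0)
style_feats = rng.normal(size=(8, 4, 4))   # pretend conv features of the style image
gen_feats = rng.normal(size=(8, 4, 4))     # ... and of the generated image
loss = style_loss(gen_feats, style_feats)  # stays > 0 until the statistics match
```

In the actual assignment these features come from several layers of a pretrained ConvNet, the Gram matrices are normalized, and the loss is minimized with respect to the generated image's pixels alongside a content loss and total-variation regularization.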
- My course notes
- Official course notes
- Reading material that I found useful for Assignment 2 and Assignment 3
For some parts of the third assignment, you'll need a GPU. Kaggle Kernels or Google Colaboratory will do.
- The official course website
- Video lectures. Prerequisites are covered in the first lecture.
- Winter 2016 YouTube playlist
- Spring 2017 YouTube playlist
- Syllabus with assignments
- Lecture 1
- Lecture 2
- Lecture 3
- Lecture 4
- Lecture 5
- Lecture 6
- Lecture 7
- Lecture 8
- Lecture 9
- Lecture 10
- Lecture 11
- Lecture 12
- Lecture 13
- Lecture 14
- Lecture 15
- Lecture 16
I recognize how much time people spend building intuition, understanding new concepts, and debugging the assignments. The solutions uploaded here are for reference only; they are meant to unblock you if you get stuck somewhere. Please do not copy any part of them as-is (the assignments are fairly easy if you read the instructions carefully).