- spnn: simple parallelized neural network.
- A comparison of fully connected network (forward and backward propagation) implementations.
- Implementations are listed below:
  - CPU, single thread.
  - CPU, multiple threads using OpenMP.
  - GPU, single thread using CUDA.
  - GPU, multiple threads using CUDA.
  - OpenBLAS.
- The task selected is digit classification on MNIST data.
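The CPU-parallel variant above can be pictured with a minimal sketch, assuming a sigmoid fully connected layer (the function name and layout here are illustrative, not the repository's actual API): the per-output-row loop is parallelized across CPU threads with OpenMP, and the CUDA variants map the same loop onto GPU threads instead.

```cpp
#include <cmath>
#include <vector>

// Hypothetical forward pass of one fully connected layer, y = sigmoid(W x + b).
// Compile with -fopenmp to enable the pragma; without it the loop runs serially.
std::vector<float> forward_layer(const std::vector<float>& W,  // rows*cols, row-major
                                 const std::vector<float>& b,  // rows
                                 const std::vector<float>& x,  // cols
                                 int rows, int cols) {
    std::vector<float> y(rows);
    #pragma omp parallel for
    for (int i = 0; i < rows; ++i) {
        float acc = b[i];
        for (int j = 0; j < cols; ++j)
            acc += W[i * cols + j] * x[j];
        y[i] = 1.0f / (1.0f + std::exp(-acc));  // sigmoid activation
    }
    return y;
}
```

Each output row is independent, so the outer loop parallelizes without synchronization.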
- Code is written in C++/CUDA.
- The OpenMP variant uses the `openmp` library.
- The OpenBLAS variant uses the `openblas` library.
- `include/` contains headers.
- `src/` contains all variant implementations.
- `data/` contains MNIST data.
- `proposal.pdf` contains the project proposal.
- `presentation.pdf` contains the presentation given at the end of the project.
- `report.pdf` contains details, experiments, and analysis.
- `Makefile` is used to make target executables.
- The code itself serves as its documentation.
- Open a terminal in the directory containing the `Makefile`.
- Use `make all` to build all targets.
- The targets are listed as follows:
  - `cpu_serial.out`
  - `cuda_parallel.out`
  - `openmp.out`
  - `openblas.out`
  - `cuda_serial.out`
- To build a specific target, use `make <target-name>`.
- To remove all targets, use `make clean`.
- Use `./<target-name>` to run a target.
- Accuracy vs. epochs for the fully connected network, irrespective of implementation.
- Implementation comparison for a specific model.
- Time taken vs. parameter count for different implementations. Observe that the GPU-parallelized variant's curve stays almost flat at 0.
- Things to consider during analysis:
  - Correctness (> 10% accuracy, i.e. better than random guessing over 10 digit classes).
  - Repeatability (nothing fancy).
  - Memory check (no memory leaks or other errors, verified with `valgrind --tool=memcheck`).
- Initialization is done uniformly in [-1, 1].
- Layers are numbered from 0, i.e. the first hidden layer is layer 1.
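The initialization scheme above can be sketched as follows; the function name and seed handling are illustrative assumptions, not the repository's actual code. Every weight is drawn uniformly from [-1, 1].

```cpp
#include <random>
#include <vector>

// Hypothetical weight initializer: n weights drawn uniformly from [-1, 1].
// A fixed default seed keeps runs repeatable, matching the repeatability goal.
std::vector<float> init_weights(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> w(n);
    for (auto& wi : w)
        wi = dist(gen);
    return w;
}
```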
- Control size of name field
- Implement loss function
- Remove memory leaks from `step_train`
- Batch gradient descent: fix loss decrement and check backprop
- Normalize
- Get MNIST data
- Profile
- Exclude data loading from the measured time
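The "Implement loss function" item above is unspecified; a common choice for 10-class digit classification is cross-entropy, sketched here as an assumption rather than the repository's actual loss.

```cpp
#include <cmath>
#include <vector>

// Hypothetical cross-entropy loss for one sample: probs holds the network's
// softmax output per class, label is the true digit. Not the repo's code.
float cross_entropy(const std::vector<float>& probs, int label) {
    const float eps = 1e-7f;                 // guard against log(0)
    return -std::log(probs[label] + eps);
}
```

The loss is 0 when the true class gets probability 1, and grows as that probability shrinks.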