Features:
- add Python wrappers for the SGD and Variable classes
- add MaxPool2D layer
- implement tensor reshaping
- use relative imports in the Python package
- add better weight initialization methods
- improve numerical stability for some ops
- add MNIST example
- add benchmark for critical layers
- use OpenMP for optional parallelization
- do not use ts::transpose when BLAS can do the transposition for us
- add a method for accessing a raw pointer to the underlying data
- use protobuf for model serialization
- compile project with -Wall -Wextra and fix all warnings
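
Note on the ts::transpose item: BLAS GEMM routines take transpose flags (for example CblasTrans in cblas_sgemm), so an operand can be multiplied as its transpose without first making a transposed copy. The sketch below imitates that idea in plain C++ so it runs without a BLAS installation. The function name `gemm`, its signature, and the row-major layout are assumptions made for illustration, not the project's API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a BLAS GEMM call: computes C = op(A) * B,
// where op(A) is A^T if trans_a is true and A otherwise.
// All matrices are row-major. A is stored as (m x k) when trans_a is false
// and as (k x m) when trans_a is true. B is (k x n), the result C is (m x n).
std::vector<float> gemm(bool trans_a, std::size_t m, std::size_t n, std::size_t k,
                        const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            // Read element (i, p) of op(A) straight from the stored layout;
            // no transposed copy of A is ever materialized.
            const float a_ip = trans_a ? a[p * m + i] : a[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    return c;
}
```

With a real BLAS the same effect comes from passing the transpose flag to the library call, which lets the explicit transpose step (and its temporary buffer) be dropped.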