Bug fixes:
- Fix buffer issue in tensor
- Fix snapshot resume bug
- Remove the unneeded libmklml_gnu.so file
- Fix the PyPI package build sequence issue
New features:
- Support the channel shuffle operator and refine fast_math to support gcc 4.8 and earlier
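
For readers unfamiliar with the operator, the sketch below illustrates what a channel shuffle generally does, as commonly described for grouped convolutions: channels are viewed as a (groups, channels-per-group) grid, the grid is transposed, and the result is flattened so that each output group mixes channels from every input group. This is plain Python for illustration only, not this release's implementation; the function name `channel_shuffle` and its arguments are hypothetical.

```python
def channel_shuffle(channels, groups):
    """Return the channels reordered as a channel shuffle would order them.

    `channels` is any list with one entry per channel (hypothetical layout);
    `groups` is the number of channel groups.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Treat the list as a (groups, per_group) grid, transpose it, and flatten:
    # output position j * groups + g takes the input at g * per_group + j.
    return [
        channels[g * per_group + j]
        for j in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
```

With six channels in two groups, the groups [0, 1, 2] and [3, 4, 5] are interleaved, so each consecutive pair in the output contains one channel from each original group.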
Others:
- Add Docker and Conda files