- Denny Lee, Principal Program Manager, CosmosDB
- Tom Drabas, Data Scientist, WDG
- Sergey Ermolin, Power/Performance Optimization
- Ding Ding, Software Engineer
- Jiao Wang, Software Engineer
- Jason Dai, Senior Principal Engineer and CTO, Big Data Technologies
- Yiheng Wang, Software Engineer
- Xianyan Jia, Software Engineer
- Felix Cheung, Principal Software Engineer
- Xiaoyong Zhu, Program Manager
- Alejandro Guerrero Gonzalez, Senior Software Engineer
This repo contains the following folders:
- data folder - contains the four MNIST files, which can also be downloaded from http://yann.lecun.com/exdb/mnist/:
- train-images-idx3-ubyte - set of training images, stored in the binary IDX format with a specific schema (we'll get to that)
- train-labels-idx1-ubyte - corresponding set of training labels
- t10k-images-idx3-ubyte - set of testing (validation) images
- t10k-labels-idx1-ubyte - corresponding set of testing (validation) labels
- jars folder - contains two compiled jars of BigDL:
- bigdl-0.2.0-SNAPSHOT-spark-2.0-jar-with-dependencies.jar - BigDL compiled for Spark 2.0
- bigdl-0.2.0-SNAPSHOT-spark-2.1-jar-with-dependencies.jar - BigDL compiled for Spark 2.1
- notebook folder - contains the notebook for the workshop
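To give a feel for the IDX schema mentioned above, here is a minimal stdlib-only sketch of a parser, assuming the layout described on the MNIST page: two zero bytes, a type code (0x08 for unsigned bytes), the number of dimensions, one big-endian 32-bit size per dimension, then the raw data. That is why the image files start with the magic number 2051 (0x00000803, three dimensions) and the label files with 2049 (0x00000801, one dimension). The function name `read_idx` is hypothetical and not part of BigDL.

```python
import struct


def read_idx(data: bytes):
    """Parse an IDX buffer of unsigned bytes (type code 0x08).

    Returns (shape, values): shape is a tuple of dimension sizes and
    values is a flat list of ints in row-major order.
    """
    # Header: 2 zero bytes, 1 byte type code, 1 byte dimension count.
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("expected an IDX file of unsigned bytes")
    # One big-endian unsigned 32-bit size per dimension.
    shape = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim
    size = 1
    for dim in shape:
        size *= dim
    values = list(data[offset:offset + size])
    if len(values) != size:
        raise ValueError("truncated IDX payload")
    return shape, values
```

For example, `read_idx(open("data/train-labels-idx1-ubyte", "rb").read())` should yield shape `(60000,)`, and the training images should yield `(60000, 28, 28)`. A flat Python list of about 47 million ints is slow for the full image file; `numpy.frombuffer(data, dtype=numpy.uint8, offset=...)` is the usual faster alternative.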
From the jars folder, grab the jar that matches your version of Spark.
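If you are unsure which jar applies, the choice depends only on the Spark major.minor version. The sketch below maps a version string, such as the value of `spark.version` in a PySpark session, to one of the two jar names listed above; the helper `pick_bigdl_jar` is hypothetical. Taking only the first two components also copes with vendor-suffixed versions like `2.1.0.2.6.0.3-8`.

```python
# Jar names as shipped in the jars folder of this repo.
JARS = {
    "2.0": "bigdl-0.2.0-SNAPSHOT-spark-2.0-jar-with-dependencies.jar",
    "2.1": "bigdl-0.2.0-SNAPSHOT-spark-2.1-jar-with-dependencies.jar",
}


def pick_bigdl_jar(spark_version: str) -> str:
    """Return the BigDL jar name for a Spark version string like '2.1.0'."""
    # Keep only major.minor; patch and vendor suffixes do not matter here.
    major_minor = ".".join(spark_version.split(".")[:2])
    try:
        return JARS[major_minor]
    except KeyError:
        raise ValueError(f"no BigDL jar in jars/ for Spark {spark_version}") from None
```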
- Go to the Azure dashboard and click on your cluster. Scroll down to Storage accounts
- Click on the default storage account
- Go to Blobs
- Select the default container
- Upload the jar appropriate for your version of Spark to the root of the container
- Check that the upload succeeded
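Once uploaded, the jar is addressable from the cluster through a WASB URI of the form `wasbs://<container>@<account>.blob.core.windows.net/<path>`; on HDInsight the shorthand `wasbs:///<path>` refers to the cluster's default container. The sketch below builds the full form; the helper name `wasb_uri` and the account and container names in the example are hypothetical placeholders for your own.

```python
def wasb_uri(account: str, container: str, path: str, secure: bool = True) -> str:
    """Build a WASB(S) URI for a blob in an Azure storage account."""
    scheme = "wasbs" if secure else "wasb"  # wasbs uses TLS
    # Strip a leading slash so "/x.jar" and "x.jar" give the same URI.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"
```

For example, `wasb_uri("mystorage", "mycluster", "bigdl-0.2.0-SNAPSHOT-spark-2.1-jar-with-dependencies.jar")` points at a jar uploaded to the root of that container.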
In the same way you uploaded the BigDL jar, upload the four files from the data folder, placing them in the /tmp
folder of your default storage.
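With the files in /tmp of the default container, the cluster can reach them through `wasbs:///tmp/...` paths. This small sketch collects the four paths in one place; the helper `mnist_paths` and its dictionary keys are hypothetical conveniences, not names used by the notebook.

```python
# File names as they appear in the data folder of this repo.
MNIST_FILES = {
    "train_images": "train-images-idx3-ubyte",
    "train_labels": "train-labels-idx1-ubyte",
    "test_images": "t10k-images-idx3-ubyte",
    "test_labels": "t10k-labels-idx1-ubyte",
}


def mnist_paths(base: str = "wasbs:///tmp") -> dict:
    """Map each MNIST file to its path under `base` (default container /tmp)."""
    prefix = base.rstrip("/")
    return {key: f"{prefix}/{name}" for key, name in MNIST_FILES.items()}
```

In a PySpark session, the raw bytes of a file can then be fetched with, for instance, `sc.binaryFiles(mnist_paths()["train_labels"])`, which returns an RDD of `(path, content)` pairs.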