This scenario involves training a CNN using the MNIST dataset. It involves one training data provider (TDP) and a training data consumer (TDC) who wishes to train a model.
The end-to-end training pipeline consists of the following phases.
- Data pre-processing and de-identification
- Data packaging, encryption and upload
- Model packaging, encryption and upload
- Encryption key import with key release policies
- Deployment and execution of CCR
- Model decryption
Build container images required for this sample as follows.
cd scenarios/mnist
./ci/build.sh
./ci/push-containers.sh
These scripts build the following containers and push them to the container registry set in $CONTAINER_REGISTRY.
depa-mnist-preprocess
: Container for pre-processing the MNIST dataset.

depa-mnist-save-model
: Container that saves the model to be trained in ONNX format.
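To confirm the images were built and tagged as expected, a quick local check (assuming the default image names above) is:
docker image ls | grep depa-mnist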
The folder scenarios/mnist/data contains scripts for downloading and pre-processing the MNIST dataset. Acting as the TDP for this dataset, run the following script.
cd scenarios/mnist/deployment/docker
./preprocess.sh
Next, acting as a TDC, save a sample model using the following script.
./save-model.sh
This script will save the model as scenarios/mnist/data/model/model.onnx.
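If you want to sanity-check the saved model before it is packaged and encrypted, a minimal inspection with the onnx Python package (an assumption; it may not be installed in your environment), run from the repository root, could look like this:
python3 -c "import onnx; m = onnx.load('scenarios/mnist/data/model/model.onnx'); onnx.checker.check_model(m); print([i.name for i in m.graph.input])"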
Assuming you have cleartext access to the pre-processed dataset, you can train a CNN as follows.
./train.sh
The script trains a model using a pipeline configuration defined in pipeline_config.json. If all goes well, you should see output similar to the following, and the trained model will be saved under the folder /tmp/output.
docker-train-1 | /usr/local/lib/python3.9/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
docker-train-1 | warn(
docker-train-1 | /usr/local/lib/python3.9/dist-packages/onnx2pytorch/convert/layer.py:30: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
docker-train-1 | layer.weight.data = torch.from_numpy(numpy_helper.to_array(weight))
docker-train-1 | /usr/local/lib/python3.9/dist-packages/onnx2pytorch/convert/model.py:147: UserWarning: Using experimental implementation that allows 'batch_size > 1'.Batchnorm layers could potentially produce false outputs.
docker-train-1 | warnings.warn(
docker-train-1 | [1, 2000] loss: 2.242
docker-train-1 | [1, 4000] loss: 1.972
docker-train-1 | [1, 6000] loss: 1.799
docker-train-1 | [1, 8000] loss: 1.695
docker-train-1 | [1, 10000] loss: 1.642
docker-train-1 | [1, 12000] loss: 1.581
docker-train-1 | [1, 14000] loss: 1.545
docker-train-1 | [1, 16000] loss: 1.502
docker-train-1 | [1, 18000] loss: 1.520
docker-train-1 | [1, 20000] loss: 1.471
docker-train-1 | [1, 22000] loss: 1.438
docker-train-1 | [1, 24000] loss: 1.435
docker-train-1 | [2, 2000] loss: 1.402
docker-train-1 | [2, 4000] loss: 1.358
docker-train-1 | [2, 6000] loss: 1.379
docker-train-1 | [2, 8000] loss: 1.355
...
In a more realistic scenario, these datasets will not be available in the clear to the TDC, and the TDC will be required to use a CCR for training. The following steps describe the process of sharing an encrypted dataset with TDCs and setting up a CCR in Azure for training. Please stay tuned for CCR support on other cloud platforms.
To deploy in Azure, you will need the following.
- Docker Hub account to store container images. Alternatively, you can use pre-built images from the ispirt container registry.
- Azure Key Vault to store encryption keys and implement secure key release to the CCR. You can use either Azure Key Vault Premium (lower cost) or Azure Key Vault Managed HSM for enhanced security. Please see the instructions below on how to create and set up your AKV instance.
- Valid Azure subscription with sufficient access to create key vault, storage accounts, storage containers, and Azure Container Instances.
If you are using your own development environment instead of a dev container or codespaces, you will need to install the following dependencies.
- Azure CLI.
- Azure CLI Confidential containers extension. After installing Azure CLI, you can install this extension using
az extension add --name confcom -y
- Go. Follow the instructions to install Go. After installing, ensure that the PATH environment variable is set to include the go runtime.
- jq. You can install jq using
sudo apt-get install -y jq
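Once installed, a quick way to verify that the tools are available on your PATH (a minimal sketch; exact versions will differ):
az version
az extension list --query "[].name" -o tsv   # should include confcom
go version
jq --version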
We will be creating the following resources as part of the deployment.
- Azure Key Vault
- Azure Storage account
- Storage containers to host encrypted datasets
- Azure Container Instances to deploy the CCR and train the model
If you wish to use your own container images, log in to Docker Hub and push the containers to your container registry.
Note: Replace <docker-hub-registry-name> with the name of your Docker Hub registry.
export CONTAINER_REGISTRY=<docker-hub-registry-name>
docker login
./ci/push-containers.sh
cd scenarios/mnist
./ci/push-containers.sh
Acting as the TDP, we will create a resource group, a key vault instance and storage containers to host the encrypted MNIST training dataset and encryption keys. In real deployments, TDPs and TDCs will use their own key vault instances. However, for this sample, we will use one key vault instance to store keys for all datasets and models.
Note: At this point, automated creation of AKV managed HSMs is not supported.
Note: Replace <resource-group-name> and <key-vault-endpoint> with names of your choice. Storage account names must not contain any special characters. Key vault endpoints are of the form <key-vault-name>.vault.azure.net (for Azure Key Vault Premium) or <key-vault-name>.managedhsm.azure.net (for AKV Managed HSM), with no leading https. This endpoint must be the same endpoint you use when creating the contract.
az login
export AZURE_RESOURCE_GROUP=<resource-group-name>
export AZURE_KEYVAULT_ENDPOINT=<key-vault-endpoint>
export AZURE_STORAGE_ACCOUNT_NAME=<unique-storage-account-name>
export AZURE_MNIST_CONTAINER_NAME=mnistdatacontainer
export AZURE_MODEL_CONTAINER_NAME=mnistmodelcontainer
export AZURE_OUTPUT_CONTAINER_NAME=mnistoutputcontainer
cd scenarios/mnist/data
./1-create-storage-containers.sh
./2-create-akv.sh
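For reference, the two scripts correspond roughly to the following Azure CLI calls. This is an illustrative sketch only; the actual scripts under scenarios/mnist/data are authoritative, and the location and SKUs shown here are assumptions.
# Resource group and storage account for the encrypted artifacts
az group create --name $AZURE_RESOURCE_GROUP --location westeurope
az storage account create --name $AZURE_STORAGE_ACCOUNT_NAME --resource-group $AZURE_RESOURCE_GROUP --sku Standard_LRS
# One storage container per encrypted image: dataset, model and output
for c in $AZURE_MNIST_CONTAINER_NAME $AZURE_MODEL_CONTAINER_NAME $AZURE_OUTPUT_CONTAINER_NAME; do
  az storage container create --name $c --account-name $AZURE_STORAGE_ACCOUNT_NAME --auth-mode login
done
# Key vault; the Premium SKU supports secure key release
az keyvault create --name <key-vault-name> --resource-group $AZURE_RESOURCE_GROUP --sku premium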
Next, follow the instructions here to sign and register a contract with the contract service. You can either deploy your own contract service or use a test contract service hosted at https://contract-service.westeurope.cloudapp.azure.com:8000. The registered contract must contain references to the datasets with matching names, keyIDs and Azure Key Vault endpoints used in this sample. A sample contract template for this scenario is provided here. After updating, signing and registering the contract, retain the contract service URL and the sequence number of the contract for the rest of this sample.
Next, use the following script to generate and import encryption keys into Azure Key Vault with a policy based on policy-in-template.json. The policy requires that the CCRs run specific containers with a specific configuration which includes the public identity of the contract service. Only CCRs that satisfy this policy will be granted access to the encryption keys.
Note: Replace <repo-root> with the path to and including the depa-training folder where the repository was cloned.
export CONTRACT_SERVICE_URL=<contract-service-url>
export TOOLS_HOME=<repo-root>/external/confidential-sidecar-containers/tools
./3-import-keys.sh
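You can confirm that the keys were imported, for example (assuming Azure Key Vault Premium; for AKV Managed HSM use az keyvault key list --hsm-name <hsm-name> instead):
az keyvault key list --vault-name <key-vault-name> -o table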
The generated keys are available as files with the extension .bin.
Next, encrypt the dataset and models using keys generated in the previous step.
cd scenarios/mnist/data
./4-encrypt-data.sh
This step generates three encrypted file system images (with the extension .img): one containing the dataset, one containing the model, and one where the trained model will be stored.
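As a quick sanity check, you can list the generated images from scenarios/mnist/data; the exact file names depend on the encryption script, so treat this as illustrative:
ls -lh *.img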
Now upload encrypted datasets to Azure storage containers.
./5-upload-encrypted-data.sh
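To verify the upload, you can list the blobs in each storage container, for example for the dataset container:
az storage blob list --account-name $AZURE_STORAGE_ACCOUNT_NAME --container-name $AZURE_MNIST_CONTAINER_NAME --auth-mode login -o table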
Acting as a TDC, use the following script to deploy the CCR using Confidential Containers on Azure Container Instances.
Note: Replace <contract-sequence-number> with the sequence number of the contract registered with the contract service.
cd scenarios/mnist/deployment/aci
./deploy.sh -c <contract-sequence-number> -m ../../config/model_config.json -q ../../config/query_config.json
This script will deploy the container images from your container registry, including the encrypted filesystem sidecar. The sidecar will generate an SEV-SNP attestation report, generate an attestation token using the Microsoft Azure Attestation (MAA) service, retrieve dataset, model and output encryption keys from the TDP and TDC's Azure Key Vault, train the model, and save the resulting model into TDC's output filesystem image, which the TDC can later decrypt.
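Before pulling logs, you can check that the container group was created and is running; the container group name depa-training-mnist matches the one used in the log commands below.
az container show --name depa-training-mnist --resource-group $AZURE_RESOURCE_GROUP --query instanceView.state -o tsv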
Once the deployment is complete, you can obtain logs from the CCR using the following commands. Note that there may be some delay in getting the logs after the deployment completes.
# Obtain logs from the training container
az container logs --name depa-training-mnist --resource-group $AZURE_RESOURCE_GROUP --container-name depa-training
# Obtain logs from the encrypted filesystem sidecar
az container logs --name depa-training-mnist --resource-group $AZURE_RESOURCE_GROUP --container-name encrypted-storage-sidecar
You can download and decrypt the trained model using the following script.
cd scenarios/mnist/data
./6-download-decrypt-model.sh
The trained model is available in the output folder.
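To confirm the decrypted model is a valid ONNX file, you can reuse the same kind of quick check as earlier (assuming the onnx Python package, and that the decrypted file is named model.onnx; adjust the path if the script names it differently):
ls -lh output/
python3 -c "import onnx; onnx.checker.check_model(onnx.load('output/model.onnx'))"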
You can use the following command to delete the resource group and clean-up all resources used in the demo.
az group delete --yes --name $AZURE_RESOURCE_GROUP