All tests must point to a Docker Image that encapsulates the environment in which you want that test to run.
Your image's `cloudbuild.yaml` contains the push location of your image. This is what you will refer to in your test config when setting the image for a test. By default, our Cloud Build configs push to GCR under the project specified by the `--project` flag (or your default `gcloud` project if none is provided). For example, to build our TensorFlow Model Garden image, run the following command:
gcloud builds submit --config images/tensorflow/cloudbuild.yaml
Once the build completes, you can find the image under `gcr.io/$PROJECT_NAME/`, where `$PROJECT_NAME` is the unique name of your Google Cloud Platform project.
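As a rough illustration of where the image ends up, the sketch below derives the registry prefix from a project name. `my-gcp-project` is a placeholder, and the fallback to `gcloud config get-value project` is one way to mirror the "default gcloud project" behavior described above. The image name itself depends on your `cloudbuild.yaml`, so it is left out.

```shell
#!/bin/bash
# Placeholder project name; replace with your own, or set it to "" to fall
# back to the default gcloud project (this needs the gcloud CLI installed).
PROJECT_NAME="my-gcp-project"
if [ -z "${PROJECT_NAME}" ]; then
  PROJECT_NAME="$(gcloud config get-value project)"
fi

# Registry prefix that the Cloud Build configs push under.
REGISTRY_PATH="gcr.io/${PROJECT_NAME}"
echo "Images are pushed under: ${REGISTRY_PATH}/"

# To list the images there (needs gcloud and access to the registry):
#   gcloud container images list --repository="${REGISTRY_PATH}"
```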
If you don't have an existing CI/CD workflow, we recommend using Cloud Build and Cloud Scheduler to set up an automatic build cycle.
We currently provide the following pre-made Docker build configurations:
- TF Model Garden: Use this image for officially supported TensorFlow models running on TPU or GPU.
- PyTorch on TPU: Use this image for PyTorch models on TPUs.
- PyTorch Examples GPU: An example image based on the PyTorch Examples repo. Supports running on GPUs.
Follow the pattern shown in `pytorch/`, `tensorflow/`, or `pytorch-examples-gpu/` to create a `Dockerfile`, a `cloudbuild.yaml`, and an `entrypoint.sh`.
Your entrypoint needs to call `source /publish.sh` if you want to collect training metrics and/or send alerts for performance regressions.
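A minimal sketch of what such an entrypoint might look like is shown below. It is not the entrypoint shipped with the example images. It assumes the publish step runs after the test command, and the `PUBLISH_SCRIPT` variable and `run_and_publish` function are hypothetical names added so the script can be tried outside the container.

```shell
#!/bin/bash
# Hypothetical entrypoint sketch: run the test command, then publish results.
# PUBLISH_SCRIPT defaults to /publish.sh, the path named above; the variable
# itself is an illustrative addition for local experimentation.
PUBLISH_SCRIPT="${PUBLISH_SCRIPT:-/publish.sh}"

run_and_publish() {
  # Run whatever command was passed to the container.
  "$@"
  local status=$?

  # Assumption: publish regardless of the outcome, then report the
  # test command's own exit status.
  if [ -f "${PUBLISH_SCRIPT}" ]; then
    source "${PUBLISH_SCRIPT}"
  else
    echo "No publish script at ${PUBLISH_SCRIPT}; skipping." >&2
  fi
  return "${status}"
}

# Only run when a command was actually supplied.
if [ "$#" -gt 0 ]; then
  run_and_publish "$@"
fi
```

Whether metrics should still be published after a failed run is a design choice; adjust the sketch if you only want to publish on success.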
Some more tips:
- The example image under `pytorch/` shows the process of pulling in an image built elsewhere. In this case, it pulls the image generated by the pytorch/xla repo.
- If you want to run on GPUs, we recommend using `tensorflow/tensorflow:nightly-gpu-py3` as the base image, as shown in `tensorflow/`.
  - You can also use `nvidia/cuda`.
- Once you've made your `Dockerfile`, `cloudbuild.yaml`, and `entrypoint.sh` files, use this command to test building your image (substituting the path to your own `cloudbuild.yaml`): `gcloud builds submit --config images/tensorflow/cloudbuild.yaml`.
  - NOTE: This will upload the image to your Google Cloud Platform project under Container Registry > Images.
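Before submitting a build, it can help to confirm that the image directory has all three files. The following is a small sketch of such a check; the `check_image_dir` function and the `images/my-image` directory in the usage comment are hypothetical.

```shell
#!/bin/bash
# Check that an image directory contains the three files described above.
check_image_dir() {
  local dir="$1"
  local missing=0
  local f
  for f in Dockerfile cloudbuild.yaml entrypoint.sh; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "missing: ${dir}/${f}" >&2
      missing=1
    fi
  done
  # Returns 0 when all files are present, 1 otherwise.
  return "${missing}"
}

# Example usage (images/my-image is a hypothetical directory):
#   check_image_dir images/my-image && \
#     gcloud builds submit --config images/my-image/cloudbuild.yaml
```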
Once your image builds successfully, you'll probably want to automate the process of creating it rather than repeatedly running `gcloud builds submit` by hand.
To do this, we will set up a Cloud Build trigger plus a Cloud Scheduler job.
Here is how you would create a trigger that fires whenever you update `images/pytorch-examples-gpu` on your repo's master branch. This trigger would also fire if you updated the shared image files.
Once your code is checked into your Cloud Source Repository under `images/` (see the top-level README for how to set up the Cloud Source Repo), navigate to the Triggers page and click "Create Trigger".
- Under "Branch", use `^master$` as the regex.
- Under "Included files filter (glob)", use:
  - `images/pytorch-examples-gpu/**`
  - `images/common.sh`
  - `images/publish.sh`
- Under "Build configuration", choose "Cloud Build configuration file (yaml or json)" and change the path to `/images/pytorch-examples-gpu/cloudbuild.yaml`.
- Under "Substitution variables", add one key:value pair where "Variable" = `_VERSION` and "Value" = `master`.
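To get a feel for which changes the included-files filter covers, the sketch below approximates the three globs with shell `case` patterns. This is only an approximation of Cloud Build's own glob matching, and the `trigger_would_fire` function is a hypothetical helper, not part of any tool.

```shell
#!/bin/bash
# Approximate the trigger's "Included files filter" with shell patterns.
# In a case pattern, * also matches "/", so a single * stands in for **.
trigger_would_fire() {
  case "$1" in
    images/pytorch-examples-gpu/*|images/common.sh|images/publish.sh)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Try a few changed paths against the filter.
for path in \
    images/pytorch-examples-gpu/Dockerfile \
    images/common.sh \
    images/tensorflow/Dockerfile; do
  if trigger_would_fire "${path}"; then
    echo "${path}: fires"
  else
    echo "${path}: ignored"
  fi
done
```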
At this point you can run your trigger manually as a one-off, and it will fire automatically whenever you push a change to the master branch of your repository.
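A one-off run can also be started from the command line. The sketch below only prints the command; it assumes the trigger is named `pytorch-examples-gpu-master`, as in the example URL later in this guide, and that your gcloud release provides `gcloud builds triggers run` (on older releases it may live under `gcloud beta`).

```shell
#!/bin/bash
# Assumed trigger name (taken from the example URL later in this guide).
TRIGGER_NAME="pytorch-examples-gpu-master"
BRANCH="master"

# Build the command as an array so it can be inspected before running.
RUN_CMD=(gcloud builds triggers run "${TRIGGER_NAME}" --branch="${BRANCH}")
echo "${RUN_CMD[*]}"

# Uncomment to actually run it (requires an authenticated gcloud CLI):
# "${RUN_CMD[@]}"
```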
If you want your trigger to run on an automated schedule (e.g. to build a new image daily because your image tracks a repo under active development), you can set up a Cloud Scheduler job to kick off the trigger.
Navigate to the Cloud Scheduler page and click "Create Job".
- Protip: the "Timezone" search box is a little counterintuitive; first search by country, e.g. `United States`.
- Use `HTTP` for the "Target" field.
- For the "URL", use your trigger, e.g. `https://cloudbuild.googleapis.com/v1/projects/xl-ml-test/triggers/pytorch-examples-gpu-master:run`.
  - The format is `https://cloudbuild.googleapis.com/v1/projects/your-project-name/triggers/your-trigger-name:run`.
- Use `POST` for the "HTTP method".
- For "Body", use: `{"branchName": "master", "substitutions": {"_VERSION": "nightly"}}`.
- Click "Show More" to add a few more things:
  - "Auth header": `Add OAuth token`.
  - "Service account": Find the email of the "Compute Engine default service account":
    - Go to the IAM page for your project.
    - Find the row with "Name" = "Compute Engine default service account".
    - Copy the email address (i.e. the "Member" column).
      - The Member field will be of the form: `1234567890123-compute@developer.gserviceaccount.com`.
    - Use this address for the "Service account" field of your Cloud Scheduler job.
  - "Scope": `https://www.googleapis.com/auth/cloud-platform`.
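The same settings can be assembled on the command line. The sketch below builds the trigger URL, request body, and service account email from the example values above, then prints a roughly equivalent `gcloud scheduler jobs create http` call rather than running it. The schedule, the project number, and the `-daily` job name suffix are placeholders, not values from this guide's setup.

```shell
#!/bin/bash
# Values from the example above; PROJECT_NUMBER and SCHEDULE are placeholders.
PROJECT_NAME="xl-ml-test"
TRIGGER_NAME="pytorch-examples-gpu-master"
PROJECT_NUMBER="1234567890123"
SCHEDULE="0 6 * * *"   # hypothetical: every day at 06:00

TRIGGER_URL="https://cloudbuild.googleapis.com/v1/projects/${PROJECT_NAME}/triggers/${TRIGGER_NAME}:run"
BODY='{"branchName": "master", "substitutions": {"_VERSION": "nightly"}}'
SERVICE_ACCOUNT="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Print the equivalent CLI call instead of running it.
cat <<EOF
gcloud scheduler jobs create http ${TRIGGER_NAME}-daily \\
  --schedule="${SCHEDULE}" \\
  --uri="${TRIGGER_URL}" \\
  --http-method=POST \\
  --message-body='${BODY}' \\
  --oauth-service-account-email="${SERVICE_ACCOUNT}" \\
  --oauth-token-scope="https://www.googleapis.com/auth/cloud-platform"
EOF
```

Printing the command first makes it easy to check the URL and body against the console fields before creating anything.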