Docker Images

Every test must point to a Docker image that encapsulates the environment in which you want that test to run.

Your image's cloudbuild.yaml contains the push location of your image. This is the location you will refer to in your test config when setting the image for a test. By default, our Cloud Build configs push to Google Container Registry (GCR) under the project specified by the --project flag (or your default gcloud project if none is provided). For example, to build our TensorFlow Model Garden image, run the following command:

gcloud builds submit --config images/tensorflow/cloudbuild.yaml

Once the build completes, you can find the image at gcr.io/$PROJECT_NAME/, where $PROJECT_NAME is the unique name of your Google Cloud Platform project.
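As a minimal sketch of the build-and-locate flow described above, the snippet below passes the project explicitly with --project and derives the registry prefix from it. The project name my-project is a placeholder, and the guard that falls back to printing the plan when gcloud or the config file is missing is an addition for illustration only.

```shell
# Placeholder project name (an assumption); substitute your own.
PROJECT_NAME="my-project"
CONFIG="images/tensorflow/cloudbuild.yaml"
IMAGE_PREFIX="gcr.io/${PROJECT_NAME}"

if command -v gcloud >/dev/null 2>&1 && [ -f "${CONFIG}" ]; then
  # Build with Cloud Build and push to the project's registry.
  gcloud builds submit --config "${CONFIG}" --project "${PROJECT_NAME}"
  # List the images now stored under the project.
  gcloud container images list --repository "${IMAGE_PREFIX}"
else
  # Dry run: gcloud or the config file is not available here.
  echo "Would build ${CONFIG} and push under ${IMAGE_PREFIX}/"
fi
```

Run it from the repository root so that the relative path to cloudbuild.yaml resolves.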

If you don't have an existing CI/CD workflow, we recommend using Cloud Build and Cloud Scheduler to set up an automatic build cycle.

Step 1: Choose an existing image or make your own.

We currently provide the following pre-made Docker build configurations:

TF Model Garden: Use this image for TensorFlow officially-supported models running on TPU or GPU.
PyTorch on TPU: Use this image for PyTorch models on TPUs.
PyTorch Examples GPU: An example image based on the PyTorch Examples repo. Supports running on GPUs.

(Optional) Make your own image

Follow the pattern shown in pytorch/, tensorflow/, or pytorch-examples-gpu/ to create a Dockerfile, a cloudbuild.yaml, and an entrypoint.sh.

Your entrypoint.sh needs to call source /publish.sh if you want to collect training metrics and/or send alerts for performance regressions.
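The following is a hypothetical entrypoint.sh sketch, not the script used by the example images. It assumes that /publish.sh should be sourced after the test command finishes and that the container should exit with that command's status; the function name run_and_publish and the file-existence check are illustrative additions. Check the entrypoint.sh files under pytorch/, tensorflow/, or pytorch-examples-gpu/ for the actual pattern.

```shell
#!/bin/bash
# Hypothetical entrypoint.sh sketch; not the repository's actual script.

run_and_publish() {
  # Run the command passed to the container and remember its exit status.
  "$@"
  local exit_code=$?

  # Assumes the Dockerfile copies the shared publish.sh to /publish.sh.
  # The existence check is an assumption added so this sketch also runs
  # outside the container.
  if [ -f /publish.sh ]; then
    source /publish.sh
  fi

  # Report the original command's status to the caller.
  return "${exit_code}"
}

run_and_publish "$@"
```

Sourcing the publish script after the command, rather than before it, is a design choice in this sketch so that metrics would be collected even when the test itself fails.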

Some more tips:

The example image under pytorch/ shows the process of pulling in an image built elsewhere. In this case, it pulls the image generated by the pytorch/xla repo.
If you want to run on GPUs, we recommend using tensorflow/tensorflow:nightly-gpu-py3 as the base image as shown in tensorflow/.
- You can also use nvidia/cuda.
Once you've made your Dockerfile, cloudbuild.yaml, and entrypoint.sh files, test building your image with the following command, replacing the path with the location of your own cloudbuild.yaml: gcloud builds submit --config images/tensorflow/cloudbuild.yaml.
- NOTE: This will upload the image to your Google Cloud Platform project under Container Registry > Images.

Step 2: Set up an automated trigger

Once your image builds successfully, you'll probably want to automate the process of creating it rather than repeatedly running gcloud builds submit by hand.

To do this, we will set up a Cloud Build trigger plus a Cloud Scheduler job.

Cloud Build Trigger

Here is how you would create a trigger that fires whenever you update images/pytorch-examples-gpu on your repo's master branch. This trigger would also fire if you updated the shared image files.

Once your code is checked into your Cloud Source Repository under images/ (see top-level README on how to set up the Cloud Source Repo), navigate to the Triggers page and click "Create Trigger".

Under "Branch", use ^master$ as the regex
Under "Included files filter (glob)", use:
- images/pytorch-examples-gpu/**
- images/common.sh
- images/publish.sh
Under "Build configuration", choose "Cloud Build configuration file (yaml or json)" and change the path to /images/pytorch-examples-gpu/cloudbuild.yaml
Under "Substitution variables", add one key:value pair where "Variable" = _VERSION and "Value" = master.
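If you prefer the command line to the console, the same trigger settings can be sketched with gcloud. This assumes the gcloud builds triggers create cloud-source-repositories command is available in your gcloud version (in some versions it lives under gcloud beta). The repository name my-repo and the function name create_trigger are placeholders; the trigger name is borrowed from the Cloud Scheduler example URL in this document.

```shell
# Placeholder repository name (an assumption); substitute your own.
REPO="my-repo"
TRIGGER_NAME="pytorch-examples-gpu-master"
BUILD_CONFIG="images/pytorch-examples-gpu/cloudbuild.yaml"
# Comma-separated globs matching the "Included files filter" entries above.
INCLUDED_FILES="images/pytorch-examples-gpu/**,images/common.sh,images/publish.sh"

create_trigger() {
  # Fire on pushes to master that touch the included files.
  gcloud builds triggers create cloud-source-repositories \
    --name "${TRIGGER_NAME}" \
    --repo "${REPO}" \
    --branch-pattern '^master$' \
    --build-config "${BUILD_CONFIG}" \
    --included-files "${INCLUDED_FILES}" \
    --substitutions _VERSION=master
}
```

Defining the command as a function keeps the snippet safe to paste; call create_trigger once you have authenticated and selected the right project.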

Cloud Scheduler

At this point you can run your trigger manually as a one-off, and it will fire automatically whenever you push a change to the master branch of your repository.

If you want your trigger to run on an automated schedule (e.g. to build a new image daily because your image tracks a repo under active development), you can set up a Cloud Scheduler job to kick off the trigger.

Navigate to the Cloud Scheduler page and click "Create Job".

Protip: the "Timezone" search box is a little counterintuitive - first search by country, e.g. United States.
Use HTTP for the "Target" field.
For the "URL", use your trigger, e.g. https://cloudbuild.googleapis.com/v1/projects/xl-ml-test/triggers/pytorch-examples-gpu-master:run.
- The format is https://cloudbuild.googleapis.com/v1/projects/your-project-name/triggers/your-trigger-name:run.
Use POST for the "HTTP method".
For "Body", use: {"branchName": "master", "substitutions": {"_VERSION": "nightly"}}.
Click "Show More" to add a few more things:
- "Auth header": Add OAuth token.
- "Service account": Find the email of the "Compute Engine default service account":
  1. Go to the IAM page for your project.
  2. Find the row with "Name" = "Compute Engine default service account".
  3. Copy the email address (i.e. the "Member" column).
    - The Member field will be of the form: 1234567890123-compute@developer.gserviceaccount.com.
  4. Use this address for the "Service account" field of your Cloud Scheduler.
- "Scope": https://www.googleapis.com/auth/cloud-platform.
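The console settings above can also be sketched as a gcloud scheduler jobs create http call. The project, trigger name, body, scope, and service account format come from the steps above; the job name suffix, the daily 06:00 schedule, the time zone, and the function name create_scheduler_job are assumptions chosen for illustration. Newer gcloud versions may also require a --location flag.

```shell
PROJECT="xl-ml-test"
TRIGGER_NAME="pytorch-examples-gpu-master"
# Example address format; use your Compute Engine default service account.
SERVICE_ACCOUNT="1234567890123-compute@developer.gserviceaccount.com"
URL="https://cloudbuild.googleapis.com/v1/projects/${PROJECT}/triggers/${TRIGGER_NAME}:run"
BODY='{"branchName": "master", "substitutions": {"_VERSION": "nightly"}}'

create_scheduler_job() {
  # Schedule and time zone are illustrative assumptions (daily at 06:00).
  gcloud scheduler jobs create http "${TRIGGER_NAME}-daily" \
    --schedule "0 6 * * *" \
    --time-zone "America/Los_Angeles" \
    --uri "${URL}" \
    --http-method post \
    --message-body "${BODY}" \
    --oauth-service-account-email "${SERVICE_ACCOUNT}" \
    --oauth-token-scope "https://www.googleapis.com/auth/cloud-platform"
}

# Print the URL so it can be checked before creating the job.
echo "${URL}"
```

Call create_scheduler_job after confirming that the printed URL matches your project and trigger name.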
