sfujiwara/preemptible-trainer

An Example Using Preemptible VM for TensorFlow

This example shows how to use a preemptible VM on Google Cloud for machine learning training with TensorFlow.

Requirements

Architecture

Key Points

  • Use a managed instance group of size one
    • A managed instance group can automatically recreate a preemptible VM after it is preempted
    • See the official documentation for details
  • Configure the instance to run the Docker container when it starts
  • Resize the instance group to zero after training finishes
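
The last key point can be sketched with a single `gcloud` call. This is a minimal sketch, not the repository's actual code: `GROUP_NAME`, `ZONE`, and the `resize_group` helper are hypothetical names chosen for illustration, and where the call is made (for example, at the end of the training entrypoint) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: resize a managed instance group, e.g. to zero once training is done.
# GROUP_NAME and ZONE are hypothetical placeholders, not names from this repository.
GROUP_NAME="${GROUP_NAME:-preemptible-trainer}"
ZONE="${ZONE:-us-central1-a}"
# Set GCLOUD="echo gcloud" to print the command instead of running it.
GCLOUD="${GCLOUD:-gcloud}"

# Resize the group to the given number of instances.
resize_group() {
  local size="$1"
  ${GCLOUD} compute instance-groups managed resize "${GROUP_NAME}" \
    --size="${size}" \
    --zone="${ZONE}"
}
```

Calling `resize_group 0` after training completes stops the group from recreating the VM, so the finished job is not restarted.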

Run on Cloud

Configure Google Cloud SDK

gcloud config set account <your.google.account@gmail.com>
gcloud config set project <Your GCP Project>
gcloud auth configure-docker

Build Docker Image

./bin/docker_build.sh

Push Docker Image to Google Container Registry

./bin/docker_push.sh
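
The contents of the two scripts are not shown here, so the following is only a sketch of what a build and push to Google Container Registry typically looks like. `PROJECT_ID`, `IMAGE_NAME`, `TAG`, and the helper functions are hypothetical; the `gcr.io/<project>/<image>:<tag>` naming is the standard Container Registry convention.

```shell
#!/usr/bin/env bash
# Sketch: build an image and push it to Google Container Registry.
# PROJECT_ID, IMAGE_NAME, and TAG are hypothetical placeholders.
PROJECT_ID="${PROJECT_ID:-my-gcp-project}"
IMAGE_NAME="${IMAGE_NAME:-preemptible-trainer}"
TAG="${TAG:-latest}"
# Set DOCKER="echo docker" to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Print the full registry path for the image.
image_uri() {
  echo "gcr.io/${PROJECT_ID}/${IMAGE_NAME}:${TAG}"
}

# Build the image from the Dockerfile in the current directory.
build_image() {
  ${DOCKER} build -t "$(image_uri)" .
}

# Push the image; requires `gcloud auth configure-docker` to have been run.
push_image() {
  ${DOCKER} push "$(image_uri)"
}
```

Running `build_image` followed by `push_image` from the repository root would mirror the two steps above, assuming the Dockerfile lives there.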

Create Instance Template and Managed Instance Group

./bin/create_instance_group.sh
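
As a rough illustration of the setup described in Key Points, the sketch below creates a preemptible instance template that runs a container on startup, then a managed instance group of size one from that template. It is not the content of `create_instance_group.sh`: `TEMPLATE_NAME`, `GROUP_NAME`, `ZONE`, `MACHINE_TYPE`, `IMAGE_URI`, and the helper functions are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: preemptible container template plus a managed instance group of size one.
# All names below are hypothetical placeholders, not values from this repository.
TEMPLATE_NAME="${TEMPLATE_NAME:-preemptible-trainer-template}"
GROUP_NAME="${GROUP_NAME:-preemptible-trainer}"
ZONE="${ZONE:-us-central1-a}"
MACHINE_TYPE="${MACHINE_TYPE:-n1-standard-1}"
IMAGE_URI="${IMAGE_URI:-gcr.io/my-gcp-project/preemptible-trainer:latest}"
# Set GCLOUD="echo gcloud" to print the commands instead of running them.
GCLOUD="${GCLOUD:-gcloud}"

# Create a template whose instances are preemptible and start the container on boot.
create_template() {
  ${GCLOUD} compute instance-templates create-with-container "${TEMPLATE_NAME}" \
    --container-image="${IMAGE_URI}" \
    --machine-type="${MACHINE_TYPE}" \
    --preemptible
}

# Create a managed instance group with exactly one instance from the template.
create_group() {
  ${GCLOUD} compute instance-groups managed create "${GROUP_NAME}" \
    --template="${TEMPLATE_NAME}" \
    --size=1 \
    --zone="${ZONE}"
}
```

Calling `create_template` and then `create_group` would start one preemptible VM running the training container; the group keeps its size at one until it is resized.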

Run Locally

Build Docker Image

./bin/docker_build.sh

Run Docker Image

./bin/docker_run.sh