(local) Build & Setup Instructions

Milo Webster edited this page Jul 28, 2020 · 5 revisions

General Setup

Install the GCP SDK

https://cloud.google.com/sdk/docs/quickstarts

Clone the repo

  • Make sure your GitHub SSH key is set up
  • git clone git@github.com:mmwebster/voxsrc-2020.git

Set up the environment

Setup for local runs

Install a dataset

  • Move into the data directory: cd voxsrc-2020/data/
  • Run the install script: python utils.py --install-local-dataset --src-bucket voxsrc-2020-voxceleb-v4 --src-dataset no_cuda --dst-data-path ./datasets --dst-list-path ./lists --dst-tmp-path ./tmp
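  • Set up symlinks. Use absolute targets, because a relative symlink target is resolved relative to the link's own directory, not the directory you run ln from:
    • ln -s "$(pwd)/datasets/vox1_no_cuda" ../components/train/tmp/data/vox1_no_cuda
    • ln -s "$(pwd)/datasets/vox1_no_cuda.txt" ../components/train/tmp/data/vox1_no_cuda.txt
    • ln -s "$(pwd)/datasets/vox2_no_cuda" ../components/train/tmp/data/vox2_no_cuda
    • ln -s "$(pwd)/datasets/vox2_no_cuda.txt" ../components/train/tmp/data/vox2_no_cuda.txt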
  • NOTE: To set up another dataset, replace "no_cuda" with that dataset's name in the commands above. For example, use "full" to install the complete, original dataset.
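
The symlink step can also be scripted so that the dataset name becomes a parameter. The following is a minimal sketch, not part of the repo. It assumes the layout implied by the commands above (vox1_<name> and vox2_<name> folders plus matching .txt files under ./datasets), and link_dataset is a hypothetical helper name. It resolves each target to an absolute path, since a relative symlink target is interpreted relative to the link's own directory.

```python
import os
from pathlib import Path


def link_dataset(dataset_name, data_dir, train_data_dir):
    """Symlink the vox1/vox2 folders and list files for one dataset.

    Hypothetical helper: mirrors the four `ln -s` commands in this page,
    assuming everything lives under <data_dir>/datasets/.
    """
    data_dir = Path(data_dir).resolve()  # absolute, so links never dangle
    train_data_dir = Path(train_data_dir)
    train_data_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for prefix in ("vox1", "vox2"):
        for suffix in ("", ".txt"):
            name = f"{prefix}_{dataset_name}{suffix}"
            target = data_dir / "datasets" / name
            link = train_data_dir / name
            # Replace a stale link from an earlier run; real files are left
            # alone and will make os.symlink raise FileExistsError.
            if link.is_symlink():
                link.unlink()
            os.symlink(target, link)
            created.append(link)
    return created
```

Run from voxsrc-2020/data/, a call such as link_dataset("no_cuda", ".", "../components/train/tmp/data") would stand in for the four ln -s commands; pass "full" instead to link the complete dataset.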

Run the train component (locally, standalone)

  • Move into the component directory: cd ../components/train/
  • Execute the component's local run script with the default config: ./run_local.sh

Setup for remote runs on the Kubeflow cluster

Install Docker

Authenticate Docker to our container registry on GCP

Authenticate wandb for your runs on the cluster

  • Find your API key: https://app.wandb.ai/authorize
  • Set an API key environment variable: echo "export WANDB_API_KEY='[YOUR_KEY_HERE]'" >> ~/.bashrc
  • NOTE: Python code in the pipeline file reads the API key from this environment variable and sets it in the container for the train component
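
To illustrate the note above, here is a minimal, stdlib-only sketch of reading the key on the machine that compiles the pipeline and packaging it as a name/value entry for a container's environment. The helper name wandb_env_entry and the dictionary shape are assumptions made for illustration; the actual pipeline file in the repo may do this differently, for instance through the Kubeflow Pipelines SDK.

```python
import os


def wandb_env_entry(environ=None):
    """Return a name/value env entry holding the wandb API key.

    Hypothetical helper: `environ` defaults to the current process
    environment and can be overridden (e.g. in tests).
    """
    environ = os.environ if environ is None else environ
    key = environ.get("WANDB_API_KEY", "").strip()
    if not key:
        # Fail at compile time instead of inside the cluster run.
        raise RuntimeError(
            "WANDB_API_KEY is not set; export it before compiling the pipeline"
        )
    return {"name": "WANDB_API_KEY", "value": key}
```

Failing early like this surfaces a missing key when the pipeline is compiled, rather than partway through a run on the cluster. Remember that a line appended to ~/.bashrc only takes effect in new shells, or after running source ~/.bashrc.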

Build an image, compile a pipeline, and upload to the Kubeflow web-app

  • NOTE: These steps are now identical to running on Kubeflow via the dev-01 machine
  • Build the component image (and the image for any other component whose code you modified) with: build_image.sh
  • Compile the pipeline files at the top level of the project with: python [pipeline-name].py
  • Open the Kubeflow web-app and upload the compiled [pipeline-name].tar.gz as a new pipeline (or pipeline version) and start a run
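
Before uploading, it can help to confirm that the compile step actually produced a readable archive. The sketch below only lists the archive's members with Python's standard tarfile module; list_pipeline_archive is a hypothetical helper, and it makes no assumption about what files the compiler puts inside.

```python
import tarfile


def list_pipeline_archive(path):
    """Return the member names of a compiled pipeline .tar.gz.

    Raises tarfile.ReadError if the file is not a valid gzip'd tar archive.
    """
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()
```

Calling list_pipeline_archive("[pipeline-name].tar.gz") with the real file name substituted should return a non-empty list; an exception or an empty list suggests the compile step should be rerun before uploading.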