Detailed Documentation

Justin Fu edited this page Jun 16, 2020 · 1 revision

Table of Contents

  1. Launch Modes
  2. Input and Output Data
  3. Launching a job

Launch Modes

The launch mode specifies where a job runs. Four modes are currently available.

Local

doodad.mode.LocalMode()

This mode simply runs a script from the command line using your shell's default python command. It takes no arguments.

This mode is useful for debugging experiments before launching remotely.

SSH

doodad.mode.SSHDocker(
    credentials=[SSHCredentials],
)

This mode launches scripts inside a Docker container on a remote machine, using the specified Docker image. Docker must be installed on the remote machine for this to work.

The recommended way to specify credentials is to point to an identity file, for example:

credentials = doodad.credentials.ssh.SSHCredentials(
    hostname=[str],
    username=[str],
    identity_file=[str]
)
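
Putting the two together, a minimal sketch might look like the following (the hostname, username, and key path are placeholders; this assumes doodad is installed and importable):

```python
from doodad import mode
from doodad.credentials.ssh import SSHCredentials

# Placeholder host details -- replace with your own machine's.
creds = SSHCredentials(
    hostname='my-server.example.com',
    username='ubuntu',
    identity_file='~/.ssh/id_rsa',
)

# Jobs launched with this mode run inside Docker on the remote host.
ssh_mode = mode.SSHDocker(credentials=creds)
```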

EC2

EC2 is supported via spot instances.

The easiest way to set up EC2 is to run the scripts/setup_ec2.py script (see the detailed instructions) and use the EC2AutoconfigDocker constructor (use the EC2Mode class to fill in AWS arguments manually):

doodad.mode.EC2Autoconfig(
    s3_log_path=[str], # Folder under the bucket root in which to store log files
    region=[str:'us-west-1'],  # EC2 region
    instance_type=[str:'m3.medium'],  # EC2 instance type
    spot_price=[float:0.02],  # Maximum bid price
    terminate_on_end=[bool:True],  # Whether to terminate on finishing job
)

Output files will be stored on S3 under the folder s3://<bucket_name>/<s3_log_path>/

Lists of EC2 instance types and current spot prices are available on the AWS website. Generally, the c5 instances offer good performance, adequate memory, and a good price.

GCP

Currently there is no automated setup script for GCP. You will have to follow the setup instructions.

doodad.mode.GCPDocker(
    zone=[str:'us-west1-a'],  # GCP zone
    instance_type=[str:'n1-standard-2'],  # GCP instance type
    image_name=[str:'your-image'], # GCP image name
    image_project=[str:'your-project'], # GCP image project
    gcp_log_path=[str:'experiment'], # Folder to store log files under
    terminate_on_end=[bool:True],  # Whether to terminate on finishing job
    use_gpu=[bool:False], # Whether to use GPUs
    gpu_model=[str:'nvidia-tesla-t4'], # GPU type
    num_gpu=[int:1] # Number of GPUs to use
)

Output files will be stored on Google Cloud Storage under the folder gs://<bucket_name>/<gcp_log_path>/XXXXX

The GPU model must be one of the models listed on the Google Cloud website. Additionally, the Docker image must have the NVIDIA container runtime for Docker installed. See the GCP setup instructions for more details.
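
A filled-in GPU configuration might look like the following sketch (the image name, project, and log path are placeholders for values from your own GCP setup):

```python
from doodad import mode

gcp_mode = mode.GCPDocker(
    zone='us-west1-a',
    instance_type='n1-standard-2',
    image_name='my-gcp-image',        # placeholder: your GCP image
    image_project='my-gcp-project',   # placeholder: your GCP project
    gcp_log_path='my_experiment',     # logs land under gs://<bucket_name>/my_experiment/
    terminate_on_end=True,
    use_gpu=True,
    gpu_model='nvidia-tesla-t4',
    num_gpu=1,
)
```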

Input and Output Data

All input and output data is handled by mount objects.

doodad.mount.MountLocal(
    local_dir=[str], # The name of this directory on disk
    mount_point=[str], # The name of this directory as visible to the script
    pythonpath=[bool:False], # Whether to add this directory to the pythonpath
    output=[bool:False], # Whether this directory is an empty directory for storing outputs.
    filter_ext=[tuple(str):('.pyc', '.log', '.git', '.mp4')], # File extensions to exclude
    filter_dir=[tuple(str):('data',)] # Directories to ignore
)

For remote launch modes (EC2, SSH), non-output directories will be copied to the remote server. Output directories will not be copied.

For SSH, output directories will not be copied back automatically, and they will show up on disk with root permissions, so you must copy the data back manually. I am currently working on a fix for this.
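
A typical pair of mounts, sketched below with placeholder paths, is one code mount (copied to the remote machine and importable by the script) and one output mount (an empty directory the script writes results into):

```python
from doodad import mount

# Code mount: copied to the remote machine; added to PYTHONPATH.
code_mount = mount.MountLocal(
    local_dir='~/my_project',
    mount_point='/code/my_project',
    pythonpath=True,
)

# Output mount: empty directory for the script's results; not copied over.
output_mount = mount.MountLocal(
    local_dir='~/my_project/output',
    mount_point='/output',
    output=True,
)
```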

Git Repositories

You can create mounts that directly point to specific branches in Git repositories. This feature is useful if you do not want to store a repository locally (and use MountLocal), or if you need to work with different versions of the same repository.

doodad.mount.MountGit(
    git_url=[str],  # Git URL
    branch=[str:"master"],  # Git branch
    ssh_identity=[str],  # SSH identity file for git clone
    mount_point=[str],
    pythonpath=[bool]
)
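
For instance, to mount a specific branch of a repository (the URL and branch below are placeholders):

```python
from doodad import mount

repo_mount = mount.MountGit(
    git_url='https://github.com/user/repo.git',  # placeholder repository URL
    branch='dev',                                # placeholder branch name
    mount_point='/code/repo',
    pythonpath=True,
)
```

For private repositories cloned over SSH, pass an identity file via the ssh_identity argument.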

EC2

For EC2, all output mounts must be replaced by S3 mounts:

doodad.mount.MountS3(
    s3_path=[str],
    mount_point=[str],  # Directory visible to the running job.
)
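
For example (the s3_path below is a placeholder):

```python
from doodad import mount

# Results written to /output inside the job are synced to
# s3://<bucket_name>/<s3_log_path>/outputs/data
s3_output = mount.MountS3(
    s3_path='data',
    mount_point='/output',
)
```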

The contents of this folder will be synced to s3://<bucket_name>/<s3_log_path>/outputs/<s3_path>, where the bucket and log paths are specified by the launch mode.

To pull all results for an experiment, you can use the following aws-cli command:

aws s3 sync s3://<bucket_name>/path/to/your/logs .

GCP

For GCP, all output mounts must be replaced by GCP mounts:

doodad.mount.MountGCP(
    gcp_path=[str],
    mount_point=[str],  # Directory visible to the running job.
)
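
For example (the gcp_path below is a placeholder):

```python
from doodad import mount

# Results written to /output inside the job are synced to
# gs://<bucket_name>/<gcp_log_path>/logs/data
gcp_output = mount.MountGCP(
    gcp_path='data',
    mount_point='/output',
)
```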

The contents of this folder will be synced to gs://<bucket_name>/<gcp_log_path>/logs/<gcp_path>, where the bucket and log paths are specified by the launch mode.

To pull all results for an experiment, you can use the following gsutil command:

gsutil rsync gs://<bucket_name>/path/to/your/logs .

Launching a job

With the launch mode and mounts specified, we can now launch a python script using the launch_python function:

doodad.launch.launch_api.launch_python(
    target=[str],
    mode=[LaunchMode],
    mounts=[list(Mount)],
    docker_image=[str:"ubuntu:18.04"],
)

The target argument should be an absolute filepath to the target script; python command-line arguments can be appended after the script path. mounts should be a list of Mount objects.
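
Putting the pieces together, a complete local launch might look like the following sketch (the project paths, script name, and Docker image are placeholders; this assumes doodad is installed and Docker is available):

```python
from doodad import mode, mount
from doodad.launch import launch_api

# Code mount: the project directory, importable by the script.
code_mount = mount.MountLocal(
    local_dir='~/my_project',
    mount_point='/code/my_project',
    pythonpath=True,
)

# Output mount: where the script writes its results.
output_mount = mount.MountLocal(
    local_dir='~/my_project/output',
    mount_point='/output',
    output=True,
)

launch_api.launch_python(
    target='/code/my_project/my_script.py',  # absolute path as seen by the job
    mode=mode.LocalMode(),                   # swap in an SSH/EC2/GCP mode to run remotely
    mounts=[code_mount, output_mount],
    docker_image='python:3.8',               # placeholder image
)
```

To run the same job remotely, only the mode (and, for EC2/GCP, the output mounts) need to change.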

For non-python programs, you can run shell commands directly:

doodad.launch.launch_api.run_command(
    command=[str],
    mode=[LaunchMode],
    mounts=[list(Mount)],
    docker_image=[str:"ubuntu:18.04"],
)
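
For example, a minimal sketch running a shell command inside an Ubuntu container (assuming doodad is installed and Docker is available):

```python
from doodad import mode
from doodad.launch import launch_api

# Run an arbitrary shell command inside the container.
launch_api.run_command(
    command='echo "hello from doodad"',
    mode=mode.LocalMode(),
    mounts=[],
    docker_image='ubuntu:18.04',
)
```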