(v0.2) How to use Doodad
This page covers how to use the low-level interface for doodad (Mounts, Modes, and launching). For a simple interface to quickly launch hyperparameter sweeps, see the sweeper page.
The launch mode specifies where to run a job. There are currently five modes available: Local, LocalDocker, SSHDocker, EC2, and GCP.
doodad.mode.Local()
This mode simply runs a script from the command line using the default python command of your shell. It takes no arguments.
doodad.mode.LocalDocker(
image=[str:'ubuntu:16.04']
)
This mode launches scripts inside a local docker container, using the specified docker image. Docker must be installed on the local machine for this to work.
This mode is useful for local debugging before running over SSH or EC2.
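For example, the two local modes might be constructed as follows (a minimal sketch; any docker image with python installed should work):
import doodad

local_mode = doodad.mode.Local()
docker_mode = doodad.mode.LocalDocker(image='ubuntu:16.04')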
doodad.mode.SSHDocker(
image=[str:'ubuntu:16.04']
credentials=[SSHCredentials],
)
This mode launches scripts inside a docker container on a remote machine, using the specified docker image. Docker must be installed on the remote host for this to work.
The recommended way to specify credentials is to point to an identity file, i.e.:
credentials = doodad.ssh.SSHCredentials(
hostname=[str],
username=[str],
identity_file=[str]
)
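For example, a sketch with placeholder hostname, username, and key path:
import doodad
import doodad.ssh

credentials = doodad.ssh.SSHCredentials(
    hostname='remote.example.com',  # placeholder server
    username='myuser',
    identity_file='~/.ssh/id_rsa',
)
mode = doodad.mode.SSHDocker(
    image='ubuntu:16.04',
    credentials=credentials,
)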
EC2 is supported via spot instances. The easiest way to set up EC2 is to run the scripts/setup_ec2.py script and use the EC2AutoconfigDocker constructor (use the EC2SpotDocker class to customize AWS-specific details):
doodad.mode.EC2AutoconfigDocker(
image=[str:'ubuntu:16.04'],
region=[str:'us-west-1'], # EC2 region
instance_type=[str:'m3.medium'], # EC2 instance type
spot_price=[float:0.02], # Maximum bid price
s3_log_prefix=[str:'experiment'], # Folder to store log files under
terminate=[bool:True], # Whether to terminate on finishing job
)
Output files will be stored on S3 under the folder s3://<bucket_name>/doodad/logs/<s3_log_prefix>/run_XXXXXXX
EC2 instance types are listed at https://aws.amazon.com/ec2/instance-types/ and spot prices at https://aws.amazon.com/ec2/spot/pricing/. Generally, the c4 instances offer good performance, adequate memory, and a good price.
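For example, a sketch assuming scripts/setup_ec2.py has already been run to configure AWS credentials and an S3 bucket:
import doodad

mode = doodad.mode.EC2AutoconfigDocker(
    image='ubuntu:16.04',
    region='us-west-1',
    instance_type='c4.large',      # c4 family, per the note above
    spot_price=0.10,               # maximum hourly bid in USD
    s3_log_prefix='my_experiment',
    terminate=True,
)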
Currently there is no automated setup script for GCP. You will have to follow these setup instructions:
- https://cloud.google.com/sdk/docs/quickstart-debian-ubuntu
- https://cloud.google.com/compute/docs/tutorials/python-guide
- https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
Next, you will need to create a GCE image with docker installed. You can do this by creating a new instance, SSH-ing into it, and performing a manual docker install. Add your user to the docker group with "sudo usermod -aG docker your_username", then reboot with "sudo reboot" so the group change takes effect. Finally, stop the instance and create an image from it using the GCP console.
doodad.mode.GCPDocker(
image=[str:'ubuntu:16.04'], # Docker image
zone=[str:'us-west1-a'], # GCP zone
instance_type=[str:'n1-standard-2'], # GCP instance type
image_name=[str:'ubuntu-1804-docker'], # GCP image name
image_project=[str:'justinfu-qlearning'], # GCP image project
gcp_log_prefix=[str:'experiment'], # Folder to store log files under
terminate=[bool:True], # Whether to terminate on finishing job
)
Output files will be stored on Google Cloud Storage under the folder gs://<bucket_name>/doodad/logs/<gcp_log_prefix>/XXXXX
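For example, a sketch with placeholder image and project names (substitute the docker-enabled image created above and your own project):
import doodad

mode = doodad.mode.GCPDocker(
    image='ubuntu:16.04',
    zone='us-west1-a',
    instance_type='n1-standard-2',
    image_name='my-docker-image',    # placeholder: your GCE image
    image_project='my-gcp-project',  # placeholder: your GCP project
    gcp_log_prefix='my_experiment',
    terminate=True,
)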
All input and output data is handled by mount objects.
doodad.mount.MountLocal(
local_dir=[str], # The name of this directory on disk
mount_point=[str], # The name of this directory as visible to the script
pythonpath=[bool:False], # Whether to add this directory to PYTHONPATH
output=[bool:False], # Whether this directory is an empty directory for storing outputs.
filter_ext=[tuple(str):('.pyc', '.log', '.git', '.mp4')], # File extensions to not include
filter_dir=[tuple(str):('data',)] # Directories to ignore
)
For remote launch modes (EC2, SSH), non-output directories will be copied to the remote server. Output directories will not be copied.
For SSH, output directories will not be copied back automatically, and they will be owned by root on disk, so you must copy the data back manually. I am currently working on a fix for this.
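For example, a sketch of a typical pair of mounts with placeholder paths: a code directory placed on the PYTHONPATH, and an empty output directory:
import doodad.mount as mount

code_mount = mount.MountLocal(
    local_dir='~/projects/my_code',  # placeholder path
    mount_point='/code/my_code',
    pythonpath=True,                 # importable from the target script
)
output_mount = mount.MountLocal(
    local_dir='~/projects/output',   # placeholder path
    mount_point='/outputs',
    output=True,
)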
For EC2, all output mounts must be replaced by S3 mounts:
doodad.mount.MountS3(
s3_path=[str],
mount_point=[str],
output=[bool:True],
sync_interval=[int:15],
include_types=[tuple(str):('*.txt', '*.csv', '*.json', '*.gz', '*.tar', '*.log', '*.pkl')]
)
The contents of this folder will be synced to s3://<bucket_name>/doodad/logs/<s3_log_prefix>/run_XXXXXXX/<s3_path>
To pull all results for an experiment, you can use the following aws-cli command:
aws s3 sync s3://<bucket_name>/doodad/logs/<s3_log_prefix> .
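For example, a sketch of an S3 output mount; the script writes to /outputs inside the container, and the contents are synced to S3 every 15 seconds:
import doodad.mount as mount

s3_mount = mount.MountS3(
    s3_path='my_experiment_data',  # placeholder subfolder on S3
    mount_point='/outputs',
    output=True,
    sync_interval=15,
)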
With the launch mode and mounts specified, we can now launch a python script using the launch_python function:
doodad.launch_tools.launch_python(
target=[str],
mode=[LaunchMode],
mount_points=[list(Mount)],
verbose=[bool:False],
args=[dict],
)
The target argument should be an absolute filepath to the target script. mount_points should be a list of Mount objects.
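Putting it together, a minimal end-to-end sketch using placeholder paths and the local docker mode:
import doodad
import doodad.mount as mount

code_mount = mount.MountLocal(
    local_dir='~/projects/my_code',  # placeholder path
    mount_point='/code/my_code',
    pythonpath=True,
)
doodad.launch_tools.launch_python(
    target='/home/user/projects/my_code/train.py',  # placeholder absolute path
    mode=doodad.mode.LocalDocker(image='ubuntu:16.04'),
    mount_points=[code_mount],
)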
Sometimes it is useful to pass arguments to a target script (e.g., hyperparameter settings). Within the python script, import the doodad library and use the get_args function:
import doodad
doodad.get_args(<key_name>, <default_value>)
# For example
doodad.get_args('arg_name1', 2)
The default value will be used if a) the script was not launched via doodad, or b) the script was launched but the argument was not specified.
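For example, a sketch of the target-script side with hypothetical hyperparameter names:
import doodad

# Defaults apply when the script is run directly (python train.py)
# or when the launcher does not pass these keys.
learning_rate = doodad.get_args('learning_rate', 1e-3)
num_epochs = doodad.get_args('num_epochs', 10)
print('lr=%s, epochs=%s' % (learning_rate, num_epochs))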
Then, in the launch call, fill in the args parameter with a dictionary:
doodad.launch_tools.launch_python(
target,
mode=mode,
mount_points=mounts,
args={
'arg_name1': arg_value,
'arg_name2': arg_value
}
)
Check out the example launch file and target script in the repository for a complete working example.
This section will be expanded in the future.