This document describes experimental features for a future release of doodad.
The DFile library (based on GFile) provides a single file interface for both local and remote files. The `dfile.open` function detects whether a file is stored locally or on a remote service (for example via SSH or S3) and returns the appropriate file object. In most cases you can therefore write your code as if all files were local, without worrying about where the data is stored.
DFile determines where a file lives from the prefix of its filename. Some of these backends (AWS S3, GCS, SSH) also require you to configure the appropriate credentials.
- `s3://<path>` maps to a remote file on AWS S3 (requires credentials; see below).
- `gs://<path>` maps to a remote file on Google Cloud Storage.
- `ssh://<username>@<hostname>/<path>` maps to a remote file accessed via SSH.
- `docker://<container-id>/<path>` maps to a file inside a locally running Docker container.
- All other filenames map to the local filesystem.
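The prefix rules above can be sketched with Python's standard `urllib.parse.urlparse`. This is a hypothetical illustration, not DFile's actual dispatch code: the function name `detect_backend` and the backend labels are made up for this example, and the sketch only classifies and splits a filename rather than opening anything.

```python
from urllib.parse import urlparse

# Hypothetical mapping from filename prefix to backend label
# (illustration only; not DFile's internal table).
REMOTE_SCHEMES = {
    "s3": "aws_s3",
    "gs": "google_cloud_storage",
    "ssh": "ssh",
    "docker": "docker",
}


def detect_backend(filename):
    """Return (backend, details) for a filename, based only on its prefix."""
    parsed = urlparse(filename)
    backend = REMOTE_SCHEMES.get(parsed.scheme)
    if backend is None:
        # No recognized prefix: treat the name as a local path.
        return "local", {"path": filename}
    if parsed.scheme == "ssh":
        # ssh://<username>@<hostname>/<path>
        return backend, {
            "username": parsed.username,
            "hostname": parsed.hostname,
            "path": parsed.path,
        }
    # s3://, gs://, docker://: treat the first component (e.g. a bucket
    # or container id) as the location and the remainder as the path.
    return backend, {"location": parsed.netloc, "path": parsed.path}
```

For example, `detect_backend('ssh://user@hostname.com/tmp/my_file.txt')` returns `('ssh', {'username': 'user', 'hostname': 'hostname.com', 'path': '/tmp/my_file.txt'})`.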
Here is some example usage:
```python
import doodad.dfile as dfile

# Read a file from S3
with dfile.open('s3://my.bucket/my_file.txt', mode='r') as f:
    contents = f.read()

# Write a file and copy it to user@hostname.com via SSH
with dfile.open('ssh://user@hostname.com/tmp/my_file.txt', mode='w') as f:
    f.write('hello from doodad')
```
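The SSH example above writes the file and then copies it to the remote host. One plausible way to get that behavior is to write to a local temporary file and upload it when the file is closed. The sketch below assumes exactly that; `open_for_remote_write` and the `upload` callback are hypothetical names, not part of DFile, and the real library may transfer data differently.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_for_remote_write(remote_path, upload, mode="w"):
    """Hypothetical sketch: write to a local temp file, then upload on close.

    `upload(local_path, remote_path)` is a stand-in for the real transfer
    step (for example an scp or S3 upload); it is not part of DFile's API.
    """
    fd, local_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, mode) as f:
            yield f
        # The local file is closed and flushed here; copy it to the remote side.
        upload(local_path, remote_path)
    finally:
        # Clean up the temporary file whether or not the upload ran.
        os.remove(local_path)
```

Because the upload happens only after the `with` body finishes, an exception inside the block skips the upload and just removes the temporary file.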
Some external libraries are hardcoded to use Python's built-in `open` function. In that case, you can override the built-in `open` with `dfile.open` to force the library to use DFile:
```python
import doodad.dfile as dfile

# While this block runs, calls to the built-in open() go through dfile.open
with dfile.override_builtin():
    import external_library
```
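Overriding a built-in like this usually works by temporarily rebinding `builtins.open`. The context manager below is a minimal sketch of that idea, assuming `dfile.override_builtin()` behaves similarly; `override_open` is a hypothetical name and the real implementation may differ.

```python
import builtins
from contextlib import contextmanager


@contextmanager
def override_open(replacement):
    """Temporarily replace builtins.open with `replacement` (hypothetical sketch)."""
    original_open = builtins.open
    builtins.open = replacement
    try:
        yield
    finally:
        # Always restore the real open, even if the block raised.
        builtins.open = original_open
```

Because Python looks up the name `open` in `builtins` at call time, any code that calls `open(...)` while the block is active gets the replacement. Code that captured the function earlier (for example via `from io import open`, or a direct call to `io.open`) is not affected.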
The credentials library manages credentials for remote services such as SSH and AWS.
- `doodad.credentials.aws`
- `doodad.credentials.ssh`
TODO: Explain how to configure credentials.