-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support importing on workers in WDL #5103
Conversation
Don't you mean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see at least the command line options shared between WDL and CWL here, although WDL might want to have the --kebab-case
forms conditioned out.
What would we need to do to make the same importer job class able to be used for both CWL and WDL? If we made it a job that takes a list of file paths/URIs and some search path information, and returns a list of found absolute paths/URIs and Toil file IDs, would we then be able to write the needed adapters?
src/toil/options/wdl.py
Outdated
parser.add_argument( | ||
"--runImportsOnWorkers", "--run-imports-on-workers", | ||
action="store_true", default=False, | ||
help=suppress_help or "Run the file imports on a worker instead of the leader. This is useful if the leader is not optimized for high network performance." | ||
"If set to true, the argument --importWorkersDisk must also be set.", | ||
dest="run_imports_on_workers" | ||
) | ||
|
||
parser.add_argument("--importWorkersDisk", "--import-workers-disk", | ||
help=suppress_help or "Specify the amount of disk space an import worker will use. If file streaming for input files is not available," | ||
"this should be set to the size of the largest input file. This must be set in conjunction with the argument runImportsOnWorkers.", | ||
dest="import_workers_disk", | ||
type=lambda x: human2bytes(str(x)), | ||
default=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any of the other WDL-side options have the --kebab-case
form.
Also, is it possible to share the option definitions for these between CWL and WDL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function signatures are different, I think it's easier to keep them separate than figuring out a way to get around argparse, unless we remove kebab case from the CWL equivalent of these options, which I originally included as all other CWL options are kebab case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can actually move it out into a separate function
src/toil/wdl/wdltoil.py
Outdated
# Take care of incompatible arguments related to file imports | ||
if options.run_imports_on_workers is True and options.import_workers_disk is None: | ||
logger.error("Commandline arguments --runImportsOnWorkers and --importWorkersDisk must both be set to run file imports on workers.") | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to raise an error or exit with a nonzero exit code, or the workflow will look like it succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a RuntimeError
src/toil/wdl/wdltoil.py
Outdated
# Import any files in the bindings | ||
input_bindings = import_files(inputs, target.name, toil._jobStore, inputs_search_path, skip_remote=options.reference_inputs) | ||
# Run the workflow and get its outputs namespaced with the workflow name. | ||
root_job = WDLRootJob(target, input_bindings, wdl_options=wdl_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to rename WDLRootJob
since it isn't always the root anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to WDLStartJob
Co-authored-by: Adam Novak <anovak@soe.ucsc.edu>
…25-import-on-workers-wdl
…incompatible options are detected
@stxue1 I think you forgot to add |
Whoops, added |
…osphere/toil into issues/5025-import-on-workers-wdl
…25-import-on-workers-wdl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it should work.
run_imports_on_workers_arguments = ["--runImportsOnWorkers"] | ||
if cwl: | ||
run_imports_on_workers_arguments.append("--run-imports-on-workers") | ||
parser.add_argument(*run_imports_on_workers_arguments, action="store_true", default=False, dest="run_imports_on_workers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we usually like to make new boolean options use str2bool
and accept False
/True
, since we've been upgrading so many of the old ones to that format. Maybe we should upgrade this one too?
Mirrors #5025 to the WDL side too
Changelog Entry
To be copied to the draft changelog by merger:
Reviewer Checklist
issues/XXXX-fix-the-thing
in the Toil repo, or from an external repo.camelCase
that want to be insnake_case
.docs/running/{cliOptions,cwl,wdl}.rst
Merger Checklist