Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support importing on workers in WDL #5103

Merged
merged 22 commits into from
Oct 9, 2024
Merged

Conversation

stxue1
Copy link
Contributor

@stxue1 stxue1 commented Sep 26, 2024

Mirrors #5025 to the WDL side too

Changelog Entry

To be copied to the draft changelog by merger:

  • Added support to import files on workers for toil-wdl-runner

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@adamnovak
Copy link
Member

Don't you mean toil-wdl-runner in the changelog entry?

Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see at least the command line options shared between WDL and CWL here, although WDL might want to have the --kebab-case forms conditioned out.

What would we need to do to make the same importer job class able to be used for both CWL and WDL? If we made it a job that takes a list of file paths/URIs and some search path information, and returns a list of found absolute paths/URIs and Toil file IDs, would we then be able to write the needed adapters?

Comment on lines 40 to 53
parser.add_argument(
"--runImportsOnWorkers", "--run-imports-on-workers",
action="store_true", default=False,
help=suppress_help or "Run the file imports on a worker instead of the leader. This is useful if the leader is not optimized for high network performance."
"If set to true, the argument --importWorkersDisk must also be set.",
dest="run_imports_on_workers"
)

parser.add_argument("--importWorkersDisk", "--import-workers-disk",
help=suppress_help or "Specify the amount of disk space an import worker will use. If file streaming for input files is not available,"
"this should be set to the size of the largest input file. This must be set in conjunction with the argument runImportsOnWorkers.",
dest="import_workers_disk",
type=lambda x: human2bytes(str(x)),
default=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of the other WDL-side options have the --kebab-case form.

Also, is it possible to share the option definitions for these between CWL and WDL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function signatures are different, I think it's easier to keep them separate than figuring out a way to get around argparse, unless we remove kebab case from the CWL equivalent of these options, which I originally included as all other CWL options are kebab case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can actually move it out into a separate function

src/toil/wdl/wdltoil.py Outdated Show resolved Hide resolved
src/toil/wdl/wdltoil.py Outdated Show resolved Hide resolved
# Take care of incompatible arguments related to file imports
if options.run_imports_on_workers is True and options.import_workers_disk is None:
logger.error("Commandline arguments --runImportsOnWorkers and --importWorkersDisk must both be set to run file imports on workers.")
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to raise an error or exit with a nonzero exit code, or the workflow will look like it succeeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a RuntimeError

# Import any files in the bindings
input_bindings = import_files(inputs, target.name, toil._jobStore, inputs_search_path, skip_remote=options.reference_inputs)
# Run the workflow and get its outputs namespaced with the workflow name.
root_job = WDLRootJob(target, input_bindings, wdl_options=wdl_options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to rename WDLRootJob since it isn't always the root anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to WDLStartJob

@adamnovak
Copy link
Member

@stxue1 I think you forgot to add src/toil/options/runner.py to Git.

@stxue1
Copy link
Contributor Author

stxue1 commented Sep 30, 2024

@stxue1 I think you forgot to add src/toil/options/runner.py to Git.

Whoops, added

…osphere/toil into issues/5025-import-on-workers-wdl
@stxue1 stxue1 linked an issue Oct 3, 2024 that may be closed by this pull request
Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should work.

Comment on lines +14 to +17
run_imports_on_workers_arguments = ["--runImportsOnWorkers"]
if cwl:
run_imports_on_workers_arguments.append("--run-imports-on-workers")
parser.add_argument(*run_imports_on_workers_arguments, action="store_true", default=False, dest="run_imports_on_workers",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we usually like to make new boolean options use str2bool and accept False/True, since we've been upgrading so many of the old ones to that format. Maybe we should upgrade this one too?

@stxue1 stxue1 merged commit 8f665a9 into master Oct 9, 2024
1 check passed
@stxue1 stxue1 deleted the issues/5025-import-on-workers-wdl branch October 9, 2024 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add import on workers for WDL
2 participants