Allow importing on workers #5098

Merged
merged 7 commits into from
Sep 26, 2024

@stxue1 (Contributor) commented Sep 19, 2024

#5025

Changelog Entry

To be copied to the draft changelog by merger:

  • Added support to import files on workers for toil-cwl-runner
    • --runImportsOnWorkers to enable importing files on workers
    • --importWorkersDisk to control how much disk space the import worker will use

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the GitHub "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@DailyDreaming (Member) left a comment

Looks good so far, but really needs a test.

@@ -3143,7 +3177,7 @@ def __init__(
 self.cwlwf = cwlwf
 self.cwljob = cwljob
 self.runtime_context = runtime_context
-self.cwlwf = remove_pickle_problems(self.cwlwf)
+self.cwlwf = self.cwlwf

No-op.

@@ -3111,7 +3141,11 @@ def hasChild(self, c: Job) -> Any:


 def remove_pickle_problems(obj: ProcessType) -> ProcessType:
-"""Doc_loader does not pickle correctly, causing Toil errors, remove from objects."""
+"""

This function is only called in one place now, other than itself. Is it safe to remove and was the issue resolved?

@stxue1 (Contributor, Author) replied:

I think it's still needed, as I ran into this issue while creating the CWLImportJob. Something internal to the CWL tool object is unpicklable and must be removed. Previously it was called in every Job initialization, but I have now moved it so that it runs only once, on the leader.
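The pattern described in this reply (strip the unpicklable internals once, up front, rather than in every job constructor) can be illustrated with a minimal sketch. This is not Toil's `remove_pickle_problems` and does not use cwltool objects: `make_tool`, `strip_unpicklable`, and `can_pickle` are hypothetical helpers, a nested dict stands in for the CWL tool object, and a `threading.Lock` stands in for whatever internal attribute fails to pickle.

```python
import pickle
import threading
from typing import Any, Dict


def make_tool(name: str, *steps: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical tool record; the lock stands in for the unpicklable internal."""
    return {"name": name, "doc_loader": threading.Lock(), "steps": list(steps)}


def strip_unpicklable(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Remove the problematic entry from a tool and, recursively, from its steps."""
    tool.pop("doc_loader", None)
    for step in tool["steps"]:
        strip_unpicklable(step)
    return tool


def can_pickle(obj: Any) -> bool:
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


workflow = make_tool("workflow", make_tool("step1"), make_tool("step2"))
picklable_before = can_pickle(workflow)  # the lock makes serialization fail

# Clean once, up front, instead of in every job's __init__.
strip_unpicklable(workflow)
picklable_after = can_pickle(workflow)
restored = pickle.loads(pickle.dumps(workflow))
```

Because the cleanup mutates the object in place before any job is created, every later serialization of the same object, including the one a worker-side import job would need, sees the already-cleaned version.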

@@ -1,4 +1,5 @@
 """Implemented support for Common Workflow Language (CWL) for Toil."""
+import argparse

Unused import?

@@ -281,6 +282,21 @@ def add_cwl_options(parser: ArgumentParser, suppress: bool = True) -> None:
 help=suppress_help or "Disable file streaming for files that have 'streamable' flag True",
 dest="disable_streaming",
 )
+parser.add_argument(
+"--runImportsOnWorkers", "--run-imports-on-workers",

Can we add a test for this?
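One possible starting point for such a test is to check that both spellings of the flag parse to the same destination. The sketch below uses only the standard `argparse` module and is not the test that was added to Toil: `build_parser`, the `dest` names, the `store_true` action, and the string type for `--importWorkersDisk` are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-alone parser that mimics the two new options (assumed shapes)."""
    parser = argparse.ArgumentParser(prog="example-runner")
    parser.add_argument(
        "--runImportsOnWorkers", "--run-imports-on-workers",
        action="store_true",            # assumption: a boolean switch
        default=False,
        dest="run_imports_on_workers",  # assumption: hypothetical dest name
        help="Run file imports on workers instead of the leader.",
    )
    parser.add_argument(
        "--importWorkersDisk",
        type=str,                       # assumption: a size string such as "10G"
        default=None,
        dest="import_workers_disk",     # assumption: hypothetical dest name
        help="Disk space to request for the import worker.",
    )
    return parser


parser = build_parser()
camel = parser.parse_args(["--runImportsOnWorkers"])
kebab = parser.parse_args(["--run-imports-on-workers", "--importWorkersDisk", "10G"])
default = parser.parse_args([])

# Both spellings should set the same attribute; omitting the flag leaves it off.
assert camel.run_imports_on_workers is True
assert kebab.run_imports_on_workers is True
assert default.run_imports_on_workers is False
```

A fuller test would also run a small workflow with the flag enabled and confirm that the imported files are available to downstream jobs; that part depends on Toil's test harness and is not sketched here.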

@adamnovak (Member) commented:

You have to say the magic word "Fixes #5025" or GitHub won't auto-link the PR to the issue.

@adamnovak linked an issue Sep 24, 2024 that may be closed by this pull request

@adamnovak (Member) commented:

I'm not sure why https://ucsc-ci.com/databiosphere/toil/-/jobs/78630 timed out. The main GitLab server did not drop off the network, and the newly added test runs to completion quickly on my machine.

@adamnovak dismissed DailyDreaming’s stale review September 24, 2024 17:51

Looks like the requested changes have been made.

@adamnovak merged commit 716eb1b into master Sep 26, 2024
1 check passed
@adamnovak deleted the issues/5025-import-on-workers branch September 26, 2024 16:30
Development

Successfully merging this pull request may close these issues.

S3, Add runImportsOnWorkers parameter to execute imports from Worker
3 participants