S3, Add runImportsOnWorkers parameter to execute imports from Worker #5025

Closed
Guigzai opened this issue Jul 19, 2024 · 2 comments · Fixed by #5098

Comments


Guigzai commented Jul 19, 2024

Hello,

Toil version 6.1
Python 3.9

We use S3 URIs as inputs to our workflows.

Our platform drives the workflows from a VM that mounts the shared space with NFS.

The Toil leader downloads the files, which makes this step very slow because it does not run on the compute infrastructure (the workers), which is tuned for high network performance.

Would it be possible to implement a --runImportsOnWorkers parameter, analogous to --runLocalJobsOnWorkers, so that the S3 copies are performed from the processing resources?
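For illustration only, a rough sketch through Toil's Python API of where such an option would take effect; the runImportsOnWorkers attribute is the name proposed in this issue, not an existing option, and the job store URI is a placeholder:

```python
from toil.common import Toil
from toil.job import Job

# Standard way to build a Toil options object (existing API).
options = Job.Runner.getDefaultOptions("aws:us-west-2:my-jobstore")
options.runImportsOnWorkers = True  # hypothetical: the option requested here

with Toil(options) as toil:
    # Today this import runs in the leader process; the request is for it
    # to be dispatched to a worker when the option above is enabled.
    reads = toil.import_file("s3://my-bucket/inputs/reads.fastq.gz")
```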

Thanks

Issue is synchronized with this Jira Story
Issue Number: TOIL-1619

@unito-bot
➤ Adam Novak commented:

Do we need a way to run a workflow with mixed inputs, where some inputs are local file paths only available on the leader filesystem while others are URLs we can fetch from the workers?

I guess we could in that case fetch all local files from the leader and everything else from the worker.
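A minimal sketch of that dispatch rule, just to make the idea concrete (the plan_import helper is hypothetical, not an existing Toil function):

```python
from urllib.parse import urlparse

def plan_import(uri: str) -> str:
    """Decide where an input import should run under the idea above."""
    scheme = urlparse(uri).scheme
    if scheme in ("", "file"):
        # A bare path or file:// URL may only exist on the leader's
        # filesystem, so fetch it from the leader.
        return "leader"
    # Anything addressable by URL (s3://, http://, ...) can be fetched
    # from a worker, which has the fast network path.
    return "worker"

assert plan_import("/mnt/shared/reads.fastq.gz") == "leader"
assert plan_import("s3://my-bucket/reads.fastq.gz") == "worker"
```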


Guigzai commented Jul 29, 2024

No, all the files are stored either on shared GPFS (POSIX) or in S3, both reachable from the workers.
So all files are available from both the leader and the workers.
