Overall improvements to workflow parallelization #538

Closed
6 of 8 tasks
nfahlgren opened this issue Mar 3, 2020 · 3 comments
Labels
enhancement Enhancements to existing features Epic Discussions and broad multi-issue ideas
@nfahlgren
Member

nfahlgren commented Mar 3, 2020

This is an Epic to help organize the overall efforts to improve PlantCV workflow parallelization. There are three main components to workflow parallelization:

  • Metadata processing/handling
  • Workflow handling
  • Parallelization

Various improvements have been suggested for each of these areas. We will attempt to subdivide these into discrete issues as much as possible, but many improvements will be inherently interlinked. Areas of improvement are (in no particular order):

Will update if I missed anything or plans change.

@nfahlgren nfahlgren added Epic Discussions and broad multi-issue ideas enhancement Enhancements to existing features labels Mar 3, 2020
@dschneiderch
Collaborator

@nfahlgren this professional isolation moment seems like a good opportunity to work on these. I'd be particularly interested in working on the grouping/mapping functionality, although I think most of these issues are inherently linked.
There was some discussion at Phenome about how to tackle the multiple image sets, and we tentatively concluded that instead of trying to wrangle plantcv-parallel.py into accepting multiple images, we could write a preprocessing script that bundles groups of images together based on metadata and a user-provided config file. A potential bundle could be a list of filenames, a multiframe TIFF, or a zip file.
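
To make the bundling idea concrete, here is a minimal sketch, not PlantCV code. It assumes metadata is encoded in the filename as delimiter-separated fields and that the config is a plain dictionary; the function names (`parse_metadata`, `bundle_images`) and config keys (`fields`, `delimiter`, `group_by`) are hypothetical, chosen only for illustration. The bundle here is the simplest of the three options, a list of filenames.

```python
import os
from collections import defaultdict


def parse_metadata(filename, fields, delimiter="_"):
    """Split a filename stem into named metadata fields (hypothetical layout)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split(delimiter)
    if len(parts) != len(fields):
        raise ValueError(
            f"{filename!r} has {len(parts)} fields, expected {len(fields)}"
        )
    return dict(zip(fields, parts))


def bundle_images(filenames, config):
    """Group filenames that share the same values for config['group_by']."""
    bundles = defaultdict(list)
    for name in filenames:
        meta = parse_metadata(
            name, config["fields"], config.get("delimiter", "_")
        )
        # The bundle key is the tuple of metadata values being grouped on.
        key = tuple(meta[field] for field in config["group_by"])
        bundles[key].append(name)
    # Sort each bundle so the output is deterministic.
    return {key: sorted(names) for key, names in bundles.items()}


if __name__ == "__main__":
    # Assumed config: images of the same plant on the same date form one bundle.
    config = {
        "fields": ["plant", "camera", "date"],
        "delimiter": "_",
        "group_by": ["plant", "date"],
    }
    files = [
        "p1_VIS_2020-03-03.png",
        "p1_NIR_2020-03-03.png",
        "p2_VIS_2020-03-03.png",
    ]
    for key, names in bundle_images(files, config).items():
        print(key, names)
```

Each bundle could then be handed to a single workflow invocation as one unit, which keeps the grouping logic out of the parallelization step itself.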

In #423 @nfahlgren said he had basic functionality for grouping images by metadata (replicating the --coprocess flag). If you upload this branch, we could discuss what needs to happen, and where, to move this forward.

@dschneiderch
Collaborator

dschneiderch commented Jul 22, 2020

Hi guys, wondering if there is any movement towards tackling these improvements now that PlantCV 4.0 is only one release away :)
I noticed someone opened a timeseries submodule (#587), which seems like it might relate to the multiframe functionality we need for PSII images. I would be happy to help.

@nfahlgren
Member Author

Hi @dschneiderch, this kind of fell to the back burner over the last few months. We just had a group discussion yesterday about some needs for the timeseries submodule related to this issue, so the two are definitely related.

The paper still needs quite a bit of writing, but our thinking is that we could move on to the 4.x series soon and close out 3.x. There are some outstanding issues like this one where it's tempting to hold them over to 4.x, because some of the other big changes we want to make in 4.x could make tackling them easier. The timeseries subpackage is something we are definitely holding over to 4.x.

We don't want to hold up progress though, so we're going to make a 4.x branch where we can have pull requests go for non-3.x features. We will keep 4.x rebased on 3.x so that we don't create a big point of divergence.

2 participants