FIX: Avoid datasink connection, which will always invalidate cache #440

Closed
wants to merge 1 commit into from

Conversation

mgxd
Collaborator

@mgxd mgxd commented Jun 3, 2024

Any ds_* workflow should probably not have an outputnode to avoid this problem.

@effigies
Member

effigies commented Jun 3, 2024

The goal really is to use the outputs of the derivatives workflows, so that we're using the same inputs as downstream tools (or reruns with these derivatives as outputs) would be.

I think the problem here is with the complexity of DerivativesDataSink, which can actually modify the contents of files (e.g., setting the right dtypes). I believe the sink nodes already do not make copies when the input matches the output, but the fixup breaks that check. So maybe we should split into prepare and sink nodes, where the prepare node just passes through the filename if no changes are to be made?
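A minimal sketch of the prepare/sink split proposed above, using only the standard library. The names `prepare`, `sink`, and `fixup` are hypothetical and chosen for illustration; this is not the niworkflows or Nipype API, and the content-hash comparison is an assumed stand-in for whatever check the real sink performs.

```python
import hashlib
import shutil
from pathlib import Path


def _digest(path: Path) -> str:
    """Content hash used to decide whether two files are identical."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def prepare(in_file: Path, work_dir: Path, fixup=None) -> Path:
    """Hypothetical prepare step.

    Returns in_file itself when no change is needed, so the path seen
    downstream stays the same. A new file is written only when the
    (assumed) fixup callable actually alters the bytes.
    """
    if fixup is None:
        return in_file
    original = in_file.read_bytes()
    fixed = fixup(original)
    if fixed == original:
        return in_file
    work_dir.mkdir(parents=True, exist_ok=True)
    out_file = work_dir / in_file.name
    out_file.write_bytes(fixed)
    return out_file


def sink(in_file: Path, out_file: Path) -> bool:
    """Hypothetical sink step.

    Copies in_file to out_file and returns True, or returns False
    when an identical file is already in place and the copy is skipped.
    """
    if out_file.exists() and _digest(out_file) == _digest(in_file):
        return False
    out_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(in_file, out_file)
    return True
```

With the modification isolated in `prepare`, the sink's "input matches output" check only ever compares files that have already been fixed up, which is the property the split is meant to restore.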

@mgxd
Collaborator Author

mgxd commented Jun 7, 2024

I agree - this boils down to splitting up the DDS behemoth into much more succinct parts. Should I close this?

@effigies
Member

effigies commented Jun 7, 2024

No. :-)

@effigies effigies closed this Jun 7, 2024
@mgxd mgxd deleted the fix/cache-invalidation branch June 7, 2024 15:27