Investigate performance of config loading for big projects #3893
Comments
I'd like to see us add a CLI command which users can run to produce a flamegraph. It would massively reduce the guesswork here.
@datajoely A flamegraph for the entire pipeline run (how much time each node takes), or just for the config resolution / pipeline initialization?
In my mind, it would run the whole command as normal, but also generate the profiling data. Perhaps if we were to take this seriously, a full-on memray integration would be incredible.
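To make the "run as normal, but also generate the profiling data" idea concrete, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. It is not a Kedro command and not memray: `profile_call` and `load_config_stand_in` are hypothetical names invented for illustration, and the stand-in function merely builds a dictionary in place of whatever step is actually slow.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text).

    Hypothetical helper: the call behaves as normal, and a text report
    of the `top` most expensive entries (by cumulative time) is
    collected alongside the result.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def load_config_stand_in(n_entries):
    # Hypothetical stand-in for the slow step (NOT Kedro's loader):
    # it just builds a large dictionary of fake dataset entries.
    return {f"dataset_{i}": {"type": "example", "index": i} for i in range(n_entries)}


if __name__ == "__main__":
    config, report = profile_call(load_config_stand_in, 10_000)
    print(f"Loaded {len(config)} entries")
    print(report)
```

If a visual view is wanted, the collected stats could instead be written to a file with `Profile.dump_stats` and opened in a separate viewer; which viewer or flamegraph tool to use is left open here.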
Continuing the discussion on creating custom commands here: #3908
Originally posted by @idanov in #3732 (comment) The solution works, but From the discussion in the PR:
There were a few thumbs up to the idea, and it was brought up again in #3973 (@datajoely please do confirm that this is what you had in mind 😄) @merelcht pointed out that there's a pending research item on how users use parameters and for what #2240 @ElenaKhaustova agreed that this is relevant in the context of the ongoing Ideally, if there's a way we can tackle this issue without blocking it on #2240, the time to look at it would be now. But I have very little visibility into what the implications are, or whether we would actually solve the performance problem at all. So I'm leaving the decision to the team.
Would you really call this coupling? The way I read it is that it uses omegaconf to parse the parameters config. We already have a dependency on omegaconf anyway, and I actually quite like that we can leverage it in more places than just the
Sorry to keep moving the conversation, but I'd rather not discuss the specifics of a particular solution outside the corresponding PR. I addressed your question in context at #3732 (comment)
Description
Earlier this week a user reached out to me in private saying that it was taking 3 minutes for Kedro to load their configuration (KedroContext._get_catalog).
Today another user mentioned that "Looking at the logs, it gets stuck at the kedro.config.module for more than 50% of the pipeline run duration, but we do have a lot of inputs and outputs"
I still don't have specific reproducers, but I'm noticing enough qualitative evidence to open an issue about it.
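One possible starting point for a reproducer, offered as a rough sketch rather than a confirmed reproduction: generate a synthetic catalog file with many entries and point a test project at it. Everything below is an assumption for illustration. The helper name `write_synthetic_catalog`, the `pandas.CSVDataset` type string, and the file paths are placeholders, and nothing here has been shown to trigger the slowdown the users reported.

```python
import tempfile
from pathlib import Path


def write_synthetic_catalog(path, n_entries):
    """Write a catalog-style YAML file with n_entries dataset entries.

    Hypothetical helper: the entry layout (type + filepath) is assumed,
    and the dataset type string may need adjusting for the installed
    datasets package.
    """
    lines = []
    for i in range(n_entries):
        lines.append(f"dataset_{i}:")
        lines.append("  type: pandas.CSVDataset")  # assumed type name
        lines.append(f"  filepath: data/01_raw/dataset_{i}.csv")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return n_entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "catalog.yml"
        count = write_synthetic_catalog(target, 5_000)
        print(f"Wrote {count} entries to {target}")
```

Varying `n_entries` and timing the load step for each size would show whether the cost grows linearly or worse with catalog size.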