
Spike: How can we make upgrading Kedro projects easier? #3960

Open
merelcht opened this issue Jun 19, 2024 · 8 comments

@merelcht
Member

merelcht commented Jun 19, 2024

Description

Users have frequently mentioned that upgrading between major versions of Kedro is difficult. The main difficulty seems to be upgrading the project structure when the template changed in breaking releases.

Perhaps a tool like cruft or copier could help here, or even a Python script that users can apply to go from e.g. 0.17.0 -> 0.19.0 and 0.18.0 -> 0.19.0.

The task here is to do a spike investigating the different options and deciding on the best way forward.

Context

I didn't fully appreciate the issue until we recently talked to a team that explained they have around 40 Kedro projects, not all of which have an active maintenance team. As a result, their projects run different Kedro versions. Their biggest struggle is updating the template, because that is a very manual job and requires someone who understands what changed between breaking Kedro versions.

Possible Solutions

  • cruft
  • copier
  • Custom python script that we provide for users as part of a breaking release

Extra Info

Earlier mentions of upgrading issues and/or cruft/copier:

@astrojuanlu
Member

astrojuanlu commented Jun 19, 2024

Adding some context from #3959 (we wrote the same issue at the same time 😄)

Description

Our users find it difficult to upgrade Kedro in their projects. Just from a couple of recent user interviews:

User 1:

  • "We sometimes try to upgrade the version [paraphrasing] and upgrading versions impacts other packages that we use"
  • "It creates trouble for us when we try to redeploy something that we deployed one year ago"

User 2:

  • "I begin a project, and after 3 months I need to upgrade the project with a new Kedro"
  • "It’s very laborious for us to upgrade"
  • They end up with tens of projects, each of them with slightly different versions of Kedro, which is difficult for the team to maintain

There are also very clear signs that old Kedro versions tend to live on for a very long time:


People like @inigohidalgo have been reporting their long journey to upgrade from old Kedro versions in our public Slack.

And finally, we also have lots of internal evidence that big projects get stuck on old Kedro versions.

Is this a problem?

One could argue: "if it works, don't touch it". So the fact that Kedro is pinned to a specific version is not necessarily a bad thing.

However, with resource constrained teams maintaining many projects, each of them with slightly different versions of Kedro, this can become a mess to maintain.

For teams who wish to have uniform Kedro versions, we should provide a cleaner upgrade path.

What has been done

For 0.19 we went ahead and added a detailed migration guide https://docs.kedro.org/en/latest/resources/migration.html#migrate-an-existing-project-that-uses-kedro-0-18-to-use-0-19

Where to go from here

However, this does not seem to be enough yet.

What else can we do to make these migrations easier?

@astrojuanlu
Member

Anecdotally, our most sophisticated users are the ones who get stuck the most, since they typically veer into complex hooks, dynamism and coupling to some of the internals. With 1.0.0 on the horizon, hopefully these internals have stabilised and won't be as incompatible or painful going forward. This is the main reason I suspect some sort of automated tooling may not move the needle.
I'm still very much of the opinion that we need to build user-facing superpowers that make the effort to upgrade worth it. Introducing settings.py in 0.18.x was a breaking change that mattered far more to Kedro developers than to Kedro users. Cynically, one could argue we could do a better job of making something like OmegaConf a 0.19.x feature, even if it could technically have been made to work in a non-breaking way.

Originally posted by @datajoely in #3959 (comment)

@astrojuanlu
Member

Maybe we could also draw inspiration from database migration tools like Alembic or Django migrations.
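To make the migration-tool idea concrete, here is a minimal sketch of an Alembic-style chain of upgrade steps in plain Python. Each step only knows how to move a project from one version to the next, so the same registry covers both 0.17.0 -> 0.19.0 and 0.18.0 -> 0.19.0. The in-memory project model (path -> file contents), the step functions and the file paths are all hypothetical placeholders, not real Kedro template changes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical project model: relative file path -> file contents.
Project = Dict[str, str]
Step = Callable[[Project], Project]

# Registry keyed by the version a step starts from, similar in spirit to an
# Alembic revision pointing at its predecessor (down_revision).
MIGRATIONS: Dict[str, Tuple[str, Step]] = {}


def migration(from_version: str, to_version: str) -> Callable[[Step], Step]:
    """Register a function as the step from `from_version` to `to_version`."""
    def register(func: Step) -> Step:
        MIGRATIONS[from_version] = (to_version, func)
        return func
    return register


@migration("0.17", "0.18")
def add_settings_module(project: Project) -> Project:
    # Hypothetical step: make sure a settings.py exists (path is illustrative).
    updated = dict(project)
    updated.setdefault("src/my_package/settings.py", "")
    return updated


@migration("0.18", "0.19")
def rename_config_key(project: Project) -> Project:
    # Hypothetical step: rename a key in a config file, if the file exists.
    updated = dict(project)
    path = "conf/base/parameters.yml"
    if path in updated:
        updated[path] = updated[path].replace("old_key:", "new_key:")
    return updated


def upgrade(project: Project, current: str, target: str) -> Tuple[Project, List[str]]:
    """Apply registered steps one after another until `target` is reached."""
    applied: List[str] = []
    while current != target:
        if current not in MIGRATIONS:
            raise ValueError(f"no migration path from {current} to {target}")
        next_version, step = MIGRATIONS[current]
        project = step(project)
        applied.append(f"{current} -> {next_version}")
        current = next_version
    return project, applied
```

Calling `upgrade(project, "0.17", "0.19")` runs both steps in order, while `upgrade(project, "0.18", "0.19")` runs only the last one. The appeal of this design is that each breaking release only has to ship one new step.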

@noklam
Contributor

noklam commented Jun 20, 2024

Adding another option, which operates at a lower level: https://libcst.readthedocs.io/en/latest/codemods_tutorial.html, or pyupgrade.
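To make the codemod idea concrete, here is a minimal sketch of an import-rewriting codemod. It deliberately uses the standard library's `ast` module instead of LibCST so it runs without extra dependencies. The trade-off is that `ast.unparse` (Python 3.9+) discards comments and original formatting, which is exactly what LibCST's concrete syntax tree preserves. The `old_pkg.io` -> `new_pkg.io` rename is hypothetical, standing in for a module that moved between releases.

```python
import ast
from typing import Dict

# Hypothetical mapping of moved modules; not real Kedro import paths.
RENAMED_MODULES: Dict[str, str] = {"old_pkg.io": "new_pkg.io"}


class RenameImports(ast.NodeTransformer):
    """Rewrite `import x` and `from x import y` for modules in a rename map."""

    def __init__(self, renames: Dict[str, str]) -> None:
        self.renames = renames

    def visit_Import(self, node: ast.Import) -> ast.Import:
        for alias in node.names:
            alias.name = self.renames.get(alias.name, alias.name)
        return node

    def visit_ImportFrom(self, node: ast.ImportFrom) -> ast.ImportFrom:
        # node.module is None for relative imports like `from . import x`.
        if node.module in self.renames:
            node.module = self.renames[node.module]
        return node


def rewrite_imports(source: str, renames: Dict[str, str] = RENAMED_MODULES) -> str:
    """Return `source` with renamed imports.

    Only import statements are rewritten; fully qualified uses such as
    `old_pkg.io.read()` elsewhere in the file would need their own rule.
    Comments and formatting are lost because ast.unparse regenerates code.
    """
    tree = RenameImports(renames).visit(ast.parse(source))
    return ast.unparse(tree)
```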

@inigohidalgo
Contributor

inigohidalgo commented Jun 27, 2024

I agree with @datajoely's view: our upgrade path was particularly complex because we started with Kedro very early in its life, and a lot of custom components were built during an MCK engagement. Those custom components relied on internal Kedro behavior that was later deprecated (load_context). Without such custom components the upgrade path would have been much easier. But there is no functionality on Kedro's side that would have eased this, since the upgrade meant replicating or replacing that behavior with new functionality like hooks.

I like tools like cruft or copier for keeping several of our own projects in sync, but I don't see how you can reasonably expect to use them to aid upgrades.

Anyway, I would expect this to become less of an issue as you near 1.0, with breaking changes becoming less frequent and less disruptive, and upgrades simpler and clearer.

@astrojuanlu
Member

I think there are at least three sources of friction when it comes to upgrades:

  1. Small breaking changes (think Import TRANSCODING_SEPARATOR to pipeline to make Kedro backwards compatible with kedro-viz on python 3.8 #3822)
  2. Larger API changes like the ones @inigohidalgo describes (think load_context)
  3. Changes in the template structure (think Merge template pyproject.toml into one #2926)

For 1, even though we're not super strict, we fix breakage when we see it. There might be breaking changes we're not seeing.

For 2, we're already adding deprecation warnings, but arguably we could do more. We only started writing migration guides with 0.18 -> 0.19. We could also think of "linting" tools that detect deprecated functionality, though that's probably too much work.
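As a rough idea of how small such a "linting" tool could start out, here is a minimal sketch built on Python's standard `ast` module. The `DEPRECATED` mapping and its hint text are hypothetical; a real tool would need a curated list of deprecated Kedro APIs per release. It flags bare names, attribute access and `from ... import` statements that mention a deprecated name such as `load_context`.

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical registry: deprecated name -> hint shown to the user.
DEPRECATED: Dict[str, str] = {
    "load_context": "deprecated; replace with hooks or other newer functionality",
}


def find_deprecated(
    source: str, deprecated: Dict[str, str] = DEPRECATED
) -> List[Tuple[int, str, str]]:
    """Return sorted (line, name, hint) tuples for references to deprecated names."""
    findings: List[Tuple[int, str, str]] = []
    for node in ast.walk(ast.parse(source)):
        names: List[str] = []
        if isinstance(node, ast.Name):
            names = [node.id]                 # e.g. load_context(...)
        elif isinstance(node, ast.Attribute):
            names = [node.attr]               # e.g. module.load_context
        elif isinstance(node, ast.ImportFrom):
            names = [alias.name for alias in node.names]
        for name in names:
            if name in deprecated:
                findings.append((node.lineno, name, deprecated[name]))
    # ast.walk is breadth-first, so sort to report findings in line order.
    return sorted(findings)
```

Matching is purely by name, so it can report false positives (a user function that happens to be called `load_context`), which is one reason a real linter would also need to resolve where a name was imported from.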

For 3, though, we're not doing anything of substance beyond telling people to recreate their project structure, and this is where https://github.com/copier-org/copier could definitely help.
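The core idea behind template-update tools like copier and cruft can be sketched in plain Python: compare the project against what the *old* template version generated, so that user edits can be told apart from template changes. The sketch below is a simplification under stated assumptions. It works on in-memory dicts of path -> contents and only classifies whole files, whereas the real tools go further and use git to merge template changes into files the user has also edited.

```python
from typing import Dict, List, Optional

Files = Dict[str, str]  # relative path -> file contents (simplified model)


def plan_template_update(old_template: Files, new_template: Files, project: Files) -> Dict[str, List[str]]:
    """Classify template files when moving a project to a newer template version."""
    plan: Dict[str, List[str]] = {
        "update": [], "add": [], "delete": [], "conflict": [], "keep": [],
    }
    for path in sorted(set(old_template) | set(new_template)):
        old: Optional[str] = old_template.get(path)
        new: Optional[str] = new_template.get(path)
        mine: Optional[str] = project.get(path)
        if old == new or mine == new:
            # Template did not change this file, or the project already matches.
            plan["keep"].append(path)
        elif mine != old:
            # Both the user and the template changed it: needs a human.
            plan["conflict"].append(path)
        elif new is None:
            plan["delete"].append(path)  # removed from template, untouched by user
        elif old is None:
            plan["add"].append(path)     # new in template, absent from project
        else:
            plan["update"].append(path)  # untouched by user, safe to overwrite
    return plan
```

Files that exist only in the project, such as a team's own pipelines, never appear in either template render, so the plan leaves them alone.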

Anything that I'm missing?

@datajoely
Contributor

Anything that I'm missing?

Only the fact that advanced users will likely patch things like the context, sessions and custom CLIs, which makes upgrading near impossible.

I still expect the session to improve through the deployment workstream, so it's hard to say if we're actually done there.

@astrojuanlu
Member

https://fediverse.zachleat.com/@zachleat/112689087055371089

Upgraded @eleventy from ESLint v8 to v9 in less than 5 minutes using the new @eslint/migrate-config package. Was even able to remove a few unused inline directives too 🏆

Thanks @eslint team!

