Override new pipeline template #2543

Closed
jasonmhite opened this issue Apr 26, 2023 · 8 comments · Fixed by #2701
Labels
Community Issue/PR opened by the open-source community

Comments

@jasonmhite
Contributor

jasonmhite commented Apr 26, 2023

I'd like to override the code that is generated by kedro pipeline create to better suit some conventions we have developed on my team. Specifically, I'm trying to build a new Kedro starter that implements some of our conventions, and ideally I'd like to have a starter that also includes a template to generate new modular pipeline boilerplate in the format we have adopted automatically.

It seems the generated code comes from the cookiecutter template in kedro/kedro/templates/pipeline, and I know how to modify that. However, I can't find anything in the documentation about overriding or extending that template at the project level. Could someone provide some guidance? Thanks in advance, and apologies if Issues are not the right place to ask.

@noklam
Contributor

noklam commented Apr 26, 2023

Sorry, I just reread your question: you actually want to override the pipeline template. This is currently not possible, because the template follows the modular pipeline structure. How do you want to change it? It would be great if you could share an example and explain why you want to change it that way.

@jasonmhite
Contributor Author

jasonmhite commented Apr 26, 2023

It's a pretty simple change: we prefer to put the node and pipeline definitions in a single file (at least by default). Optionally, the file would be called pipeline_<pipeline_name>.py. Having multiple files named pipeline.py gets confusing in the editor; I know you can just rename them, but I'd like that to happen automatically.

I also wrote a decorator to build nodes at declaration time, and it'd be nice if I could have the generated pipeline source automatically import that decorator.
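To make "building nodes at declaration time" concrete, here is a minimal stdlib-only sketch. It is not jasonmhite's actual decorator and not a Kedro API: as_node, NodeSpec and NODES are hypothetical names, and NodeSpec only stands in for kedro.pipeline.node so the example runs without Kedro installed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

# None, one dataset name, or a list of names (Kedro nodes also accept dicts).
Datasets = Union[None, str, List[str]]


@dataclass
class NodeSpec:
    """Hypothetical stand-in for kedro.pipeline.node, so the sketch runs without Kedro."""

    func: Callable
    inputs: Datasets
    outputs: Datasets
    name: str


# Module-level registry, filled in as decorated functions are defined.
NODES: List[NodeSpec] = []


def as_node(inputs: Datasets = None, outputs: Datasets = None, name: Optional[str] = None):
    """Hypothetical decorator: record the function as a node when it is declared."""

    def decorator(func: Callable) -> Callable:
        NODES.append(NodeSpec(func, inputs, outputs, name or func.__name__))
        return func  # the plain function stays importable and easy to unit-test

    return decorator


@as_node(inputs="raw_companies", outputs="clean_companies")
def clean_companies(companies: List[str]) -> List[str]:
    return [c.strip() for c in companies]


@as_node(inputs="clean_companies", outputs="company_count")
def count_companies(companies: List[str]) -> int:
    return len(companies)


def create_pipeline(**kwargs) -> List[NodeSpec]:
    # With Kedro installed, this would instead return something like:
    #   Pipeline([node(n.func, n.inputs, n.outputs, name=n.name) for n in NODES])
    return list(NODES)
```

The module-level registry fits the one-file-per-pipeline convention: every decorated function in pipeline_<pipeline_name>.py becomes part of that file's create_pipeline, while the functions themselves remain ordinary callables.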

Pretty sure I know how to do this in the cookiecutter template Kedro uses internally, but it sounds like there isn't currently a way to override that at the project level.

@noklam
Contributor

noklam commented Apr 27, 2023

That's a fair point about the filename. Could you give a code example of what you mean by building nodes at declaration time, and how that plays with the pipeline definition? Do you mean you simply write a function and use a decorator to turn it into a node? Possibly related to #2471?

The current automatic pipeline discovery relies on the modular pipeline structure. In the meantime, you can always extend or override the CLI by importing the KedroCLI class.

@jasonmhite
Contributor Author

jasonmhite commented Apr 27, 2023

As I understand it, the auto discovery is just looking for a create_pipeline function it can import. The file structure isn't that important as long as you adjust the imports. I made a gist of what I'm talking about. This sort of layout definitely works (I'm using it); I just don't want to have to delete and rename files manually.

https://gist.github.com/jasonmhite/59870a929fac44a925bd5608d6adb465 (note: imagine each - in the filenames is a / instead; GitHub won't let me simulate folders in a gist).
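A simplified, stdlib-only approximation of that discovery step, assuming (as described above) that it only needs an importable create_pipeline in each pipeline subpackage. This is not Kedro's own discovery code; discover_pipelines, build_demo_project and the demo_project package are hypothetical. The demo writes the single-file layout to disk, where each __init__.py only re-exports create_pipeline from pipeline_<name>.py.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict


def discover_pipelines(pipelines_package: str) -> Dict[str, Any]:
    """Import each subpackage of pipelines_package and call its create_pipeline()."""
    importlib.invalidate_caches()  # pick up packages created after interpreter startup
    package = importlib.import_module(pipelines_package)
    found: Dict[str, Any] = {}
    for info in pkgutil.iter_modules(package.__path__):
        # Only subpackages count; private names such as _utils are skipped.
        if not info.ispkg or info.name.startswith("_"):
            continue
        module = importlib.import_module(f"{pipelines_package}.{info.name}")
        create = getattr(module, "create_pipeline", None)
        if callable(create):
            found[info.name] = create()
    return found


def build_demo_project(root: Path, package: str = "demo_project") -> None:
    """Write <package>/pipelines/<name>/ containing __init__.py and pipeline_<name>.py."""
    pipelines = root / package / "pipelines"
    for name in ("ingest", "report"):
        pkg = pipelines / name
        pkg.mkdir(parents=True)
        # Nodes and create_pipeline live together in pipeline_<name>.py ...
        (pkg / f"pipeline_{name}.py").write_text(
            f"def create_pipeline(**kwargs):\n    return '{name} pipeline'\n"
        )
        # ... and __init__.py just re-exports it, which is all discovery needs.
        (pkg / "__init__.py").write_text(f"from .pipeline_{name} import create_pipeline\n")
    (root / package / "__init__.py").write_text("")
    (pipelines / "__init__.py").write_text("")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_project(Path(tmp))
        sys.path.insert(0, tmp)
        print(discover_pipelines("demo_project.pipelines"))
```

Because discovery only looks at what each subpackage exposes, renaming pipeline.py to pipeline_<name>.py is invisible to it as long as __init__.py keeps the re-export.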

And indeed, the decorator I made seems similar to #2471, though I hadn't seen that issue. I find that syntax very ergonomic, and the implementation is pretty trivial, so it would be a nice feature in its own right. I'll comment there.

@antonymilne
Contributor

antonymilne commented May 4, 2023

This makes a lot of sense, I think. In particular, I agree that having multiple pipeline.py files is awkward, and that it's something a user might want to customise.

This wouldn't be hard to achieve if we added a flag to the kedro pipeline create CLI to pass the path to your own template, and made this line less hardcoded:

template_path = Path(kedro.__file__).parent / "templates" / "pipeline"

This would then work much the same way as we allow different template_paths for kedro new --starter.

You could actually already achieve this by writing your own custom kedro pipeline create command that overrides the built-in one, but personally I would be happy to accept a PR that adds this functionality into kedro itself (together with adding it to our docs) 🙂
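As a rough illustration of what a custom kedro pipeline create replacement has to do, here is a stdlib-only stand-in for the cookiecutter rendering step. It assumes the template uses cookiecutter's {{ cookiecutter.pipeline_name }} placeholder in file names and contents, and it only does plain string replacement (no Jinja conditionals, loops or hooks), so it is not a substitute for cookiecutter itself. render_pipeline_template is a hypothetical name.

```python
from pathlib import Path

PLACEHOLDER = "{{ cookiecutter.pipeline_name }}"


def render_pipeline_template(template_dir: Path, output_dir: Path, pipeline_name: str) -> None:
    """Copy template_dir into output_dir, substituting PLACEHOLDER in paths and file text.

    Simplified stand-in for cookiecutter: plain string replacement, no Jinja.
    """
    for src in sorted(template_dir.rglob("*")):
        # Rename any path segment that contains the placeholder.
        rel = str(src.relative_to(template_dir)).replace(PLACEHOLDER, pipeline_name)
        dest = output_dir / rel
        if src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text(src.read_text().replace(PLACEHOLDER, pipeline_name))
```

For the single-file convention discussed in this thread, the template would contain {{ cookiecutter.pipeline_name }}/pipeline_{{ cookiecutter.pipeline_name }}.py plus an __init__.py that re-exports create_pipeline from it, so rendering with pipeline_name="ingest" yields ingest/pipeline_ingest.py.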

@jasonmhite
Contributor Author

jasonmhite commented May 10, 2023

@antonymilne That's helpful, I'll look into seeing if I can make this change when I have a bit of time.

@jasonmhite
Contributor Author

jasonmhite commented May 31, 2023

@antonymilne I implemented this via a config option in settings.py that can point to an arbitrary template folder. From my testing it seems to work, and it falls back to the previous default behavior. Could you take a look before I open a PR? https://github.com/jasonmhite/kedro/tree/feature/pipeline-templates

I was able to make a starter that includes a pipeline template and is preconfigured to use it, which is exactly what I wanted. https://code.ornl.gov/4uh/kedro-template
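The settings-based fallback described above can be sketched as a small precedence rule: an explicit override (such as a CLI flag) wins, then a project setting, then the built-in template. PIPELINE_TEMPLATE_PATH and resolve_pipeline_template are hypothetical names, not taken from Kedro or from the branch linked above, and the default directory is passed in rather than computed from kedro.__file__ so the sketch runs without Kedro installed.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_pipeline_template(
    default_template: PathLike,
    settings: object = None,
    override: Optional[PathLike] = None,
) -> Path:
    """Pick the template dir: explicit override, then settings, then built-in default."""
    # Hypothetical settings attribute; a missing or None value means "not configured".
    configured = getattr(settings, "PIPELINE_TEMPLATE_PATH", None)
    for candidate in (override, configured):
        if candidate is not None:
            path = Path(candidate)
            if not path.is_dir():
                raise FileNotFoundError(f"Pipeline template not found: {path}")
            return path
    # Nothing configured: keep the previous, built-in behaviour.
    return Path(default_template)
```

In a project this might be called as resolve_pipeline_template(Path(kedro.__file__).parent / "templates" / "pipeline", settings=<the project's settings module>, override=<flag value>). Checking the override first mirrors the usual convention that command-line arguments beat configuration files, and failing loudly on a missing custom folder avoids silently generating the default layout.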

@merelcht merelcht linked a pull request Jul 7, 2023 that will close this issue
5 tasks
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Jul 7, 2023
Projects
Archived in project
Development


5 participants