Kedro-Airflow configuration #229

Closed
sbrugman opened this issue Jun 9, 2023 · 4 comments
Labels
Community Issue/PR opened by the open-source community

Comments

@sbrugman
Contributor

sbrugman commented Jun 9, 2023

Description

When converting Kedro pipelines to Airflow DAGs, configuration differs per DAG, but there is currently no out-of-the-box way to automatically provide parameters such as schedule_interval or owner per pipeline. This would be useful when the DAGs are generated and deployed in a DevOps pipeline without manual intervention.

Happy to contribute this feature once there is consensus on an implementation.

Possible Implementation

Preferably, configuration such as conf/base/airflow.yml or conf/base/airflow/[PIPELINE].yml is passed on to the template rendering as kwargs. The benefit is that the configuration is in one place, and it's consistent within the Kedro framework.
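
As a rough sketch of this idea (not the plugin's actual code): assume the YAML files have already been parsed into dictionaries, with a hypothetical `default` entry standing in for conf/base/airflow.yml and per-pipeline entries standing in for conf/base/airflow/[PIPELINE].yml. The helper names and the `string.Template` stand-in for the real template engine are assumptions for illustration only.

```python
from string import Template

# Stand-in for parsed YAML config; keys and values are illustrative only.
AIRFLOW_CONFIG = {
    "default": {"owner": "airflow", "schedule_interval": "@once"},
    "data_science": {"owner": "ds-team", "schedule_interval": "@daily"},
}

# Stand-in for the DAG template (assumption: a simplified one-line template).
DAG_TEMPLATE = Template(
    'DAG("$dag_name", schedule_interval="$schedule_interval", '
    'default_args={"owner": "$owner"})'
)


def resolve_dag_config(config: dict, pipeline_name: str) -> dict:
    """Merge the shared defaults with the pipeline-specific overrides."""
    return {**config.get("default", {}), **config.get(pipeline_name, {})}


def render_dag(config: dict, pipeline_name: str) -> str:
    """Pass the resolved configuration to the template as keyword arguments."""
    kwargs = resolve_dag_config(config, pipeline_name)
    return DAG_TEMPLATE.substitute(dag_name=pipeline_name, **kwargs)


print(render_dag(AIRFLOW_CONFIG, "data_science"))
print(render_dag(AIRFLOW_CONFIG, "data_engineering"))
```

A pipeline without its own entry (here `data_engineering`) falls back to the shared defaults, so only pipelines that differ need their own configuration.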

Do any of the other kedro [command] CLI commands have access to the Kedro config?

Another implementation is to add an argument to kedro airflow create that supports passing parameters (e.g. --param key=value). The user is then responsible for passing the parameters.
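
To make the `--param key=value` idea concrete, here is a minimal sketch of how repeated key/value pairs could be collected into a dictionary. It uses the standard-library `argparse` as a stand-in rather than the plugin's own CLI framework, and the function names and the `airflow-create-sketch` program name are hypothetical.

```python
import argparse


def parse_key_value(raw: str) -> tuple[str, str]:
    """Split a single 'key=value' string at the first '=' sign."""
    key, sep, value = raw.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected key=value, got {raw!r}")
    return key, value


def parse_params(argv: list[str]) -> dict[str, str]:
    """Collect every repeated --param occurrence into one dictionary."""
    parser = argparse.ArgumentParser(prog="airflow-create-sketch")
    parser.add_argument(
        "--param",
        action="append",  # allow the flag to be given several times
        default=[],
        type=parse_key_value,
        metavar="KEY=VALUE",
    )
    args = parser.parse_args(argv)
    return dict(args.param)


print(parse_params(["--param", "owner=ds-team", "--param", "schedule_interval=@daily"]))
```

All values arrive as strings in this sketch; converting them to booleans or numbers would be an additional design decision.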

Possible Alternatives

Alternatively, the user generates templates for each pipeline. This requires no modification of the plugin, but puts a lot of burden on the user.

@noklam
Contributor

noklam commented Jun 12, 2023

I have a few questions @sbrugman

  1. Is it possible to implement this in a way that we don't need to keep updating the list of arguments such as schedule_interval and owner?
  2. It shouldn't be hard to read config from airflow.yml; this can be done, but it is fundamentally the same as passing it via the CLI.

Another implementation is to allow argument to kedro airflow create that supports passing parameters (e.g. --param key=value). The user then has to take responsibility of passing the parameters.

This is the same as 2. I think 1 is the more important question here; how we pass or parse the arguments is trivial.

@noklam noklam added the Community Issue/PR opened by the open-source community label Jun 12, 2023
@sbrugman
Contributor Author

Am I understanding your question correctly: if the user adds an additional parameter, can we implement it in a way that the template does not have to be updated?

The default_args and DAG arguments can be generated from a dictionary in the config. The template then needs modification only when the user adds a parameter that is neither a DAG argument nor shared by every node (default_args).

The arguments that are already in the template file would be configured dynamically. The list itself and its defaults would stay the same. Changing the values is up to the user.
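
A small sketch of what generating the arguments from a dictionary could look like. This is only an illustration under assumptions: the default values, the nested `default_args` key, and both function names are hypothetical and not taken from the plugin's template.

```python
# Defaults mirror the idea that the template's existing arguments keep their
# default values; the specific keys and values here are illustrative.
DEFAULT_DAG_ARGS = {"schedule_interval": "@once", "catchup": False}
DEFAULT_TASK_ARGS = {"owner": "airflow", "retries": 1}


def build_dag_arguments(config: dict) -> tuple[dict, dict]:
    """Return (dag_args, default_args) built from a config dictionary.

    Keys under the hypothetical "default_args" entry apply to every node;
    all remaining keys are treated as DAG-level arguments.
    """
    config = dict(config)  # avoid mutating the caller's dictionary
    task_overrides = config.pop("default_args", {})
    dag_args = {**DEFAULT_DAG_ARGS, **config}
    default_args = {**DEFAULT_TASK_ARGS, **task_overrides}
    return dag_args, default_args


def render_dag_call(dag_name: str, config: dict) -> str:
    """Render the DAG constructor call as source text, one line per argument."""
    dag_args, default_args = build_dag_arguments(config)
    lines = ["DAG(", f"    dag_id={dag_name!r},"]
    lines += [f"    {key}={value!r}," for key, value in dag_args.items()]
    lines += [f"    default_args={default_args!r},", ")"]
    return "\n".join(lines)


print(render_dag_call("data_science", {"max_active_runs": 3, "default_args": {"owner": "ds-team"}}))
```

Because the call is rendered by looping over the dictionary, a new DAG-level key such as `max_active_runs` passes through without any change to the rendering code.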

There is a small difference in functionality between reading from the Kedro config (e.g. airflow.yml) and passing explicit parameters: Kedro offers multiple config patterns. (I would prefer this option.)

@noklam
Contributor

noklam commented Jun 12, 2023

@sbrugman in that case I think this is a good improvement.

With regard to config, it should be quite easy to do: using the after_context_created hook, you can use the Kedro config.

The CLI argument should override config value if provided.
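
The precedence rule can be sketched in a few lines. The function name is hypothetical, and treating `None` as "not provided on the command line" is an assumption made for this illustration.

```python
def merge_with_precedence(config_values: dict, cli_values: dict) -> dict:
    """Combine both sources; a CLI value wins whenever it was provided."""
    # Drop CLI entries that were not supplied so they cannot mask the config.
    provided = {key: value for key, value in cli_values.items() if value is not None}
    return {**config_values, **provided}


config_values = {"owner": "ds-team", "schedule_interval": "@daily"}
cli_values = {"owner": None, "schedule_interval": "@hourly"}
print(merge_with_precedence(config_values, cli_values))
```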

@merelcht
Member

merelcht commented Aug 1, 2023

Closed by #233

@merelcht merelcht closed this as completed Aug 1, 2023