[Ready] Docs changes to remove pandas-iris and update kedro new flow in onboarding docs #3317

Merged
merged 45 commits into from
Nov 27, 2023
Changes from 6 commits
Commits
9e9449c
Revise link to notebook docs and remove unnecessary intro page
stichbury Oct 30, 2023
c5b5b0d
Update starters content
stichbury Oct 30, 2023
ce80d99
Merge branch 'develop' into fix-starters-content
stichbury Nov 16, 2023
3cee9f4
relocate starters content
stichbury Nov 16, 2023
124bd90
Added some changes for add-ons and some to do notes
stichbury Nov 16, 2023
b58d9a4
Merge branch 'develop' into fix-starters-content
stichbury Nov 16, 2023
f9062ce
Merge branch 'develop' into fix-starters-content
stichbury Nov 20, 2023
233a4de
Merge branch 'develop' into fix-starters-content
stichbury Nov 21, 2023
4abbdbb
Some further fixes
stichbury Nov 21, 2023
fb5c335
Merge branch 'develop' into fix-starters-content
stichbury Nov 21, 2023
ab9bcd9
Move section about development version of Kedro
stichbury Nov 21, 2023
b787664
Add text for new project
stichbury Nov 21, 2023
47f7e05
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 21, 2023
cd7938e
Remove mention of pandas-iris where possible, replacing with alternative
stichbury Nov 21, 2023
8724441
Merge branch 'develop' into fix-starters-content
stichbury Nov 21, 2023
c4d8ed5
Fix linter errors
stichbury Nov 21, 2023
f7cbffa
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 21, 2023
fc4aed5
Update new project docs
stichbury Nov 22, 2023
2349b7d
Merge branch 'develop' into fix-starters-content
AhdraMeraliQB Nov 22, 2023
bdf10a3
Remove deprecated starters from architecture diagram
stichbury Nov 22, 2023
6e678e6
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 22, 2023
f2a8868
Add warning for pandas-iris usage in generator section
stichbury Nov 22, 2023
50b51c9
Further updates for instances of kedro new
stichbury Nov 22, 2023
5db4cad
Remove TODO as no longer required
Nov 22, 2023
b397ccf
Merge
Nov 22, 2023
d83793e
Resolve some Vale issues and remove implication of tools + starters
stichbury Nov 22, 2023
8708aae
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 22, 2023
edc4e51
fixes to internal links
stichbury Nov 22, 2023
8a832c5
Merge branch 'develop' into fix-starters-content
stichbury Nov 22, 2023
f6e1844
pandas-spaceflights bad, spaceflights-pandas good
stichbury Nov 22, 2023
1c64a78
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 22, 2023
0c66ac3
Merge branch 'develop' into fix-starters-content
stichbury Nov 23, 2023
135ad6a
fix cookiecutter docs urls
stichbury Nov 23, 2023
711bc0b
Update the create a starter docs
stichbury Nov 23, 2023
ccc8f96
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 23, 2023
201776d
Fix link to avoid linkcheck barf
stichbury Nov 23, 2023
b02d672
Update docs/source/get_started/new_project.md
stichbury Nov 23, 2023
57701f6
Update following review
stichbury Nov 23, 2023
aec7ebc
Update content
stichbury Nov 23, 2023
59fdfcf
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 23, 2023
8c38a11
Update docs/source/nodes_and_pipelines/nodes.md
stichbury Nov 27, 2023
bd04596
Update docs/source/starters/starters.md
stichbury Nov 27, 2023
735ed4f
Update FAQ
stichbury Nov 27, 2023
c09cac5
Merge branch 'develop' into fix-starters-content
stichbury Nov 27, 2023
7cb1dd4
Updates following review
stichbury Nov 27, 2023
4 changes: 4 additions & 0 deletions docs/source/addons_and_starters/addons.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Add-ons

<!--TO DO-->
<!--Detailed usage of add-ons goes here-->
Original file line number Diff line number Diff line change
@@ -1,87 +1,8 @@
# Kedro starters
# How to create a Kedro starter

A Kedro starter contains code in the form of a [Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/) template for a Kedro project. Metaphorically, a starter is similar to using a pre-defined layout when creating a presentation or document.
<!--TO DO-->
<!--This page needs improving-->

Kedro starters provide pre-defined example code and configuration that can be reused, for example:

* As template code for a typical Kedro project
* To add a `docker-compose` setup to launch Kedro next to a monitoring stack
* To add deployment scripts and CI/CD setup for your targeted infrastructure

You can create your own starters for reuse within a project or team, as described in the documentation about [how to create a Kedro starter](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter).

## How to use Kedro starters

To create a Kedro project using a starter, apply the `--starter` flag to `kedro new`:

```bash
kedro new --starter=<path-to-starter>
```

```{note}
`path-to-starter` could be a local directory or a VCS repository, as long as [Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/usage.html) supports it.
```

To create a project using the `PySpark` starter:

```bash
kedro new --starter=pyspark
```

## Starter aliases

We provide aliases for common starters maintained by the Kedro team so that users don't have to specify the full path. For example, to use the `PySpark` starter to create a project:

```bash
kedro new --starter=pyspark
```

To list all the aliases we support:

```bash
kedro starter list
```

## List of official starters

The Kedro team maintains the following starters for a range of Kedro projects:

* [`astro-airflow-iris`](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris): The [Kedro Iris dataset example project](../get_started/new_project.md) with a minimal setup for deploying the pipeline on Airflow with [Astronomer](https://www.astronomer.io/).
* [`standalone-datacatalog`](https://github.com/kedro-org/kedro-starters/tree/main/standalone-datacatalog): A minimum setup to use the traditional [Iris dataset](https://www.kaggle.com/uciml/iris) with Kedro's [`DataCatalog`](../data/data_catalog.md), which is a core component of Kedro. This starter is of use in the exploratory phase of a project. It was formerly known as `mini-kedro`.
* [`pandas-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris): The [Kedro Iris dataset example project](../get_started/new_project.md)
* [`pyspark-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pyspark-iris): An alternative Kedro Iris dataset example, using [PySpark](../integrations/pyspark_integration.md)
* [`pyspark`](https://github.com/kedro-org/kedro-starters/tree/main/pyspark): The configuration and initialisation code for a [Kedro pipeline using PySpark](../integrations/pyspark_integration.md)
* [`spaceflights-pandas`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets.
* [`spaceflights-pandas-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets and visualisation and experiment tracking `kedro-viz` features.
* [`spaceflights-pyspark`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets.
* [`spaceflights-pyspark-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets and visualisation and experiment tracking `kedro-viz` features.

## Starter versioning

By default, Kedro will use the latest version available in the repository, but if you want to use a specific version of a starter, you can pass a `--checkout` argument to the command:

```bash
kedro new --starter=pyspark --checkout=0.1.0
```

The `--checkout` value points to a branch, tag or commit in the starter repository.

Under the hood, the value will be passed to the [`--checkout` flag in Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/usage.html#works-directly-with-git-and-hg-mercurial-repos-too).


## Use a starter with a configuration file

By default, when you create a new project using a starter, `kedro new` asks you to enter the `project_name`, which it uses to set the `repo_name` and `python_package` name. This is the same behavior as when you [create a new empty project](../get_started/new_project.md#create-a-new-empty-project).

However, Kedro also allows you to [specify a configuration file](../get_started/new_project.md#create-a-new-project-from-a-configuration-file) when you create a project using a Kedro starter. Use the `--config` flag alongside the starter:

```bash
kedro new --config=my_kedro_pyspark_project.yml --starter=pyspark
```

This option is useful when the starter requires more configuration than the default mode requires.

## How to create a Kedro starter

Kedro starters are used to create projects that contain code to run as-is, or to adapt and extend. A good example is the Iris dataset example of basic Kedro project layout, configuration and initialisation code. A team may find it useful to build Kedro starters to create reusable projects that bootstrap a common base and can be extended.

@@ -100,7 +21,7 @@ You then need to decide which are:
* the common, boilerplate parts of the project
* the configurable elements, which need to be replaced by `cookiecutter` strings

### Configuration variables
## Configuration variables

By default, when you create a new project using a Kedro starter, `kedro new` launches in interactive mode. The user is then prompted for the variables that have been set in `prompts.yml`.

@@ -131,7 +52,7 @@ If the input to the prompts needs to be **validated**, for example to make sure

If you want `cookiecutter` to provide sensible **defaults** in case a user doesn't provide any input, you can add those to `cookiecutter.json`. See [the default starter `cookiecutter.json`](https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/cookiecutter.json) as an example.
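
As a rough illustration of what such defaults can do, the sketch below derives a `repo_name` and a `python_package` from a `project_name` in shell. The transformation rule used here (lowercase the name, turn spaces and underscores into hyphens for the repository name, turn spaces and hyphens into underscores for the package name) is an assumption made for this example. The authoritative rule is whatever the starter's own `cookiecutter.json` defines.

```bash
#!/usr/bin/env bash
# Illustrative only: mimics how default values might be derived from a project name.
# The exact rule is an assumption; check the starter's cookiecutter.json.
project_name="My Kedro Project"

# Lowercase the name first.
lower=$(printf '%s' "$project_name" | tr '[:upper:]' '[:lower:]')

# Repository name: spaces and underscores become hyphens.
repo_name=$(printf '%s' "$lower" | sed 's/[ _]/-/g')

# Python package name: spaces and hyphens become underscores.
python_package=$(printf '%s' "$lower" | sed 's/[ -]/_/g')

echo "repo_name=$repo_name"
echo "python_package=$python_package"
```

With the sample name above, the script prints `repo_name=my-kedro-project` and `python_package=my_kedro_project`.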

### Example Kedro starter
## Example Kedro starter

To review an example Kedro starter, check out the [`pandas-iris` starter on GitHub](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris).

@@ -173,3 +94,5 @@ Here is the layout of the project as a Cookiecutter template:
```{note}
You can [add an alias by creating a plugin using the `kedro.starters` entry point](../extend_kedro/plugins.md#extend-starter-aliases), which allows you to run `kedro new --starter=your_starters` and makes the alias show up in `kedro starter list`.
```


13 changes: 13 additions & 0 deletions docs/source/addons_and_starters/index.md
@@ -0,0 +1,13 @@
# Add-ons and starters


<!--TO DO-->
<!--Introductory text goes here-->


```{toctree}
:maxdepth: 1

addons
starters
```
87 changes: 87 additions & 0 deletions docs/source/addons_and_starters/starters.md
@@ -0,0 +1,87 @@
# Kedro starters

<!--TO DO-->
<!--This page needs updating-->


A Kedro starter contains code in the form of a [Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/) template for a Kedro project. Metaphorically, a starter is similar to using a pre-defined layout when creating a presentation or document.

Check warning on line 7 in docs/source/addons_and_starters/starters.md (GitHub Actions / vale, column 169): [Kedro.toowordy] 'similar to' is too wordy

Kedro starters provide pre-defined example code and configuration that can be reused, for example:

* As template code for a typical Kedro project
* To add a `docker-compose` setup to launch Kedro next to a monitoring stack
* To add deployment scripts and CI/CD setup for your targeted infrastructure

You can create your own starters for reuse within a project or team, as described in the documentation about [how to create a Kedro starter](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter).

## How to use Kedro starters

To create a Kedro project using a starter, apply the `--starter` flag to `kedro new`:

```bash
kedro new --starter=<path-to-starter>
```

```{note}
`path-to-starter` could be a local directory or a VCS repository, as long as [Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/usage.html) supports it.
```

To create a project using the `PySpark` starter:

```bash
kedro new --starter=pyspark
```

## Starter aliases

We provide aliases for common starters maintained by the Kedro team so that users don't have to specify the full path. For example, to use the `PySpark` starter to create a project:

```bash
kedro new --starter=pyspark
```

To list all the aliases we support:

```bash
kedro starter list
```

## List of official starters

The Kedro team maintains the following starters for a range of Kedro projects:

* [`astro-airflow-iris`](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris): The [Kedro Iris dataset example project](../get_started/new_project.md) with a minimal setup for deploying the pipeline on Airflow with [Astronomer](https://www.astronomer.io/).
* [`standalone-datacatalog`](https://github.com/kedro-org/kedro-starters/tree/main/standalone-datacatalog): A minimum setup to use the traditional [Iris dataset](https://www.kaggle.com/uciml/iris) with Kedro's [`DataCatalog`](../data/data_catalog.md), which is a core component of Kedro. This starter is of use in the exploratory phase of a project. It was formerly known as `mini-kedro`.

Check warning on line 54 in docs/source/addons_and_starters/starters.md (GitHub Actions / vale, column 351): [Kedro.toowordy] 'It was' is too wordy
* [`pandas-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris): The [Kedro Iris dataset example project](../get_started/new_project.md)
* [`pyspark-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pyspark-iris): An alternative Kedro Iris dataset example, using [PySpark](../integrations/pyspark_integration.md)
* [`pyspark`](https://github.com/kedro-org/kedro-starters/tree/main/pyspark): The configuration and initialisation code for a [Kedro pipeline using PySpark](../integrations/pyspark_integration.md)
* [`spaceflights-pandas`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets.
* [`spaceflights-pandas-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets and visualisation and experiment tracking `kedro-viz` features.
* [`spaceflights-pyspark`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets.
* [`spaceflights-pyspark-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets and visualisation and experiment tracking `kedro-viz` features.

## Starter versioning

By default, Kedro will use the latest version available in the repository, but if you want to use a specific version of a starter, you can pass a `--checkout` argument to the command:

Check notice on line 65 in docs/source/addons_and_starters/starters.md (GitHub Actions / vale, column 1): [Kedro.sentencelength] Try to keep your sentence length to 30 words or fewer.

```bash
kedro new --starter=pyspark --checkout=0.1.0
```

The `--checkout` value points to a branch, tag or commit in the starter repository.

Under the hood, the value will be passed to the [`--checkout` flag in Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/usage.html#works-directly-with-git-and-hg-mercurial-repos-too).


## Use a starter with a configuration file

By default, when you create a new project using a starter, `kedro new` asks you to enter the `project_name`, which it uses to set the `repo_name` and `python_package` name. This is the same behavior as when you [create a new empty project](../get_started/new_project.md#create-a-new-empty-project).

However, Kedro also allows you to [specify a configuration file](../get_started/new_project.md#create-a-new-project-from-a-configuration-file) when you create a project using a Kedro starter. Use the `--config` flag alongside the starter:

Check warning on line 80 in docs/source/addons_and_starters/starters.md (GitHub Actions / vale, column 1): [Kedro.toowordy] 'However' is too wordy

```bash
kedro new --config=my_kedro_pyspark_project.yml --starter=pyspark
```

This option is useful when the starter requires more configuration than the default mode requires.
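
A configuration file for the `--config` command above might look like the following sketch, written here from the shell. The keys `project_name`, `repo_name` and `python_package` are the values named earlier in this section. The `output_dir` key and all the values are assumptions for illustration, and a given starter or Kedro version may expect further keys.

```bash
# Write a hypothetical configuration file for `kedro new --config`.
# All values are placeholders; `output_dir` is assumed to be a supported key.
cat > my_kedro_pyspark_project.yml <<'EOF'
output_dir: .
project_name: My Kedro PySpark Project
repo_name: my-kedro-pyspark-project
python_package: my_kedro_pyspark_project
EOF

# Show what was written before passing the file to `kedro new`.
cat my_kedro_pyspark_project.yml
```

Keeping the file under version control gives a team a repeatable, non-interactive way to create projects from the same starter.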

1 change: 1 addition & 0 deletions docs/source/extend_kedro/index.md
@@ -6,4 +6,5 @@
common_use_cases
plugins
architecture_overview
../addons_and_starters/create_a_starter
```