[Ready] Docs changes to remove pandas-iris and update kedro new flow in onboarding docs (#3317)

* Revise link to notebook docs and remove unnecessary intro page

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update starters content

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* relocate starters content

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Added some changes for add-ons and some to do notes

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Some further fixes

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Move section about development version of Kedro

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Add text for new project

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Remove mention of pandas-iris where possible, replacing with alternative

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Fix linter errors

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update new project docs

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Remove deprecated starters from architecture diagram

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Add warning for pandas-iris usage in generator section

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Further updates for instances of kedro new

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Remove TODO as no longer required

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Resolve some Vale issues and remove implication of tools + starters

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* fixes to internal links

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* pandas-spaceflights bad, spaceflights-pandas good

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* fix cookiecutter docs urls

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update the create a starter docs

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Fix link to avoid linkcheck barf

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update following review

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update content

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/nodes_and_pipelines/nodes.md

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/starters/starters.md

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update FAQ

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Updates following review

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

---------

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Co-authored-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
5 people committed Nov 28, 2023
1 parent 6841a4e commit 33aa83d
Showing 25 changed files with 331 additions and 434 deletions.
6 changes: 3 additions & 3 deletions docs/source/deployment/airflow_astronomer.md
@@ -15,7 +15,7 @@ The following tutorial uses a different approach and shows how to deploy a Kedro

[Astronomer](https://docs.astronomer.io/astro/install-cli) is a managed Airflow platform which allows users to spin up and run an Airflow cluster easily in production. It also provides a set of tools to help users get started with Airflow locally.

The tutorial discusses how to run the [example Iris classification pipeline](../get_started/new_project.md#create-a-new-project-containing-example-code) on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:
The tutorial discusses how to run the example Iris classification pipeline on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:

```shell
kedro new --starter=astro-airflow-iris
```

@@ -44,10 +44,10 @@ To follow this tutorial, ensure you have the following:

```shell
astro dev init
```

2. Create a new Kedro project using the `pandas-iris` starter. You can use the default value in the project creation process:
2. Create a new Kedro project using the `astro-airflow-iris` starter. You can use the default value in the project creation process:

```shell
kedro new --starter=pandas-iris
kedro new --starter=astro-airflow-iris
```

3. Copy all files and directories under `new-kedro-project`, which was the default project name created in step 2, to the root directory so Kedro and Astro CLI share the same project root:
2 changes: 1 addition & 1 deletion docs/source/deployment/distributed.md
@@ -40,4 +40,4 @@ We encourage you to play with different ways of parameterising your runs as you

## 4. (Optional) Create starters

This is an optional step, but it may speed up your work in the long term. If you find yourself having to deploy in a similar environment or to a similar platform fairly often, you may want to [build your own Kedro starter](../kedro_project_setup/starters.md). That way you will be able to re-use any deployment scripts written as part of step 2.
You may opt to [build your own Kedro starter](../starters/starters.md) if you regularly have to deploy in a similar environment or to a similar platform. The starter enables you to re-use any deployment scripts written as part of step 2.
2 changes: 1 addition & 1 deletion docs/source/extend_kedro/architecture_overview.md
@@ -37,7 +37,7 @@ Kedro framework serves as the interface between a Kedro project and Kedro librar

## Kedro starter

You can use a [Kedro starter](../kedro_project_setup/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.
You can use a [Kedro starter](../starters/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.

## Kedro library

2 changes: 1 addition & 1 deletion docs/source/extend_kedro/common_use_cases.md
@@ -39,4 +39,4 @@ Your plugin's implementation can take advantage of other extension mechanisms su

## Use Case 4: How to customise the initial boilerplate of your project

Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter).
Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../starters/create_a_starter.md).
1 change: 1 addition & 0 deletions docs/source/extend_kedro/index.md
@@ -6,4 +6,5 @@
common_use_cases
plugins
architecture_overview
../starters/create_a_starter
```
43 changes: 0 additions & 43 deletions docs/source/extend_kedro/plugins.md
@@ -42,49 +42,6 @@ Once the plugin is installed, you can run it as follows:
kedro to_json
```

## Extend starter aliases
It is possible to extend the list of starter aliases built into Kedro. This means that a [custom Kedro starter](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter) can be used directly through the `starter` argument in `kedro new` rather than needing to explicitly provide the `template` and `directory` arguments. A custom starter alias behaves in the same way as an official Kedro starter alias and is also picked up by `kedro starter list`.

Extend the starters by providing a list of `KedroStarterSpec` objects. In this example, the list is defined in a file called `plugin.py`.

Example for a non-git repository starter:
```python
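# plugin.py
from kedro.framework.cli.starters import KedroStarterSpec

# A starter whose template lives in a local (non-git) directory.
starters = [
    KedroStarterSpec(
        alias="test_plugin_starter",
        template_path="your_local_directory/starter_folder",
    )
]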
```

Example for a git repository starter:
```python
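# plugin.py
from kedro.framework.cli.starters import KedroStarterSpec

# A starter whose template lives in a subdirectory of a git repository.
starters = [
    KedroStarterSpec(
        alias="test_plugin_starter",
        template_path="https://github.com/kedro-org/kedro-starters/",
        directory="spaceflights-pandas",
    )
]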
```

The `directory` argument is optional and should be used when you have multiple templates in one repository as for the [official kedro-starters](https://github.com/kedro-org/kedro-starters). If you only have one template, your top-level directory will be treated as the template. For an example, see the [spaceflights-pandas starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas).

In your `pyproject.toml`, you need to register the specifications to `kedro.starters`:

```toml
[project.entry-points."kedro.starters"]
starter = "plugin:starters"
```

After that you can use this starter with `kedro new --starter=test_plugin_starter`.

```{note}
If your starter lives on a git repository, by default Kedro attempts to use a tag or branch labelled with your version of Kedro, e.g. `0.18.12`. This means that you can host different versions of your starter template on the same repository, and the correct one will automatically be used. If you do not wish to follow this structure, you should override it with the `checkout` flag, e.g. `kedro new --starter=test_plugin_starter --checkout=main`.
```
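
To check that a plugin's `kedro.starters` entry point is visible to the current environment, the registration described above can be inspected from Python. This is a minimal sketch rather than part of Kedro: the helper name `registered_starter_entry_points` is hypothetical, and it only lists what is registered without loading any `KedroStarterSpec` objects.

```python
from importlib.metadata import entry_points


def registered_starter_entry_points(group="kedro.starters"):
    """Return sorted (name, value) pairs for entry points in `group`.

    Hypothetical helper for illustration; it is not part of Kedro.
    """
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = all_entry_points.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, value in registered_starter_entry_points():
        print(f"{name} -> {value}")
```

With the `pyproject.toml` fragment above installed, the output would be expected to include an entry named `starter` pointing at `plugin:starters`.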

## Working with `click`

Commands must be provided as [`click` `Groups`](https://click.palletsprojects.com/en/7.x/api/#click.Group)
8 changes: 7 additions & 1 deletion docs/source/faq/faq.md
@@ -2,6 +2,12 @@

This is a growing set of technical FAQs. The [product FAQs on the Kedro website](https://kedro.org/#faq) explain how Kedro can answer the typical use cases and requirements of data scientists, data engineers, machine learning engineers and product owners.


## Installing Kedro
* [How do I install a development version of Kedro](https://github.com/kedro-org/kedro/wiki/Guidelines-for-contributing-developers)?

* **How can I check the version of Kedro installed?** To check the version installed, type `kedro -V` in your terminal window.

## Kedro documentation
* {doc}`Where can I find the documentation about Kedro-Viz<kedro-viz:kedro-viz_visualisation>`?
* {doc}`Where can I find the documentation for Kedro's datasets<kedro-datasets:kedro_datasets>`?
@@ -13,7 +19,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]

## Kedro project development

* [How do I write my own Kedro starter projects](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter)?
* [How do I write my own Kedro starter projects](../starters/create_a_starter.md)?

## Configuration

22 changes: 0 additions & 22 deletions docs/source/get_started/install.md
@@ -162,28 +162,6 @@ When migrating an existing project to a newer Kedro version, make sure you also
* For projects generated with versions of Kedro > 0.17.0, you'll do this in the `pyproject.toml` file from the project root directory.
* If your project was generated with a version of Kedro <0.17.0, you will instead need to update the `ProjectContext`, which is found in `src/<package_name>/run.py`.

## How to install a development version of Kedro

This section explains how to try out a development version of Kedro direct from the [Kedro GitHub repository](https://github.com/kedro-org/kedro).

```{important}
The development version of Kedro is not guaranteed to be bug-free and/or compatible with any of the [stable versions](https://pypi.org/project/kedro/#history). We do not recommend that you use a development version of Kedro in any production systems. Please install and use with caution.
```

To try out latest, unreleased functionality from the `develop` branch of the Kedro GitHub repository, run the following installation command:

```bash
pip install git+https://github.com/kedro-org/kedro.git@develop
```

This will install Kedro from the `develop` branch of the GitHub repository, which is always the most up to date. This command will install Kedro from source, unlike `pip install kedro` which installs Kedro from PyPI.

If you want to roll back to a stable version of Kedro, execute the following in your environment:

```bash
pip uninstall kedro -y
pip install kedro
```

## Summary

108 changes: 23 additions & 85 deletions docs/source/get_started/new_project.md
@@ -1,29 +1,20 @@
# Create a new Kedro project

## Summary
There are several ways to create a new Kedro project. This page explains the flow to create a basic project using `kedro new` to output a project directory containing the basic files and subdirectories that make up a Kedro project.

There are a few ways to create a new project once you have [set up Kedro](install.md):
You can also create a new Kedro project with a starter that adds a set of code for a common project use case. [Starters are explained separately](../starters/starters.md) later in the documentation set and illustrated with the [spaceflights tutorial](../tutorial/tutorial_template.md).

* You can use `kedro new` to [create a basic Kedro project](#create-a-new-empty-project) containing project directories and basic code, but empty to extend as you need.
* You can use `kedro new` and [pass in a configuration file](#create-a-new-project-from-a-configuration-file) to manually control project details such as the name, folder and package name.
* You can [create a Kedro project populated with template code](#create-a-new-project-containing-example-code) that acts as a starter example. This guide illustrates with the `pandas-iris` starter, and there is a [range of Kedro starter projects](../kedro_project_setup/starters.md#list-of-official-starters).
## Introducing `kedro new`


Once you've created a project:

* You need to **navigate to its project folder** and **install its dependencies**: `pip install -r requirements.txt`
* **To run the project**: `kedro run`
* **To visualise the project**: `kedro viz`

## Create a new empty project

The simplest way to create a default Kedro project is to navigate to your preferred directory and type:
You can create a basic Kedro project containing the default code needed to set up your own nodes and pipelines. Navigate to your preferred directory and type:

```bash
kedro new
```

Enter a name for the project, which can be human-readable and may contain alphanumeric symbols, spaces, underscores and hyphens. It must be at least two characters long.
### Project name

The command line interface then asks you to enter a name for the project. This is the human-readable name, and it may contain alphanumeric symbols, spaces, underscores, and hyphens. It must be at least two characters long.

It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically.

@@ -35,48 +26,27 @@ So, if you enter "Get Started", the folder for the project (`repo_name`) is auto
| Local directory to store the project | `repo_name` | `get-started` |
| The Python package name for the project (short, all-lowercase) | `python_package` | `get_started` |
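
The mapping from the human-readable name to the folder and package names can be sketched in Python. This is an illustrative approximation, not Kedro's actual implementation: the function names `is_valid_project_name` and `derive_names` are hypothetical, and the rules are assumed from the "Get Started" example and the naming constraints described for the project name.

```python
import re

# Assumed rule: letters, digits, spaces, underscores and hyphens,
# with a minimum length of two characters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _-]{2,}")


def is_valid_project_name(project_name):
    """Return True if the name satisfies the assumed naming rule."""
    return _NAME_PATTERN.fullmatch(project_name) is not None


def derive_names(project_name):
    """Return (repo_name, python_package) derived from a project name.

    Hypothetical helper: lower-cases the name, then joins words with
    hyphens for the folder and with underscores for the package.
    """
    lowered = project_name.strip().lower()
    repo_name = re.sub(r"[ _]+", "-", lowered)
    python_package = re.sub(r"[ -]+", "_", lowered)
    return repo_name, python_package


if __name__ == "__main__":
    print(derive_names("Get Started"))
```

For the name "Get Started", this sketch yields `get-started` for the folder and `get_started` for the package, matching the table.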

### Project tools

The output of `kedro new` is a directory containing all the project files and subdirectories required for a basic Kedro project, ready to extend with the code.

## Create a new project from a configuration file

To customise a new project's directory and package name, use a configuration file to specify those values. The configuration file must contain:

- `output_dir` The path in which to create the project directory
- `project_name`
- `repo_name`
- `python_package`
The command line interface then asks which tools you'd like to include in the project. The options are as follows and described in more detail above in the [documentation about the new project tools](../starters/new_project_tools.md).

The `output_dir` can be customised. For example, use `~` for the home directory or `.` for the current working directory. Here is an example `config.yml`, which assumes that a directory named `~/code` already exists:
You can add one or more of the options, or follow the default and add none at all:

```yaml
output_dir: ~/code
project_name: My First Kedro Project
repo_name: testing-kedro
python_package: test_kedro
```
To create this new project:
```bash
kedro new --config=<path>/config.yml
```
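
Before passing a configuration file to `kedro new`, it can help to confirm that it defines every key listed above. The following is a minimal sketch that assumes the file has already been parsed into a dictionary (for example, with a YAML parser); `missing_config_keys` is a hypothetical helper, not a Kedro function.

```python
# The four keys that the configuration file must contain.
REQUIRED_KEYS = ("output_dir", "project_name", "repo_name", "python_package")


def missing_config_keys(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    # Values copied from the example config.yml.
    example = {
        "output_dir": "~/code",
        "project_name": "My First Kedro Project",
        "repo_name": "testing-kedro",
        "python_package": "test_kedro",
    }
    print(missing_config_keys(example))
```

An empty list means all required keys are present; any names returned are the ones still to be added.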
* Linting: A basic linting setup with Black and ruff
* Testing: A basic testing setup with pytest
* Custom Logging: Additional logging options
* Documentation: Configuration for basic documentation built with Sphinx
* Data Structure: The [directory structure](../faq/faq.md#what-is-data-engineering-convention) for storing data locally
* PySpark: Setup and configuration for working with PySpark
* Kedro Viz: Kedro's native visualisation tool

## Create a new project containing example code
### Project examples

Use a [Kedro starter](../kedro_project_setup/starters.md) to create a project containing template code, to run as-is or to adapt and extend.
TO DO

The following illustrates a project created with example code based on the familiar [Iris dataset](https://www.kaggle.com/uciml/iris).
## Run the new project

The first step is to create the Kedro project using a starter to add the example code and data.

```bash
kedro new --starter=pandas-iris
```

## Run the project

However you create a Kedro project, once `kedro new` has completed, the next step is to navigate to the project folder (`cd <project-name>`) and install dependencies with `pip` as follows:
Whichever options you selected for tools and example code, once `kedro new` has completed, the next step is to navigate to the project folder (`cd <project-name>`) and install dependencies with `pip` as follows:

```bash
pip install -r requirements.txt
```

@@ -102,7 +72,7 @@ The Kedro-Viz package needs to be installed into your virtual environment separa

```bash
pip install kedro-viz
```

To start Kedro-Viz, enter the following in your terminal:
To start Kedro-Viz, navigate to the project folder (`cd <project-name>`) and enter the following in your terminal:

```bash
kedro viz
```

@@ -113,7 +83,7 @@ This command automatically opens a browser tab to serve the visualisation at `ht
To exit the visualisation, close the browser tab. To regain control of the terminal, enter `^+c` on Mac or `Ctrl+c` on Windows or Linux machines.

## Where next?
You have completed the section on Kedro project creation for new users. Now choose how to learn more:
You have completed the section on Kedro project creation for new users. Here are some useful resources to learn more:

* Understand more about Kedro: The following page explains the [fundamental Kedro concepts](./kedro_concepts.md).

@@ -122,35 +92,3 @@ You have completed the section on Kedro project creation for new users. Now choo
* How-to guide for notebook users: The documentation section following the tutorial explains [how to combine Kedro with a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md).

If you've worked through the documentation listed and are unsure where to go next, review the [Kedro repositories on GitHub](https://github.com/kedro-org) and [Kedro's Slack channels](https://slack.kedro.org).


## More information about the `pandas-iris` example project

If you used the `pandas-iris` starter to create an example project, the rest of this page gives further information.

<details>
<summary>Expand for more details.</summary>

### Background information
The Iris dataset was generated in 1936 by the British statistician and biologist Ronald Fisher. The dataset contains 150 samples, comprising 50 each of 3 different species of Iris plant (*Iris Setosa*, *Iris Versicolour* and *Iris Virginica*). For each sample, the flower measurements are recorded for the sepal length, sepal width, petal length and petal width.

![](../meta/images/iris_measurements.png)

A machine learning model can use the Iris dataset to illustrate classification (a method used to determine the type of an object by comparison with similar objects that have previously been categorised). Once trained on known data, the machine learning model can make a predictive classification by comparing a test object to the output of its training data.

The Kedro starter contains a single [pipeline](../resources/glossary.md#pipeline) comprising three [nodes](../resources/glossary.md#node) responsible for splitting the data into training and testing samples, running a 1-nearest neighbour classifier algorithm to make predictions and accuracy-reporting.

The nodes are stored in `src/get_started/nodes.py`:

| Node | Description |
| --------------- | ----------------------------------------------------------------------------------- |
| `split_data` | Splits the example Iris dataset into train and test samples |
| `make_predictions`| Makes class predictions (using 1-nearest neighbour classifier and train-test set) |
| `report_accuracy` | Reports the accuracy of the predictions performed by the previous node. |

### Iris example: visualisation

If you [visualise your project with Kedro-Viz](#visualise-a-kedro-project) you should see the following:

![](../meta/images/pipeline_visualisation_iris_starter.png)
</details>