[Ready] Docs changes to remove pandas-iris and update kedro new flow in onboarding docs (#3317)

* Revise link to notebook docs and remove unnecessary intro page
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update starters content
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* relocate starters content
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Added some changes for add-ons and some to do notes
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Some further fixes
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Move section about development version of Kedro
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Add text for new project
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Remove mention of pandas-iris where possible, replacing with alternative
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Fix linter errors
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update new project docs
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Remove deprecated starters from architecture diagram
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Add warning for pandas-iris usage in generator section
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Further updates for instances of kedro new
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Remove TODO as no longer required
  Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
* Resolve some Vale issues and remove implication of tools + starters
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* fixes to internal links
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* pandas-spaceflights bad, spaceflights-pandas good
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* fix cookiecutter docs urls
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update the create a starter docs
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Fix link to avoid linkcheck barf
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update docs/source/get_started/new_project.md
  Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update following review
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update content
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update docs/source/nodes_and_pipelines/nodes.md
  Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update docs/source/starters/starters.md
  Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Update FAQ
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
* Updates following review
  Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

---------

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Co-authored-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
kedro-org · Nov 28, 2023 · 33aa83d
1 parent 6841a4e
commit 33aa83d
Showing 25 changed files with 331 additions and 434 deletions.
diff --git a/docs/source/deployment/airflow_astronomer.md b/docs/source/deployment/airflow_astronomer.md
@@ -15,7 +15,7 @@ The following tutorial uses a different approach and shows how to deploy a Kedro
 
 [Astronomer](https://docs.astronomer.io/astro/install-cli) is a managed Airflow platform which allows users to spin up and run an Airflow cluster easily in production. Additionally, it also provides a set of tools to help users get started with Airflow locally in the easiest way possible.
 
-The tutorial discusses how to run the [example Iris classification pipeline](../get_started/new_project.md#create-a-new-project-containing-example-code) on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:
+The tutorial discusses how to run the example Iris classification pipeline on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:
 
 ```shell
 kedro new --starter=astro-airflow-iris
@@ -44,10 +44,10 @@ To follow this tutorial, ensure you have the following:
     astro dev init
     ```
 
-2. Create a new Kedro project using the `pandas-iris` starter. You can use the default value in the project creation process:
+2. Create a new Kedro project using the `astro-airflow-iris` starter. You can use the default value in the project creation process:
 
     ```shell
-    kedro new --starter=pandas-iris
+    kedro new --starter=astro-airflow-iris
     ```
 
 3. Copy all files and directories under `new-kedro-project`, which was the default project name created in step 2, to the root directory so Kedro and Astro CLI share the same project root:

diff --git a/docs/source/deployment/distributed.md b/docs/source/deployment/distributed.md
@@ -40,4 +40,4 @@ We encourage you to play with different ways of parameterising your runs as you
 
 ## 4. (Optional) Create starters
 
-This is an optional step, but it may speed up your work in the long term. If you find yourself having to deploy in a similar environment or to a similar platform fairly often, you may want to [build your own Kedro starter](../kedro_project_setup/starters.md). That way you will be able to re-use any deployment scripts written as part of step 2.
+You may opt to [build your own Kedro starter](../starters/starters.md) if you regularly have to deploy in a similar environment or to a similar platform. The starter enables you to re-use any deployment scripts written as part of step 2.
diff --git a/docs/source/extend_kedro/architecture_overview.md b/docs/source/extend_kedro/architecture_overview.md
@@ -37,7 +37,7 @@ Kedro framework serves as the interface between a Kedro project and Kedro librar
 
 ## Kedro starter
 
-You can use a [Kedro starter](../kedro_project_setup/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.
+You can use a [Kedro starter](../starters/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.
 
 ## Kedro library
 

diff --git a/docs/source/extend_kedro/common_use_cases.md b/docs/source/extend_kedro/common_use_cases.md
@@ -39,4 +39,4 @@ Your plugin's implementation can take advantage of other extension mechanisms su
 
 ## Use Case 4: How to customise the initial boilerplate of your project
 
-Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter).
+Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../starters/create_a_starter.md).
diff --git a/docs/source/extend_kedro/index.md b/docs/source/extend_kedro/index.md
@@ -6,4 +6,5 @@
 common_use_cases
 plugins
 architecture_overview
+../starters/create_a_starter
 ```
diff --git a/docs/source/extend_kedro/plugins.md b/docs/source/extend_kedro/plugins.md
@@ -42,49 +42,6 @@ Once the plugin is installed, you can run it as follows:
 kedro to_json
 ```
 
-## Extend starter aliases
-It is possible to extend the list of starter aliases built into Kedro. This means that a [custom Kedro starter](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter) can be used directly through the `starter` argument in `kedro new` rather than needing to explicitly provide the `template` and `directory` arguments. A custom starter alias behaves in the same way as an official Kedro starter alias and is also picked up by `kedro starter list`.
-
-You need to extend the starters by providing a list of  `KedroStarterSpec`, in this example it is defined in a file called `plugin.py`.
-
-Example for a non-git repository starter:
-```python
-# plugin.py
-starters = [
-    KedroStarterSpec(
-        alias="test_plugin_starter",
-        template_path="your_local_directory/starter_folder",
-    )
-]
-```
-
-Example for a git repository starter:
-```python
-# plugin.py
-starters = [
-    KedroStarterSpec(
-        alias="test_plugin_starter",
-        template_path="https://github.com/kedro-org/kedro-starters/",
-        directory="spaceflights-pandas",
-    )
-]
-```
-
-The `directory` argument is optional and should be used when you have multiple templates in one repository as for the [official kedro-starters](https://github.com/kedro-org/kedro-starters). If you only have one template, your top-level directory will be treated as the template. For an example, see the [spaceflights-pandas starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas).
-
-In your `pyproject.toml`, you need to register the specifications to `kedro.starters`:
-
-```toml
-[project.entry-points."kedro.starters"]
-starter = "plugin:starters"
-```
-
-After that you can use this starter with `kedro new --starter=test_plugin_starter`.
-
-```{note}
-If your starter lives on a git repository, by default Kedro attempts to use a tag or branch labelled with your version of Kedro, e.g. `0.18.12`. This means that you can host different versions of your starter template on the same repository, and the correct one will automatically be used. If you do not wish to follow this structure, you should override it with the `checkout` flag, e.g. `kedro new --starter=test_plugin_starter --checkout=main`.
-```
-
 ## Working with `click`
 
 Commands must be provided as [`click` `Groups`](https://click.palletsprojects.com/en/7.x/api/#click.Group)

diff --git a/docs/source/faq/faq.md b/docs/source/faq/faq.md
@@ -2,6 +2,12 @@
 
 This is a growing set of technical FAQs. The [product FAQs on the Kedro website](https://kedro.org/#faq) explain how Kedro can answer the typical use cases and requirements of data scientists, data engineers, machine learning engineers and product owners.
 
+
+## Installing Kedro
+* [How do I install a development version of Kedro](https://github.com/kedro-org/kedro/wiki/Guidelines-for-contributing-developers)?
+
+* **How can I check the version of Kedro installed?** To check the version installed, type `kedro -V` in your terminal window.
+
 ## Kedro documentation
 * {doc}`Where can I find the documentation about Kedro-Viz<kedro-viz:kedro-viz_visualisation>`?
 * {doc}`Where can I find the documentation for Kedro's datasets<kedro-datasets:kedro_datasets>`?
@@ -13,7 +19,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]
 
 ## Kedro project development
 
-* [How do I write my own Kedro starter projects](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter)?
+* [How do I write my own Kedro starter projects](../starters/create_a_starter.md)?
 
 ## Configuration
 

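Reviewer note on the FAQ hunk above: the new entry suggests `kedro -V` to check the installed version. The same check can be done from Python, which may be handy in scripts or notebooks. This is a minimal sketch and not part of the commit; it assumes Kedro is installed under the distribution name `kedro`, and the helper name `installed_kedro_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_kedro_version(dist_name: str = "kedro") -> Optional[str]:
    """Return the installed version string, or None if the package is absent.

    `dist_name` defaults to "kedro" (an assumption about the distribution
    name); this helper is illustrative and not part of Kedro itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_kedro_version()
    if found is None:
        print("kedro is not installed in this environment")
    else:
        print(f"kedro {found}")
```

The function returns `None` rather than raising, so callers can decide how to handle a missing installation.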
diff --git a/docs/source/get_started/install.md b/docs/source/get_started/install.md
@@ -162,28 +162,6 @@ When migrating an existing project to a newer Kedro version, make sure you also
 * For projects generated with versions of Kedro > 0.17.0, you'll do this in the `pyproject.toml` file from the project root directory.
 * If your project was generated with a version of Kedro <0.17.0, you will instead need to update the `ProjectContext`, which is found in `src/<package_name>/run.py`.
 
-## How to install a development version of Kedro
-
-This section explains how to try out a development version of Kedro direct from the [Kedro GitHub repository](https://github.com/kedro-org/kedro).
-
-```{important}
-The development version of Kedro is not guaranteed to be bug-free and/or compatible with any of the [stable versions](https://pypi.org/project/kedro/#history). We do not recommend that you use a development version of Kedro in any production systems. Please install and use with caution.
-```
-
-To try out latest, unreleased functionality from the `develop` branch of the Kedro GitHub repository, run the following installation command:
-
-```bash
-pip install git+https://github.com/kedro-org/kedro.git@develop
-```
-
-This will install Kedro from the `develop` branch of the GitHub repository, which is always the most up to date. This command will install Kedro from source, unlike `pip install kedro` which installs Kedro from PyPI.
-
-If you want to roll back to a stable version of Kedro, execute the following in your environment:
-
-```bash
-pip uninstall kedro -y
-pip install kedro
-```
 
 ## Summary
 

diff --git a/docs/source/get_started/new_project.md b/docs/source/get_started/new_project.md
@@ -1,29 +1,20 @@
 # Create a new Kedro project
 
-## Summary
+There are several ways to create a new Kedro project. This page explains the flow to create a basic project using `kedro new` to output a project directory containing the basic files and subdirectories that make up a Kedro project.
 
-There are a few ways to create a new project once you have [set up Kedro](install.md):
+You can also create a new Kedro project with a starter that adds a set of code for a common project use case. [Starters are explained separately](../starters/starters.md) later in the documentation set and illustrated with the [spaceflights tutorial](../tutorial/tutorial_template.md).
 
-* You can use `kedro new` to [create a basic Kedro project](#create-a-new-empty-project) containing project directories and basic code, but empty to extend as you need.
-* You can use `kedro new` and [pass in a configuration file](#create-a-new-project-from-a-configuration-file) to manually control project details such as the name, folder and package name.
-* You can [create a Kedro project populated with template code](#create-a-new-project-containing-example-code) that acts as a starter example. This guide illustrates with the `pandas-iris` starter, and there is a [range of Kedro starter projects](../kedro_project_setup/starters.md#list-of-official-starters).
+## Introducing `kedro new`
 
-
-Once you've created a project:
-
-* You need to **navigate to its project folder** and **install its dependencies**: `pip install -r requirements.txt`
-* **To run the project**: `kedro run`
-* **To visualise the project**: `kedro viz`
-
-## Create a new empty project
-
-The simplest way to create a default Kedro project is to navigate to your preferred directory and type:
+You can create a basic Kedro project containing the default code needed to set up your own nodes and pipelines. Navigate to your preferred directory and type:
 
 ```bash
 kedro new
 ```
 
-Enter a name for the project, which can be human-readable and may contain alphanumeric symbols, spaces, underscores and hyphens. It must be at least two characters long.
+### Project name
+
+The command line interface then asks you to enter a name for the project. This is the human-readable name, and it may contain alphanumeric symbols, spaces, underscores, and hyphens. It must be at least two characters long.
 
 It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically.
 
@@ -35,48 +26,27 @@ So, if you enter "Get Started", the folder for the project (`repo_name`) is auto
 | Local directory to store the project                           | `repo_name`      | `get-started` |
 | The Python package name for the project (short, all-lowercase) | `python_package` | `get_started` |
 
+### Project tools
 
-The output of `kedro new` is a directory containing all the project files and subdirectories required for a basic Kedro project, ready to extend with the code.
-
-## Create a new project from a configuration file
-
-To customise a new project's directory and package name, use a configuration file to specify those values. The configuration file must contain:
-
--   `output_dir` The path in which to create the project directory
--   `project_name`
--   `repo_name`
--   `python_package`
+The command line interface then asks which tools you'd like to include in the project. The options are as follows and described in more detail above in the [documentation about the new project tools](../starters/new_project_tools.md).
 
-The `output_dir` can be set to customised. For example, `~` for the home directory or `.` for the current working directory. Here is an example `config.yml`, which assumes that a directory named `~/code` already exists:
+You can add one or more of the options, or follow the default and add none at all:
 
-```yaml
-output_dir: ~/code
-project_name: My First Kedro Project
-repo_name: testing-kedro
-python_package: test_kedro
-```
-
-To create this new project:
-
-```bash
-kedro new --config=<path>/config.yml
-```
+* Linting: A basic linting setup with Black and ruff
+* Testing: A basic testing setup with pytest
+* Custom Logging: Additional logging options
+* Documentation: Configuration for basic documentation built with Sphinx
+* Data Structure: The [directory structure](../faq/faq.md#what-is-data-engineering-convention) for storing data locally
+* PySpark: Setup and configuration for working with PySpark
+* Kedro Viz: Kedro's native visualisation tool.
 
-## Create a new project containing example code
+### Project examples
 
-Use a [Kedro starter](../kedro_project_setup/starters.md) to create a project containing template code, to run as-is or to adapt and extend.
+TO DO
 
-The following illustrates a project created with example code based on the familiar [Iris dataset](https://www.kaggle.com/uciml/iris).
+## Run the new project
 
-The first step is to create the Kedro project using a starter to add the example code and data.
-
-```bash
-kedro new --starter=pandas-iris
-```
-
-## Run the project
-
-However you create a Kedro project, once `kedro new` has completed, the next step is to navigate to the project folder (`cd <project-name>`) and install dependencies with `pip` as follows:
+Whichever options you selected for tools and example code, once `kedro new` has completed, the next step is to navigate to the project folder (`cd <project-name>`) and install dependencies with `pip` as follows:
 
 ```bash
 pip install -r requirements.txt
@@ -102,7 +72,7 @@ The Kedro-Viz package needs to be installed into your virtual environment separa
 pip install kedro-viz
 ```
 
-To start Kedro-Viz, enter the following in your terminal:
+To start Kedro-Viz, navigate to the project folder (`cd <project-name>`) and enter the following in your terminal:
 
 ```bash
 kedro viz
@@ -113,7 +83,7 @@ This command automatically opens a browser tab to serve the visualisation at `ht
 To exit the visualisation, close the browser tab. To regain control of the terminal, enter `^+c` on Mac or `Ctrl+c` on Windows or Linux machines.
 
 ## Where next?
-You have completed the section on Kedro project creation for new users. Now choose how to learn more:
+You have completed the section on Kedro project creation for new users. Here are some useful resources to learn more:
 
 * Understand more about Kedro: The following page explains the [fundamental Kedro concepts](./kedro_concepts.md).
 
@@ -122,35 +92,3 @@ You have completed the section on Kedro project creation for new users. Now choo
 * How-to guide for notebook users: The documentation section following the tutorial explains [how to combine Kedro with a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md).
 
 If you've worked through the documentation listed and are unsure where to go next, review the [Kedro repositories on GitHub](https://github.com/kedro-org) and [Kedro's Slack channels](https://slack.kedro.org).
-
-
-## More information about the `pandas-iris` example project
-
-If you used the `pandas-iris` starter to create an example project, the rest of this page gives further information.
-
-<details>
-<summary>Expand for more details.</summary>
-
-### Background information
-The Iris dataset was generated in 1936 by the British statistician and biologist Ronald Fisher. The dataset contains 150 samples, comprising 50 each of 3 different species of Iris plant (*Iris Setosa*, *Iris Versicolour* and *Iris Virginica*). For each sample, the flower measurements are recorded for the sepal length, sepal width, petal length and petal width.
-
-![](../meta/images/iris_measurements.png)
-
-A machine learning model can use the Iris dataset to illustrate classification (a method used to determine the type of an object by comparison with similar objects that have previously been categorised). Once trained on known data, the machine learning model can make a predictive classification by comparing a test object to the output of its training data.
-
-The Kedro starter contains a single [pipeline](../resources/glossary.md#pipeline) comprising three [nodes](../resources/glossary.md#node) responsible for splitting the data into training and testing samples, running a 1-nearest neighbour classifier algorithm to make predictions and accuracy-reporting.
-
-The nodes are stored in `src/get_started/nodes.py`:
-
-| Node            | Description                                                                         |
-| --------------- | ----------------------------------------------------------------------------------- |
-| `split_data`      | Splits the example Iris dataset into train and test samples                       |
-| `make_predictions`| Makes class predictions (using 1-nearest neighbour classifier and train-test set) |
-| `report_accuracy` | Reports the accuracy of the predictions performed by the previous node.           |
-
-### Iris example: visualisation
-
-If you [visualise your project with Kedro-Viz](#visualise-a-kedro-project) you should see the following:
-
-![](../meta/images/pipeline_visualisation_iris_starter.png)
-</details>
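
Reviewer note on the "Project name" section of `new_project.md`: the text says that the human-readable name is used to generate the folder and package names, with "Get Started" giving `get-started` and `get_started`. The sketch below illustrates that mapping. It is an approximation written for this note, not Kedro's actual implementation: the function name `derive_names`, the validation regex, and the exact replacement rules are assumptions inferred from the prose and the table in the diff.

```python
import re

# Assumed rule, inferred from the docs text: letters, digits, spaces,
# underscores and hyphens only, and at least two characters long.
_VALID_NAME = re.compile(r"^[A-Za-z0-9 _-]{2,}$")


def derive_names(project_name: str) -> dict:
    """Derive illustrative folder and package names from a project name.

    Hypothetical helper: Kedro's own logic may differ in detail.
    """
    if not _VALID_NAME.match(project_name):
        raise ValueError(
            "Project name must be at least two characters and contain only "
            "alphanumeric symbols, spaces, underscores and hyphens."
        )
    lowered = project_name.strip().lower()
    return {
        "project_name": project_name,
        # Folder name: lower case, words separated by hyphens.
        "repo_name": lowered.replace(" ", "-").replace("_", "-"),
        # Package name: lower case, words separated by underscores.
        "python_package": lowered.replace(" ", "_").replace("-", "_"),
    }


if __name__ == "__main__":
    names = derive_names("Get Started")
    print(names["repo_name"])       # get-started
    print(names["python_package"])  # get_started
```

Keeping the project name simple, as the docs recommend, means the derived folder and package names stay predictable.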