Add architecture graphic back to docs with revisions (#2916)
* Revise FAQs and README

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Add back the data layers FAQ as I've no idea where else it fits

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* minor changes from review

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Re-add kedro arch diagram, with revised graphic

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* fix broken anchor

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* fix broken anchor

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/extend_kedro/architecture_overview.md

Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Update docs/source/extend_kedro/architecture_overview.md

Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Update docs/source/extend_kedro/architecture_overview.md

Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Update docs/source/extend_kedro/architecture_overview.md

Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>

* Changes to architecture page following review

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Change diagram following review

* Add links to API docs

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Added in type of users

* Fix linting error

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

---------

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
3 people authored Aug 18, 2023
1 parent 4a7b132 commit 67ccff3
Showing 5 changed files with 362 additions and 2 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -10,7 +10,7 @@ We also curate a [GitHub repo that lists content created by the Kedro community]

## Contribute to the project

There are quite a few ways to contribute to Kedro, sich as answering questions about Kedro to help others, fixing a typo on the documentation, reporting a bug, reviewing pull requests or adding a feature.
There are quite a few ways to contribute to Kedro, such as answering questions about Kedro to help others, fixing a typo on the documentation, reporting a bug, reviewing pull requests or adding a feature.

Take a look at some of our [contribution suggestions on the Kedro GitHub Wiki](https://github.com/kedro-org/kedro/wiki/Contribute-to-Kedro)!

55 changes: 55 additions & 0 deletions docs/source/extend_kedro/architecture_overview.md
@@ -0,0 +1,55 @@
# Kedro architecture overview

There are different ways to use Kedro in your work. You can:

- Commit to using all of Kedro (framework, project, starters and library), which is the best way to take advantage of Kedro's full value proposition
- Use parts of Kedro, like the DataCatalog (I/O), ConfigLoader, Pipelines and Runner, as a Python library; this best supports a workflow where you don't want to adopt the Kedro project template
- Develop extensions for Kedro, e.g. custom starters, plugins, Hooks and more

At a high level, Kedro consists of five main parts:

![Kedro architecture diagram](../meta/images/kedro_architecture.png)


## Kedro project

As a data pipeline developer, you will interact with a Kedro project, which consists of:

* The **`conf/`** directory, which contains configuration for the project, such as data catalog configuration, parameters, etc.
* The **`src`** directory, which contains the source code for the project, including:
    * The **`pipelines`** directory, which contains the source code for your pipelines.
    * **`settings.py`** file contains the settings for the project, such as library component registration, custom hooks registration, etc. All the available settings are listed and explained in the [project settings chapter](../kedro_project_setup/settings.md).
    * **`pipeline_registry.py`** file defines the project pipelines, i.e. pipelines that can be run using `kedro run --pipeline`.
    * **`__main__.py`** file serves as the main entry point of the project in [package mode](../tutorial/package_a_project.md#package-a-kedro-project).
* **`pyproject.toml`** identifies the project root by providing project metadata, including:
    * `package_name`: A valid Python package name for your project package.
    * `project_name`: A human readable name for your project.
    * `kedro_init_version`: Kedro version with which the project was generated.
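
To make the metadata concrete, the following is a minimal sketch of how the three keys could be read from a `pyproject.toml` file. It assumes the keys live in a `[tool.kedro]` table; the values and the `read_kedro_metadata` helper are hypothetical and are not part of Kedro's API.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    # Assumption: on older interpreters the `tomli` backport is installed.
    import tomli as tomllib

# Hypothetical pyproject.toml content; all three values are made up for illustration.
PYPROJECT = """
[tool.kedro]
package_name = "spaceflights"
project_name = "Spaceflights"
kedro_init_version = "0.18.12"
"""


def read_kedro_metadata(text: str) -> dict:
    """Return the [tool.kedro] table from pyproject.toml text, or {} if it is absent."""
    return tomllib.loads(text).get("tool", {}).get("kedro", {})


metadata = read_kedro_metadata(PYPROJECT)
for key in ("package_name", "project_name", "kedro_init_version"):
    print(f"{key}: {metadata[key]}")
```

A file without a `[tool.kedro]` table yields an empty dictionary in this sketch, which is one simple way to tell that a directory is not a Kedro project root.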

## Kedro framework

Kedro framework serves as the interface between a Kedro project and Kedro library components. The major building blocks of the Kedro framework include:

* **[`Session`](/kedro.framework.session)** is responsible for managing the lifecycle of a Kedro run.
* **[`Context`](/kedro.framework.context)** holds the configuration and Kedro's main functionality, and also serves as the main entry point for interactions with core library components.
* **[`Hooks`](/kedro.framework.hooks)** defines all hook specifications available to extend Kedro.
* **[`CLI`](/kedro.framework.cli)** defines built-in Kedro CLI commands and utilities to load custom CLI commands from plugins.

## Kedro starter

You can use a [Kedro starter](../kedro_project_setup/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.

## Kedro library

Kedro library consists of independent units, each responsible for one aspect of computation in a data pipeline:

* **[`ConfigLoader`](/kedro.config.ConfigLoader)** provides utilities to parse and load configuration defined in a Kedro project.
* **[`Pipeline`](/kedro.pipeline)** provides a collection of abstractions to model data pipelines.
* **[`Runner`](/kedro.runner)** provides an abstraction for different execution strategies of a data pipeline.
* **[`I/O`](/kedro.io)** provides a collection of abstractions to handle I/O in a project, including `DataCatalog` and many `Dataset` implementations.

## Kedro extension

You can also extend Kedro's behaviour in your project using a Kedro extension, which can be a custom starter, a Python library with extra hook implementations, extra CLI commands such as [Kedro-Viz](https://github.com/kedro-org/kedro-viz), or a custom library component implementation.

If you create a Kedro extension, we welcome all kinds of contributions. Check out our [guide to contributing to Kedro](https://github.com/kedro-org/kedro/wiki/Contribute-to-Kedro). Dataset contributions to [`kedro-datasets`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets) are the most frequently accepted, since they do not require any changes to the framework itself. However, we do not discourage contributions to any of the other [`kedro-plugins`](https://github.com/kedro-org/kedro-plugins).
1 change: 1 addition & 0 deletions docs/source/extend_kedro/index.md
@@ -5,4 +5,5 @@
common_use_cases
plugins
architecture_overview
```