Improve the KFP / User Guides / Core Functions docs
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
thesuperzapper committed Jul 7, 2024
1 parent 833cc62 commit 4ee4c51
Showing 13 changed files with 929 additions and 541 deletions.
@@ -1,5 +1,6 @@
+++
title = "Create components"
description = "Create pipelines with reusable components."
weight = 3
+++

@@ -1,5 +1,5 @@
+++
title = "Core Functions"
description = "Documentation for users of Kubeflow Pipelines."
description = "Learn about the core functions of Kubeflow Pipelines."
weight = 2
+++
@@ -1,6 +1,7 @@
+++
title = "Build a More Advanced ML Pipeline"
-weight = 6
+description = "Create a more advanced pipeline that leverages additional KFP features."
+weight = 199
+++

{{% kfp-v2-keywords %}}
@@ -1,7 +1,7 @@
+++
title = "Use Caching"
description = "How to use caching in Kubeflow Pipelines."
weight = 5
description = "Learn about caching in Kubeflow Pipelines."
weight = 104
+++

Kubeflow Pipelines support caching to eliminate redundant executions and improve
@@ -26,7 +26,7 @@ be marked with a green "arrow from cloud" icon.
## How to use caching

Caching is enabled by default for all components in KFP. You can disable caching
-for a component by calling `.set_caching_options(False)` on a task object.
+for a component by calling [`.set_caching_options(enable_caching=False)`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTask.set_caching_options) on a task object.

```python
from kfp import dsl

# NOTE: the rest of this example is truncated in this view; the component
# and pipeline below are an illustrative reconstruction.
@dsl.component
def my_component() -> str:
    return 'Hello!'

@dsl.pipeline
def my_pipeline():
    task = my_component()
    task.set_caching_options(enable_caching=False)
```
@@ -1,6 +1,7 @@
+++
title = "Interact with KFP via the CLI"
weight = 4
title = "Use the KFP CLI"
description = "Learn how to interact with Kubeflow Pipelines using the KFP CLI."
weight = 203
+++

{{% kfp-v2-keywords %}}
@@ -1,16 +1,20 @@
+++
title = "Compile a Pipeline"
description = "Compile pipelines and components to YAML"
weight = 2
description = "Define and compile a basic pipeline using the KFP SDK."
weight = 101
+++

{{% kfp-v2-keywords %}}

-To submit a pipeline for execution, you must compile it to YAML with the KFP SDK compiler:
+## Overview
+
+To [submit a pipeline for execution](/docs/components/pipelines/user-guides/core-functions/run-a-pipeline/), you must compile it to YAML with the KFP SDK compiler.
+
+In the following example, the compiler creates a file called `pipeline.yaml`, which contains a hermetic representation of your pipeline.
+The output is called an [Intermediate Representation (IR) YAML](#ir-yaml), which is a serialized [`PipelineSpec`][pipeline-spec] protocol buffer message.

```python
-from kfp import dsl
-from kfp import compiler
+from kfp import compiler, dsl

@dsl.component
def comp(message: str) -> str:
@@ -25,9 +29,19 @@ def my_pipeline(message: str) -> str:
compiler.Compiler().compile(my_pipeline, package_path='pipeline.yaml')
```

-In this example, the compiler creates a file called `pipeline.yaml`, which contains a hermetic representation of your pipeline. The output is called intermediate representation (IR) YAML. You can view an example of IR YAML on [GitHub][compiled-output-example]. The contents of the file is the serialized [`PipelineSpec`][pipeline-spec] protocol buffer message and is not intended to be human-readable.
+Because components are actually pipelines, you may also compile them to IR YAML:

+```python
+@dsl.component
+def comp(message: str) -> str:
+    print(message)
+    return message
+
+compiler.Compiler().compile(comp, package_path='component.yaml')
+```

-You can find human-readable information about the pipeline in the comments at the top of the compiled YAML:
+You can view an [example of IR YAML][compiled-output-example] on GitHub.
+The contents of the file are not intended to be human-readable; however, the comments at the top of the file provide a summary of the pipeline:

```yaml
# PIPELINE DEFINITION
@@ -40,16 +54,21 @@ You can find human-readable information about the pipeline in the comments at the
...
```

-You can also compile components, as opposed to pipelines, to IR YAML:
-
-```python
-@dsl.component
-def comp(message: str) -> str:
-    print(message)
-    return message
-
-compiler.Compiler().compile(comp, package_path='component.yaml')
-```
+## Type checking
+
+By default, the DSL compiler statically type checks your pipeline to ensure type consistency between components that pass data between one another.
+Static type checking helps identify component I/O inconsistencies without having to run the pipeline, shortening development iterations.
+
+Specifically, the type checker checks for type equality between the type of data a component input expects and the type of the data provided.
+See [Data Types][data-types] for more information about KFP data types.
+
+For example, for parameters, a list input may only be passed to parameters with a `typing.List` annotation.
+Similarly, a float may only be passed to parameters with a `float` annotation.
+
+Input data types and annotations must also match for artifacts, with one exception: the `Artifact` type is compatible with all other artifact types.
+In this sense, the `Artifact` type is both the default artifact type and an artifact "any" type.
+
+As described in the following section, you can disable type checking.
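
For illustration (not part of this commit, and using hypothetical component names), a minimal sketch of the kind of mismatch the type checker catches, and the `type_check` escape hatch:

```python
from kfp import compiler, dsl

@dsl.component
def produce_number() -> float:
    return 1.0

@dsl.component
def consume_text(text: str) -> str:
    return text

@dsl.pipeline
def bad_pipeline():
    # Wiring a `float` output into a `str` input fails static type
    # checking at compile time, before the pipeline ever runs.
    consume_text(text=produce_number().output)

# Raises a type error during compilation; passing type_check=False
# would skip the check and compile anyway.
compiler.Compiler().compile(bad_pipeline, package_path='bad_pipeline.yaml')
```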

## Compiler arguments

Expand All @@ -63,25 +82,14 @@ The [`Compiler.compile`][compiler-compile] method accepts the following argument
| `pipeline_parameters` | `Dict[str, Any]` | _Optional_<br/>Map of parameter names to argument values. This lets you provide default values for pipeline or component parameters. You can override these default values during pipeline submission.
| `type_check` | `bool` | _Optional_<br/>Indicates whether static type checking is enabled during compilation.<br/>
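
As a rough usage sketch (assuming the `my_pipeline` function from the overview example above), these arguments combine in a single compile call:

```python
compiler.Compiler().compile(
    my_pipeline,
    package_path='pipeline.yaml',
    # Hypothetical default for the pipeline's `message` parameter; it can
    # still be overridden when the pipeline is submitted.
    pipeline_parameters={'message': 'Hello, World!'},
    # Static type checking is enabled by default; set False to disable it.
    type_check=True,
)
```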

-## Type checking
-
-By default, the DSL compiler statically type checks your pipeline to ensure type consistency between components that pass data between one another. Static type checking helps identify component I/O inconsistencies without having to run the pipeline, shortening development iterations.
-
-Specifically, the type checker checks for type equality between the type of data a component input expects and the type of the data provided. See [Data Types][data-types] for more information about KFP data types.
-
-For example, for parameters, a list input may only be passed to parameters with a `typing.List` annotation. Similarly, a float may only be passed to parameters with a `float` annotation.
-
-Input data types and annotations must also match for artifacts, with one exception: the `Artifact` type is compatible with all other artifact types. In this sense, the `Artifact` type is both the default artifact type and an artifact "any" type.
-
-As described in the following section, you can disable type checking.

## IR YAML

-The IR YAML is an intermediate representation of a compiled pipeline or component. It is an instance of the [`PipelineSpec`][pipeline-spec] protocol buffer message type, which is a platform-agnostic pipeline representation protocol. It is considered an intermediate representation because the KFP backend compiles `PipelineSpec` to [Argo Workflow][argo-workflow] YAML as the final pipeline definition for execution.
+The IR YAML is an intermediate representation of a compiled pipeline or component.
+It is an instance of the [`PipelineSpec`][pipeline-spec] protocol buffer message type, which is a platform-agnostic pipeline representation protocol.
+It is considered an intermediate representation because the KFP backend compiles `PipelineSpec` to [Argo Workflow][argo-workflow] YAML as the final pipeline definition for execution.

Unlike the v1 component YAML, the IR YAML is not intended to be written directly.

-While IR YAML is not intended to be easily human readable, you can still inspect it if you know a bit about its contents:
+While IR YAML is not intended to be easily human-readable, you can still inspect it if you know a bit about its contents:

| Section | Description | Example |
|-------|-------------|---------|
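
As a quick illustration (not part of this commit), a compiled file can be loaded like any YAML document to inspect its top-level `PipelineSpec` sections:

```python
import yaml  # requires the pyyaml package

with open('pipeline.yaml') as f:
    spec = yaml.safe_load(f)

# Top-level PipelineSpec sections, e.g. components, deploymentSpec,
# pipelineInfo, root, schemaVersion, sdkVersion.
print(sorted(spec))
print(spec['pipelineInfo']['name'])
```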