Skip to content

Commit

Permalink
doc: monovertex (numaproj#2193)
Browse files Browse the repository at this point in the history
Signed-off-by: Vigith Maurice <vigith@gmail.com>
  • Loading branch information
vigith authored Oct 29, 2024
1 parent eca3b0c commit 00a74df
Show file tree
Hide file tree
Showing 4 changed files with 85 additions and 2 deletions.
Binary file added docs/assets/monovertex.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
79 changes: 79 additions & 0 deletions docs/core-concepts/monovertex.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
# MonoVertex

MonoVertex is a simplified version of the [Pipeline](./pipeline.md). The major design idea behind MonoVertex is to simplify data-processing
in those cases where independently autoscaling vertices are not required. This means there will only be a single vertex
called the MonoVertex, which is capable of running the [Source](../user-guide/sources/overview.md), the [Sink](../user-guide/sinks/overview.md)
(with [Fallback Sink](../user-guide/sinks/fallback.md) if need be), and optionally the [Transformer](../user-guide/sources/transformer/overview.md)
together. There is no concept of Edges in MonoVertex since there is only one [Vertex](vertex.md). The MonoVertex runs the same containers run by the
[Pipeline](./pipeline.md), this means the users can switch between MonoVertex and [Pipeline](./pipeline.md) by just
changing the spec. MonoVertex’s [autoscaling](../specifications/autoscaling.md) is similar to the Source vertex of a [Pipeline](./pipeline.md),
where the vertex scales out if the pending at the Source is increasing, and scales down when the pending at the Source drops.

The major benefits of MonoVertex are as follows:

* Very high TPS can be supported because there are no ISBs (RAFT consensus) in between.
* Extremely low latency because there is no network hops between the Vertices (only one Vertex).
* Low Operational overhead, no need to maintain ISBs.

_MonoVertex - Condensed Pipeline with just one Vertex_
![monovertex.png](../assets/monovertex.png)


## Use cases of MonoVertex

There are a few scenarios where MonoVertex is the best fit, which can be best described as use-cases where “the user is
reading from the Source, doing simple transformations, and writing to a Sink”. You don't need to scale these Source,
Sink, or Map vertices by themselves. If the Sink can't keep up, you just need to add more pods. This brings a new set
of Source, Transformer, and a Sink (optionally with a backup sink). The same goes for if the Source cannot keep up
(pending is increasing), we just need to add more pods.

### A Brief History

MonoVertex was developed because we found that, there are a decent amount of applications that read from
Sources like Kafka, Pulsar, etc., and write to a User-Defined Sink (DBs, etc). Often, the transformation itself was not
required because the User-Defined Sink can do the transformation before persisting the data in to the Sink. The extra hop
through the ISB was not helping but just making things slower by adding more delays. Throughput was also getting limited to
about 50K unless larger ISBs were used.

### When not to use MonoVertex

The rule of thumb is, if you are just reading from the source and writing to a sink (with some transformation), then
you might be able to get away with MonoVertex. For all other use cases, use the full Pipeline Semantics. Below are a
few examples where you cannot use MonoVertex.

* [User-Defined Functions](../user-guide/user-defined-functions/user-defined-functions.md) are not supported, only
[Transformer](../user-guide/sources/transformer/overview.md) can be used. If you need Map support, please let us know
by creating an issue,
* If you are using the [Reduce](../user-guide/user-defined-functions/reduce/reduce.md) feature, then the full [Pipeline](./pipeline.md)
semantics is required. This is because of the need for shuffling of data.
* There are cases where you want to autoscale intermediate process nodes, especially in the case of ML workloads.
* If there is a need for custom placement of vertices (e.g., GPU nodes for Inference), then the complete pipeline spec is required.
* If you want to use the [Join](../user-guide/reference/join-vertex.md) feature, then MonoVertex will not work.

## Anatomy of MonoVertex

MonoVertex supports the same Sources, Sinks, and Transformers which are used in the Pipeline spec.

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: MonoVertex
metadata:
name: simple-mono-vertex
spec:
source:
# same as the Pipeline Source
# ...
transformer: # transformer is an optional container to do any transformation to the incoming data before passing to the sink
# same as the Pipeline Transformer
# ...
sink:
# same as the Pipeline Sink
# ...
```

Please keep in mind that, moving from a MonoVertex to Pipeline does not require code change, only spec change.

## Example

A [simple example](https://raw.githubusercontent.com/numaproj/numaflow/stable/examples/21-simple-mono-vertex.yaml) using
user-defined source, transformer, and sink.
7 changes: 5 additions & 2 deletions docs/core-concepts/pipeline.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
# Pipeline

The `Pipeline` represents a data processing job. The most important concept in Numaflow, it defines:
The `Pipeline` represents a data processing job (a simpler version of this is called [MonoVertex](./monovertex.md)). The
most important concept in Numaflow, it defines:

1. A list of [vertices](vertex.md), which define the data processing tasks;
1. A list of `edges`, which are used to describe the relationship between the vertices. Note an edge may go from a vertex to multiple vertices, and as of v0.10, an edge may also go from multiple vertices to a vertex. This many-to-one relationship is possible via [Join and Cycles](../user-guide/reference/join-vertex.md)
1. A list of `edges`, which are used to describe the relationship between the vertices. Note an edge may go from a vertex
to multiple vertices, and an edge may also go from multiple vertices to a vertex. This many-to-one relationship is
possible via [Join and Cycles](../user-guide/reference/join-vertex.md)

The `Pipeline` is abstracted as a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). A `Pipeline` spec looks like below.

Expand Down
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ nav:
- Core Concepts:
- core-concepts/pipeline.md
- core-concepts/vertex.md
- core-concepts/monovertex.md
- core-concepts/inter-step-buffer.md
- core-concepts/inter-step-buffer-service.md
- core-concepts/watermarks.md
Expand Down

0 comments on commit 00a74df

Please sign in to comment.