Expand Up @@ -53,18 +53,18 @@ In Create new item window, select Kafka Consumer (Receive Messages) and click Ad

Configure the Kafka Consumer according to the topic that shall be consumed (a rough mapping of these options to plain Kafka client code is sketched after this list):

- **Bootstrap Server** - URL of the Kafka broker including the port number (commonly port `9092`)
- **Security Protocol** - Security mechanism used for authentication
- **Topic** - Name / ID of the topic where messages are published
- **Advanced Section**
- **Messages Dataset** - A dataset (XML/JSON) where messages can be written to. Leave this field empty to output the messages as entities (see below).
- **SASL** authentication settings as provided by your Kafka broker
- **Auto Offset Reset** - Consumption starts either at the earliest offset or the latest offset.
- **Consumer Group Name** - Consumer groups can be used to distribute the load of messages (partitions) between multiple consumers of the same group (c.f. [Kafka Concepts](https://docs.confluent.io/platform/current/clients/consumer.html#concepts)).
- **Client Id** - An optional identifier of the client which is communicated to the server. When this field is empty, the plugin defaults to `DNS:PROJECT_ID:TASK_ID`.
- **Local Consumer Queue Size** - Maximum total message size in kilobytes that the consumer can buffer for a specific partition. The consumer will stop fetching from the partition if it hits this limit. This helps prevent consumers from running out of memory.
- **Message Limit** - The maximum number of messages to fetch and process in each run. If `0` or less, all messages will be fetched.
- **Disable Commit** - Setting this to `true` will disable committing messages after retrieval. This means you will receive the same messages on the next execution (useful for testing, development, or debugging).
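
For orientation only, the sketch below shows how these options roughly correspond to the configuration of a plain Kafka consumer using the `confluent-kafka` Python client. All broker addresses, credentials, topic and group names are made-up placeholders, and the plugin's internal implementation may differ.

```python
from confluent_kafka import Consumer

# All values below are made-up placeholders -- use the settings of your own broker.
consumer = Consumer({
    "bootstrap.servers": "kafka.example.org:9092",  # Bootstrap Server
    "security.protocol": "SASL_SSL",                # Security Protocol
    "sasl.mechanism": "PLAIN",                      # SASL settings from your broker
    "sasl.username": "my-user",
    "sasl.password": "my-password",
    "group.id": "my-consumer-group",                # Consumer Group Name
    "client.id": "DNS:PROJECT_ID:TASK_ID",          # Client Id (default pattern of the plugin)
    "auto.offset.reset": "earliest",                # Auto Offset Reset
    "enable.auto.commit": False,                    # commit manually, cf. Disable Commit
    "queued.max.messages.kbytes": 65536,            # Local Consumer Queue Size (in KB)
})

consumer.subscribe(["my-topic"])                    # Topic

messages = []
while len(messages) < 100:                          # Message Limit
    msg = consumer.poll(timeout=1.0)
    if msg is None:                                 # no further messages right now
        break
    if msg.error():                                 # skip transport/partition errors
        continue
    messages.append(msg)

consumer.commit()                                   # omit this call to emulate Disable Commit
consumer.close()
```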

![Configuration options](configure-kafka-consumer.png)<!-- 24.1 -->

Expand All @@ -74,7 +74,7 @@ There are two main modes how the consumer handles received messages: either the

### Write Messages to a Dataset

In order to write the received messages to a dataset, the option **Messages Dataset** needs to be set. Only JSON and XML message formats are supported in this mode. Depending on the message format, a [JSON](../../deploy-and-configure/configuration/dataintegration/plugin-reference/index.md#json) or [XML Dataset](../../deploy-and-configure/configuration/dataintegration/plugin-reference/index.md#xml) needs to be created and configured as the **Messages Dataset**.
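
Conceptually, this mode collects the raw payloads into one document rather than emitting them one by one. A minimal sketch of that idea (with made-up JSON payloads, not the plugin's actual code) could look like this:

```python
import json

# Made-up raw JSON payloads received from the topic.
received = [b'{"id": 1, "name": "alpha"}', b'{"id": 2, "name": "beta"}']

# Collect all messages into one JSON document instead of emitting them as entities.
with open("messages.json", "w", encoding="utf-8") as f:
    json.dump([json.loads(payload) for payload in received], f, indent=2)
```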

![Choose a dataset according to the message format](configure-message-dataset.png)<!-- 24.1 -->

Expand All @@ -86,21 +86,21 @@ To execute the Kafka Consumer it needs to be placed inside a Workflow. The messa

In the "message streaming mode" (**Messages Dataset** is not set) the received messages will be generated as entities and forwarded to the subsequent operator in the workflow. This mode is not limited to any message format. The generated message entities will have the following flat schema:

- **key** — the optional key of the message,
- **content** — the message itself as plain text,
- **offset** — the given offset of the message in the topic,
- **ts-production** — the timestamp when the message was written to the topic,
- **ts-consumption** — the timestamp when the message was consumed from the topic.
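
For illustration, a single message entity in this mode could look roughly like the following sketch (all values are made up):

```python
# Hypothetical example of a single message entity produced in streaming mode.
message_entity = {
    "key": "order-4711",                             # optional message key
    "content": '{"status": "shipped", "items": 3}',  # message payload as plain text
    "offset": 1337,                                  # offset of the message in the topic
    "ts-production": "2024-02-28T09:15:02Z",         # when the message was written to the topic
    "ts-consumption": "2024-02-28T09:15:05Z",        # when the message was consumed from the topic
}
```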

Connect the output of the Kafka Consumer inside a Workflow to a tabular dataset (e.g. a [CSV Dataset](../../deploy-and-configure/configuration/dataintegration/plugin-reference/index.md#csv)) or directly to a transformation task.

![](demo-wf-2.png)<!-- 24.1 -->

The message content is captured as plain text. In order to process complex message content in a transformation, the `content` path needs to be parsed with operators such as [Parse JSON](../../deploy-and-configure/configuration/dataintegration/plugin-reference/index.md#parse-json) or [Parse XML](../../deploy-and-configure/configuration/dataintegration/plugin-reference/index.md#parse-xml).
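
Outside of a transformation, this parsing step would amount to something like the following sketch (assuming JSON payloads; the value is a made-up example):

```python
import json

# The "content" path holds the payload as plain text (a made-up JSON message here).
content = '{"status": "shipped", "items": 3}'

payload = json.loads(content)
print(payload["status"])  # -> shipped
```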

![](demo-wf-3.png)<!-- 24.1 -->

Any modifications to the message set, such as filtering, can be done prior to parsing the content. One could, for example, remove duplicates (according to the message key) from the messages by using the [Distinct-by task](../../deploy-and-configure/configuration/dataintegration/plugin-reference/index.md#distinct-by).
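
In plain code, deduplication by message key boils down to keeping one message per key, as in this minimal sketch with made-up messages (here the last occurrence per key wins; the actual Distinct-by behavior depends on its configuration):

```python
# Made-up messages as (key, content) pairs.
messages = [
    ("order-1", '{"status": "created"}'),
    ("order-2", '{"status": "created"}'),
    ("order-1", '{"status": "shipped"}'),
]

# Keep one message per key; later messages overwrite earlier ones with the same key.
distinct_by_key = {}
for key, content in messages:
    distinct_by_key[key] = content

print(list(distinct_by_key.items()))
# [('order-1', '{"status": "shipped"}'), ('order-2', '{"status": "created"}')]
```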

![](demo-wf-4.png)<!-- 24.1 -->

