MC: Document opencdc.collection on OpenCDC record page (#78)
hariso authored Apr 18, 2024
1 parent b96a3eb commit fa9ccc7
Showing 3 changed files with 148 additions and 42 deletions.
182 changes: 142 additions & 40 deletions docs/features/opencdc-record.mdx
@@ -61,83 +61,185 @@ When processing records in Conduit, you can always expect a similar structure to

## Metadata fields

As part of an OpenCDC record, there will be a set of fields provided that will vary depending on the connector. These fields can be common to all **OpenCDC** records as part of our standard, some related to **Conduit**, and others that will be provided by each **Connector** implementation independently. These fields can be useful to define conventions that will then be used by Conduit to expand its functionality. Notice that all these fields use a dot notation syntax to indicate what they refer to, preventing accidental clashes. Here are the ones you can find:
As part of an OpenCDC record, there will be a set of fields provided that will
vary depending on the connector. These fields can be common to all **OpenCDC**
records as part of our standard, some related to **Conduit**, and others that
will be provided by each **Connector** implementation independently. These
fields can be useful to define conventions that will then be used by Conduit to
expand its functionality. Notice that all these fields use a dot notation syntax
to indicate what they refer to, preventing accidental clashes. Here are the ones
you can find:

### OpenCDC
### `opencdc.createdAt`

- `opencdc.createdAt` can contain the time when the record was created in the 3rd party system. The expected format is a Unix timestamp in nanoseconds.
- `opencdc.readAt` can contain the time when the record was read from the 3rd party system. The expected format is a Unix timestamp in nanoseconds.
- `opencdc.version` contains the version of the OpenCDC format (e.g., "v1"). This field exists to ensure the OpenCDC format version can be easily identified in case the record gets marshaled into a different untyped format (e.g. JSON).
Contains the time when the record was created in the 3rd party system. The
expected format is a Unix timestamp in nanoseconds.

```json
```json5
{
...
// other record fields
"metadata": {
"opencdc.createdAt": "1663858188836816000",
// rest of metadata
},
// other record fields
}
```
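
Because the value is a string holding nanoseconds since the Unix epoch, a consumer usually has to convert it before working with it. The following is a minimal Python sketch, assuming the record has already been decoded into a dictionary. The helper name `parse_opencdc_timestamp` is hypothetical and not part of Conduit.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def parse_opencdc_timestamp(value: str) -> datetime:
    """Convert a Unix timestamp in nanoseconds (given as a string) to UTC.

    Python datetimes have microsecond resolution, so the last three
    digits of the nanosecond value are dropped.
    """
    seconds, nanos = divmod(int(value), NANOS_PER_SECOND)
    whole_seconds = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole_seconds.replace(microsecond=nanos // 1_000)


# Metadata as it might look after decoding a record (value from the example above).
metadata = {"opencdc.createdAt": "1663858188836816000"}
created_at = parse_opencdc_timestamp(metadata["opencdc.createdAt"])
print(created_at.isoformat())
```

Splitting the integer with `divmod` instead of dividing by `1e9` avoids the rounding errors that floating-point arithmetic would introduce at this magnitude.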

### `opencdc.readAt`

Contains the time when the record was read from the 3rd party system. The
expected format is a Unix timestamp in nanoseconds.

```json5
{
// other record fields
"metadata": {
"opencdc.readAt": "1663858188836816000",
// rest of metadata
},
// other record fields
}
```
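
One possible use of this field, together with `opencdc.createdAt`, is to estimate how long a record waited before it was read. The sketch below is illustrative only: the helper `read_lag_ns` is hypothetical, the `opencdc.readAt` value is made up for the example, and the code assumes either field may be missing.

```python
from typing import Optional


def read_lag_ns(metadata: dict) -> Optional[int]:
    """Return readAt minus createdAt in nanoseconds, or None if either is absent."""
    created = metadata.get("opencdc.createdAt")
    read = metadata.get("opencdc.readAt")
    if created is None or read is None:
        return None
    # Both values are strings of nanoseconds since the Unix epoch.
    return int(read) - int(created)


metadata = {
    "opencdc.createdAt": "1663858188836816000",
    "opencdc.readAt": "1663858189336816000",  # illustrative: half a second later
}
lag = read_lag_ns(metadata)
print(lag)
```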

### `opencdc.version`

Contains the version of the OpenCDC format (e.g., "v1"). This field exists to
ensure the OpenCDC format version can be easily identified in case the record
gets marshaled into a different untyped format (e.g. JSON).

```json5
{
// other record fields
"metadata": {
"opencdc.version": "v1",
...
// rest of metadata
},
...
// other record fields
}
```
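
When a record has been marshaled to JSON, a consumer can read this field to decide whether it knows how to interpret the rest of the record. A minimal sketch, assuming a consumer that only understands `v1`; the function name and the set of supported versions are assumptions made for illustration.

```python
import json

SUPPORTED_VERSIONS = {"v1"}  # assumption: this consumer only handles v1


def load_record(raw: str) -> dict:
    """Decode a JSON record and reject OpenCDC versions we do not support."""
    record = json.loads(raw)
    version = record.get("metadata", {}).get("opencdc.version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported OpenCDC version: {version!r}")
    return record


record = load_record('{"metadata": {"opencdc.version": "v1"}}')
print(record["metadata"]["opencdc.version"])
```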

### Conduit
### `opencdc.collection`

Only available in records once they are read by a **source plugin**:
Contains the name of the collection from which the record originated and/or
where it should be written to.

- `conduit.source.plugin.name` is the name of the source plugin that created the record.
- `conduit.source.plugin.version` is the version of the source plugin that created the record.
- `conduit.source.connector.id` is the ID of the source connector that received the record.
:::note
It's up to the connector to populate this field. In other words, not all records
may have this field.
:::

```json
```json5
{
// other record fields
"metadata": {
"opencdc.collection": "employees",
// rest of metadata
},
// other record fields
}
```
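
Since not every connector populates this field, code that routes records by collection should handle its absence. The sketch below groups already-decoded records by collection name; `group_by_collection` and the `DEFAULT_COLLECTION` fallback are hypothetical names chosen for this example.

```python
from collections import defaultdict

DEFAULT_COLLECTION = "unknown"  # hypothetical fallback for records without the field


def group_by_collection(records: list) -> dict:
    """Group records by their opencdc.collection metadata value."""
    groups = defaultdict(list)
    for record in records:
        metadata = record.get("metadata", {})
        name = metadata.get("opencdc.collection", DEFAULT_COLLECTION)
        groups[name].append(record)
    return dict(groups)


records = [
    {"metadata": {"opencdc.collection": "employees"}},
    {"metadata": {"opencdc.collection": "employees"}},
    {"metadata": {}},  # connector did not set the collection
]
grouped = group_by_collection(records)
print({name: len(items) for name, items in grouped.items()})
```
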
### `conduit.source.plugin.name`

The name of the source plugin that created the record.


```json5
{
...
// other record fields
"metadata": {
"conduit.source.plugin.name": "builtin:file",
// rest of metadata
},
// other record fields
}
```

### `conduit.source.plugin.version`

The version of the source plugin that created the record.


```json5
{
// other record fields
"metadata": {
"conduit.source.plugin.version": "v1.0.2",
// rest of metadata
},
// other record fields
}
```

### `conduit.source.connector.id`

`conduit.source.connector.id` is the ID of the source connector that received the record.

```json5
{
// other record fields
"metadata": {
"conduit.source.connector.id": "connectorID",
"conduit.source.plugin.name": "example",
"conduit.source.plugin.version": "v1",
...
// rest of metadata
},
// other record fields
}
```
### `conduit.destination.plugin.name`

The name of the destination plugin that has written the record.

```json5
{
// other record fields
"metadata": {
"conduit.destination.plugin.name": "builtin:file",
// rest of metadata
},
...
// other record fields
}
```

Only available in records once they are written by a **destination plugin**:

- `conduit.destination.plugin.name` is the name of the destination plugin that has written the record.
- `conduit.destination.plugin.version` is the version of the destination plugin that has written the record.
### `conduit.destination.plugin.version`

```json
The version of the destination plugin that has written the record.

```json5
{
...
// other record fields
"metadata": {
"conduit.destination.plugin.name": "example",
"conduit.destination.plugin.version": "v1",
...
"conduit.destination.plugin.version": "v0.9.1",
// rest of metadata
},
...
// other record fields
}
```

When a record is sent to the [Dead-Letter Queue (DLQ)](/dead-letter-queue), you'll also see these extra fields that will give you an insight into why the record landed in the DLQ.
### `conduit.dlq.nack.error`
Contains the error that caused a record to be nacked and pushed to the [dead-letter queue (DLQ)](/dead-letter-queue).

- `conduit.dlq.nack.error` contains the error that caused a record to be nacked and pushed to the dead-letter queue.
- `conduit.dlq.nack.node.id` is the ID of the internal node that nacked the record.
### `conduit.dlq.nack.node.id`
The ID of the internal node that nacked the record.
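
When inspecting records that ended up in the DLQ, the two fields can be combined into a short diagnostic message. A minimal sketch, assuming the record metadata has been decoded into a dictionary; the helper `dlq_reason` and the sample values are made up for illustration.

```python
def dlq_reason(metadata: dict) -> str:
    """Summarize why a record was nacked, based on its DLQ metadata."""
    error = metadata.get("conduit.dlq.nack.error", "<no error recorded>")
    node_id = metadata.get("conduit.dlq.nack.node.id", "<unknown node>")
    return f"nacked by node {node_id}: {error}"


metadata = {
    "conduit.dlq.nack.error": "example error message",  # illustrative value
    "conduit.dlq.nack.node.id": "example-node-id",  # illustrative value
}
reason = dlq_reason(metadata)
print(reason)
```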

### Connector
### Connector-specific metadata

These metadata fields will be provided by each connector implementation allowing them to add any necessary metadata. As previously mentioned, to avoid unintended conflicts of metadata keys, the convention these will follow is the same as before, indicating first the connector name that's adding them.
These metadata fields will be provided by each connector implementation allowing
them to add any necessary metadata. As previously mentioned, to avoid unintended
conflicts of metadata keys, the convention these will follow is the same as
before, indicating first the connector name that's adding them.

Taking the same [previous record example](#representation), you'll notice there is a metadata key named `file.path`, which would indicate this field was added by a `file` plugin.
Taking the same [previous record example](#representation), you'll notice there
is a metadata key named `file.path`, which would indicate this field was added
by a `file` plugin.

```json
```json5
{
...
// other record fields
"metadata": {
"file.path": "./example.in",
...
// rest of metadata
},
...
// other record fields
}
```
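
The dot notation makes it straightforward to tell which component added a given key. The following sketch groups metadata keys by the part before the first dot; `group_by_namespace` is a hypothetical helper, and treating the first segment as the namespace is an assumption based on the convention described above.

```python
from collections import defaultdict


def group_by_namespace(metadata: dict) -> dict:
    """Group metadata entries by the part of the key before the first dot."""
    groups = defaultdict(dict)
    for key, value in metadata.items():
        namespace = key.split(".", 1)[0]
        groups[namespace][key] = value
    return dict(groups)


metadata = {
    "opencdc.version": "v1",
    "conduit.source.plugin.name": "builtin:file",
    "file.path": "./example.in",
}
namespaces = group_by_namespace(metadata)
print(sorted(namespaces))  # prints ['conduit', 'file', 'opencdc']
```
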
```
2 changes: 1 addition & 1 deletion docs/introduction/getting-started.mdx
@@ -180,7 +180,7 @@ If we look at `example.out` we'll see three lines that contain [OpenCDC](https:/
cat example.out | jq
```

```json
```json lines
{
"position": "Nw==",
"operation": "create",
6 changes: 5 additions & 1 deletion docs/introduction/vocabulary.mdx
@@ -16,4 +16,8 @@ sidebar_position: 3
flows through the pipeline. It can either change the record or filter it out
based on some criteria.
* **Record** - a record represents a single piece of data that flows through a
pipeline (e.g. one database row).
pipeline (e.g. one database row).
* **Collection** - a generic term used in Conduit to describe an entity in a
3rd party system from which records are read or to which records are written.
Examples are: topics (in Kafka), tables (in a database), indexes (in a search
engine), collections (in NoSQL databases), etc.
