MC: Document opencdc.collection on OpenCDC record page (#78)

ConduitIO · Apr 18, 2024 · fa9ccc7
1 parent b96a3eb

Showing 3 changed files with 148 additions and 42 deletions.
diff --git a/docs/features/opencdc-record.mdx b/docs/features/opencdc-record.mdx
@@ -61,83 +61,185 @@ When processing records in Conduit, you can always expect a similar structure to
 
 ## Metadata fields
 
-As part of an OpenCDC record, there will be a set of fields provided that will vary depending on the connector. These fields can be common to all **OpenCDC** records as part of our standard, some related to **Conduit**, and others that will be provided by each **Connector** implementation independently. These fields can be useful to define conventions that will be then used by Conduit to expand its functionality. Notice that all these fields use a dot notation syntax to indicate what they refer to, preventing accidental clashes. Here are the ones you can find:
+Every OpenCDC record carries a set of metadata fields, and the exact set varies
+by connector. Some fields are common to all **OpenCDC** records as part of our
+standard, some are set by **Conduit**, and others are provided independently by
+each **Connector** implementation. These fields are useful for defining
+conventions that Conduit can then use to expand its functionality. All of these
+fields use dot notation to indicate what they refer to, which prevents
+accidental clashes. Here are the ones you can find:
 
-### OpenCDC
+### `opencdc.createdAt`
 
-- `opencdc.createdAt` can contain the time when the record was created in the 3rd party system. The expected format is a Unix timestamp in nanoseconds.
-- `opencdc.readAt` can contain the time when the record was read from the 3rd party system. The expected format is a Unix timestamp in nanoseconds.
-- `opencdc.version` contains the version of the OpenCDC format (e.g., "v1"). This field exists to ensure the OpenCDC format version can be easily identified in case the record gets marshaled into a different untyped format (e.g. JSON).
+Contains the time when the record was created in the 3rd party system. The
+expected format is a Unix timestamp in nanoseconds.
 
-```json
+```json5
 {
-    ...
+    // other record fields
     "metadata": {
         "opencdc.createdAt": "1663858188836816000",
+        // rest of metadata
+    },
+    // other record fields
+}
+```
+
+### `opencdc.readAt`
+
+Contains the time when the record was read from the 3rd party system. The
+expected format is a Unix timestamp in nanoseconds.
+
+```json5
+{
+    // other record fields
+    "metadata": {
         "opencdc.readAt": "1663858188836816000",
+        // rest of metadata
+    },
+    // other record fields
+}
+```
+
+### `opencdc.version`
+
+Contains the version of the OpenCDC format (e.g., "v1"). This field exists to
+ensure the OpenCDC format version can be easily identified in case the record
+gets marshaled into a different untyped format (e.g. JSON).
+
+```json5
+{
+    // other record fields
+    "metadata": {
         "opencdc.version": "v1",
-        ...
+        // rest of metadata
     },
-    ...
+    // other record fields
 }
 ```
 
-### Conduit
+### `opencdc.collection`
 
-Only available in records once they are read by a **source plugin**:
+Contains the name of the collection from which the record originated and/or
+where it should be written to.
 
-- `conduit.source.plugin.name` is the name of the source plugin that created the record.
-- `conduit.source.plugin.version` is the version of the source plugin that created the record.
-- `conduit.source.connector.id` is the ID of the source connector that received the record.
+:::note
+It's up to the connector to populate this field. In other words, not all records
+may have this field.
+:::
 
-```json
+```json5
+{
+    // other record fields
+    "metadata": {
+        "opencdc.collection": "employees",
+        // rest of metadata
+    },
+    // other record fields
+}
+```
+
+### `conduit.source.plugin.name`
+
+The name of the source plugin that created the record.
+
+```json5
 {
-    ...
+    // other record fields
+    "metadata": {
+        "conduit.source.plugin.name": "builtin:file",
+        // rest of metadata
+    },
+    // other record fields
+}
+```
+
+### `conduit.source.plugin.version`
+
+The version of the source plugin that created the record.
+
+```json5
+{
+    // other record fields
+    "metadata": {
+        "conduit.source.plugin.version": "v1.0.2",
+        // rest of metadata
+    },
+    // other record fields
+}
+```
+
+### `conduit.source.connector.id`
+
+The ID of the source connector that received the record.
+
+```json5
+{
+    // other record fields
     "metadata": {
         "conduit.source.connector.id": "connectorID",
-        "conduit.source.plugin.name": "example",
-        "conduit.source.plugin.version": "v1",
-        ...
+        // rest of metadata
+    },
+    // other record fields
+}
+```
+
+### `conduit.destination.plugin.name`
+
+The name of the destination plugin that has written the record.
+
+```json5
+{
+    // other record fields
+    "metadata": {
+        "conduit.destination.plugin.name": "builtin:file",
+        // rest of metadata
     },
-    ...
+    // other record fields
 }
 ```
 
-Only available in records once they are written by a **destination plugin**:
 
-- `conduit.destination.plugin.name` is the name of the destination plugin that has written the record.
-- `conduit.destination.plugin.version` is the version of the destination plugin that has written the record.
+### `conduit.destination.plugin.version`
 
-```json
+The version of the destination plugin that has written the record.
+
+```json5
 {
-    ...
+    // other record fields
     "metadata": {
-        "conduit.destination.plugin.name": "example",
-        "conduit.destination.plugin.version": "v1",
-        ...
+        "conduit.destination.plugin.version": "v0.9.1",
+        // rest of metadata
     },
-    ...
+    // other record fields
 }
 ```
 
-When a record is sent to the [Dead-Letter Queue (DLQ)](/dead-letter-queue), you'll also see these extra fields that will give you an insight into why the record landed in the DLQ.
+### `conduit.dlq.nack.error`
+
+Contains the error that caused a record to be nacked and pushed to the
+[dead-letter queue (DLQ)](/dead-letter-queue).
 
-- `conduit.dlq.nack.error` contains the error that caused a record to be nacked and pushed to the dead-letter queue.
-- `conduit.dlq.nack.node.id` is the ID of the internal node that nacked the record. 
+### `conduit.dlq.nack.node.id`
+
+The ID of the internal node that nacked the record.
 
-### Connector
+### Connector-specific metadata
 
-These metadata fields will be provided by each connector implementation allowing them to add any necessary metadata. As previously mentioned, to avoid unintended conflicts of metadata keys, the convention these will follow are the same as before, indicating first the connector name that's adding them.
+These metadata fields are provided by each connector implementation, which lets
+a connector add any metadata it needs. As previously mentioned, to avoid
+unintended conflicts between metadata keys, these fields follow the same
+convention as before: each key starts with the name of the connector that adds
+it.
 
-Taking the same [previous record example](#representation), you'll notice there is a metadata key named `file.path`, which would indicate this field was added by a `file` plugin.
+Taking the same [previous record example](#representation), you'll notice there
+is a metadata key named `file.path`, which would indicate this field was added
+by a `file` plugin.
 
-```json
+```json5
 {
-    ...
+    // other record fields
     "metadata": {
         "file.path": "./example.in",
-        ...
+        // rest of metadata
     },
-    ...
+    // other record fields
 }
-```
+```
diff --git a/docs/introduction/getting-started.mdx b/docs/introduction/getting-started.mdx
@@ -180,7 +180,7 @@ If we look at `example.out` we'll see three lines that contain [OpenCDC](https:/
 cat example.out | jq
 ```
 
-```json
+```json lines
 {
   "position": "Nw==",
   "operation": "create",

diff --git a/docs/introduction/vocabulary.mdx b/docs/introduction/vocabulary.mdx
@@ -16,4 +16,8 @@ sidebar_position: 3
   flows through the pipeline. It can either change the record or filter it out
   based on some criteria.
 * **Record** - a record represents a single piece of data that flows through a
-  pipeline (e.g. one database row).
+  pipeline (e.g. one database row).
+* **Collection** - a generic term used in Conduit to describe an entity in a
+  3rd party system from which records are read or to which they are written.
+  Examples are: topics (in Kafka), tables (in a database), indexes (in a search
+  engine), collections (in NoSQL databases), etc.
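
For readers of this change, here is a minimal Go sketch of how the documented metadata fields might be consumed. It assumes the metadata is available as a plain `map[string]string` (the JSON examples in the diff quote every value as a string) and does not use any Conduit SDK API. The helper names `createdAt` and `collection` are hypothetical and exist only for illustration. The sketch parses `opencdc.createdAt` as a Unix timestamp in nanoseconds and treats `opencdc.collection` as optional, since the new note says not all records may have it.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Metadata keys as documented on the OpenCDC record page.
const (
	keyCreatedAt  = "opencdc.createdAt"
	keyCollection = "opencdc.collection"
)

// createdAt (hypothetical helper) parses opencdc.createdAt, which is
// documented as a Unix timestamp in nanoseconds, here stored as a string.
func createdAt(md map[string]string) (time.Time, error) {
	raw, ok := md[keyCreatedAt]
	if !ok {
		return time.Time{}, fmt.Errorf("metadata field %q is not set", keyCreatedAt)
	}
	ns, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("parsing %q: %w", keyCreatedAt, err)
	}
	return time.Unix(0, ns).UTC(), nil
}

// collection (hypothetical helper) returns opencdc.collection and reports
// whether the connector populated it; the field is optional.
func collection(md map[string]string) (string, bool) {
	name, ok := md[keyCollection]
	return name, ok
}

func main() {
	// Assumption: metadata modeled as a plain string map for this sketch.
	md := map[string]string{
		"opencdc.createdAt":  "1663858188836816000",
		"opencdc.collection": "employees",
	}

	ts, err := createdAt(md)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created at:", ts.Format(time.RFC3339Nano))

	if name, ok := collection(md); ok {
		fmt.Println("collection:", name)
	} else {
		fmt.Println("collection not set by the connector")
	}
}
```

Returning a boolean from `collection` rather than an empty string keeps "field not set" distinguishable from "field set to an empty name", which matters when a destination has to decide where to write a record.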