Kafka Connect & MongoDB Database #74

Merged 13 commits on May 30, 2024
63 changes: 63 additions & 0 deletions README.md
@@ -364,7 +364,70 @@ This has only been tested with Confluent Cloud but technically all SASL authenti

[Back to top](#toc)

# MongoDB Integration

## Description and Configuration

To sink streamed Kafka topic data to a MongoDB database, a Kafka Connect instance and a MongoDB instance can be deployed alongside the ODE. Running the provided Docker Compose [file](./docker-compose-mongo.yml) streams the following topics to MongoDB:

- OdeRawEncodedBSMJson
- OdeBsmJson
- OdeRawEncodedMAPJson
- OdeMapJson
- OdeRawEncodedSPATJson
- OdeSpatJson
- OdeRawEncodedTIMJson
- OdeTimJson
- OdeRawEncodedPsmJson
- OdePsmJson

The configuration that defines this is in the jpo-s3-deposit submodule [here](jpo-s3-deposit/mongo-connector/connect_start.sh). This script is mounted into the `connect` container as a volume. To sink different topics, make a copy of the `connect_start.sh` script, edit it, and mount the copy into the `connect` container at the following path: `/scripts/connect_start.sh`.
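
As a rough illustration of what a per-topic sink definition might look like, the sketch below builds the JSON payload that could be posted to the Kafka Connect REST API (`POST /connectors`). The contents of `connect_start.sh` are not shown here, so the helper name `build_sink_config`, the `sink.<topic>` naming scheme, and the one-collection-per-topic layout are assumptions made for illustration; the configuration keys themselves are standard MongoDB Kafka sink connector settings.

```python
import json


def build_sink_config(topic: str, mongo_uri: str, db_name: str) -> dict:
    """Return a Kafka Connect payload for one MongoDB sink connector.

    Hypothetical helper: the naming scheme and the one-collection-per-topic
    layout are assumptions, not taken from connect_start.sh.
    """
    return {
        "name": f"sink.{topic}",
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
            "tasks.max": "1",
            "topics": topic,               # Kafka topic to read from
            "connection.uri": mongo_uri,   # value of MONGO_URI
            "database": db_name,           # value of MONGO_DB_NAME
            "collection": topic,           # assumed: collection named after the topic
        },
    }


# Example values only; real deployments read these from the environment.
payload = build_sink_config(
    "OdeBsmJson",
    "mongodb://ode:password@localhost:27017/?directConnection=true",
    "ode",
)
print(json.dumps(payload, indent=2))
```

A custom copy of `connect_start.sh` could register an equivalent definition for each additional topic it needs to sink.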

## Environment variables

### Purpose & Usage

- The `MONGO_IP` environment variable defines the IP address of the MongoDB container. It can be set to point at a remote MongoDB instance instead of the provided Docker-deployed container.

- The `MONGO_DB_NAME` environment variable defines the name of the database created in MongoDB. It is used both when configuring user permissions and as the destination for the connectors defined in the `connect` container.

- The `MONGO_ADMIN_DB_USER` and `MONGO_ADMIN_DB_PASS` environment variables define the credentials for the `admin` MongoDB user. This user has full control of the cluster, so the password must be set securely for production deployments.

- The `MONGO_ODE_DB_USER` and `MONGO_ODE_DB_PASS` environment variables define the credentials for the `ode` MongoDB user. This user has `readWrite` permissions on the `MONGO_DB_NAME` database.

- The `MONGO_URI` environment variable contains the complete connection string used to connect to MongoDB when creating connectors in the `connect` container.

- The `MONGO_COLLECTION_TTL` environment variable configures the Time To Live (TTL), in days, for the created TTL indexes. Higher values keep documents longer and therefore use more storage.
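
To make the TTL setting concrete: MongoDB TTL indexes are configured with an `expireAfterSeconds` option, so a value given in days has to be converted to seconds. The sketch below shows that arithmetic, assuming `MONGO_COLLECTION_TTL` is expressed in days (as the comment in `sample.env` indicates). The helper name `ttl_index_options` and the `recordGeneratedAt` field are hypothetical and are not taken from `create_indexes.js`.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_index_options(ttl_days: str) -> dict:
    """Convert a TTL in days (as read from the environment) to index options."""
    days = int(ttl_days)
    if days <= 0:
        raise ValueError("TTL must be a positive number of days")
    return {"expireAfterSeconds": days * SECONDS_PER_DAY}


# Hypothetical index definition: the field name is chosen for illustration only.
index_spec = {"key": {"recordGeneratedAt": 1}, **ttl_index_options("7")}
print(index_spec)
```

With the sample value of 7 days, the index would expire documents after 604800 seconds.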

### Values
In order to utilize the MongoDB integration:

- `MONGO_IP` must be set to the IP address of the MongoDB container. This can be left as `${DOCKER_HOST_IP}` for deployments using the MongoDB instance included in the provided docker-compose file.

- `MONGO_DB_NAME` configures the name of the database created in MongoDB.

- `MONGO_ADMIN_DB_USER` configures the MongoDB admin user's name.

- `MONGO_ADMIN_DB_PASS` configures the MongoDB admin user's password. This must be changed to a more secure password for production deployments.

- `MONGO_ODE_DB_USER` configures the username of the initialized user with `readWrite` access to the initialized database.

- `MONGO_ODE_DB_PASS` configures the password of the initialized user with `readWrite` access to the initialized database.

- `MONGO_URI` defines the connection URI used by the Kafka Connect instance. MongoDB connection URI options are documented [here](https://www.mongodb.com/docs/manual/reference/connection-string/).

- `MONGO_COLLECTION_TTL` sets the Time To Live (TTL), in days, for the created TTL indexes.
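
The values above combine into the connection string. The sketch below assembles a URI in the same shape as the `MONGO_URI` entry in `sample.env`, and percent-encodes the credentials, which MongoDB connection strings require when a username or password contains characters such as `@` or `:`. The helper name `build_mongo_uri` and the example host and password are hypothetical.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str, port: int = 27017) -> str:
    """Assemble a connection string in the same shape as sample.env's MONGO_URI."""
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?directConnection=true"
    )


# Example values only; the password deliberately contains reserved characters.
print(build_mongo_uri("ode", "p@ss:word", "192.168.0.10"))
```

When the URI is written by hand in a `.env` file, the same encoding would need to be applied manually to any reserved characters in the password.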


## Mongo Docker Compose File

There is a provided docker-compose [file](docker-compose-mongo.yml) that spins up a MongoDB instance with a Kafka Connect service. There is also an initialization container that configures the RBAC and replica set of the MongoDB container.
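
Once the services are up, one way to confirm that sinking works is to inspect connector status through the Kafka Connect REST API (`GET /connectors/<name>/status`). The sketch below only interprets such a response; the helper name `failed_components`, the connector name `example-sink`, and the sample payload are illustrative assumptions, and fetching the JSON from the running `connect` service is left out.

```python
import json


def failed_components(status: dict) -> list:
    """List connector/task entries whose state is not RUNNING."""
    failures = []
    if status["connector"]["state"] != "RUNNING":
        failures.append("connector")
    for task in status.get("tasks", []):
        if task["state"] != "RUNNING":
            failures.append(f"task-{task['id']}")
    return failures


# Sample payload in the shape returned by GET /connectors/<name>/status.
sample = json.loads("""
{
  "name": "example-sink",
  "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
  "tasks": [
    {"id": 0, "state": "RUNNING", "worker_id": "connect:8083"},
    {"id": 1, "state": "FAILED", "worker_id": "connect:8083"}
  ]
}
""")
print(failed_components(sample))
```

An empty list means the connector and all of its tasks report `RUNNING`.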

## Note

Kafka Connect is used with MongoDB in this implementation, but it can interact with many types of databases. Further documentation is available for [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html).

[Back to top](#toc)

<!--
#########################################
92 changes: 92 additions & 0 deletions docker-compose-mongo.yml
@@ -0,0 +1,92 @@
version: '3'

include:
  - docker-compose.yml

services:
  mongo:
    image: mongo:7
    container_name: mongo
    restart: always
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_ADMIN_DB_USER}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_ADMIN_DB_PASS}
      MONGO_INITDB_DATABASE: admin
    entrypoint:
      - bash
      - -c
      - |
        openssl rand -base64 741 > /mongo_keyfile
        chmod 400 /mongo_keyfile
        chown 999:999 /mongo_keyfile
        exec docker-entrypoint.sh $$@
    command: "mongod --bind_ip_all --replSet rs0 --keyFile /mongo_keyfile"
    volumes:
      - mongo_data:/data/db
    healthcheck:
      test: |
        echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
      interval: 10s
      start_period: 30s

  mongo-setup:
    image: mongo:7
    container_name: mongo_setup
    depends_on:
      - mongo
    restart: on-failure
    environment:
      MONGO_ADMIN_DB_USER: ${MONGO_ADMIN_DB_USER}
      MONGO_ADMIN_DB_PASS: ${MONGO_ADMIN_DB_PASS}
      MONGO_DB_NAME: ${MONGO_DB_NAME}
      MONGO_ODE_DB_USER: ${MONGO_ODE_DB_USER}
      MONGO_ODE_DB_PASS: ${MONGO_ODE_DB_PASS}
      MONGO_COLLECTION_TTL: ${MONGO_COLLECTION_TTL}
    entrypoint: ["/bin/bash", "setup_mongo.sh"]
    volumes:
      - ./scripts/mongo/setup_mongo.sh:/setup_mongo.sh
      - ./scripts/mongo/create_indexes.js:/create_indexes.js


  connect:
    image: kafka-connect:latest
    restart: always
    build:
      context: ./jpo-s3-deposit/mongo-connector
      dockerfile: Dockerfile
    ports:
      - "8083:8083"
    depends_on:
      mongo:
        condition: service_healthy
    environment:
      MONGO_URI: ${MONGO_URI}
      MONGO_DB_NAME: ${MONGO_DB_NAME}
      CONNECT_BOOTSTRAP_SERVERS: ${DOCKER_HOST_IP}:9092
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: topic.kafka-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_CONFIG_STORAGE_CLEANUP_POLICY: compact
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: topic.kafka-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_CLEANUP_POLICY: compact
      CONNECT_STATUS_STORAGE_TOPIC: topic.kafka-connect-status
      CONNECT_STATUS_STORAGE_CLEANUP_POLICY: compact
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_LOG4J_ROOT_LOGLEVEL: "ERROR"
      CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=ERROR,org.reflections=ERROR,com.mongodb.kafka=ERROR"
      CONNECT_PLUGIN_PATH: /usr/share/confluent-hub-components
    volumes:
      - ./jpo-s3-deposit/mongo-connector/connect_start.sh:/scripts/connect_start.sh

volumes:
  mongo_data:
6 changes: 5 additions & 1 deletion docker-compose.yml
@@ -6,7 +6,7 @@ services:
     ports:
       - "9092:9092"
     volumes:
-      - "${DOCKER_SHARED_VOLUME}:/bitnami"
+      - kafka:/bitnami
     environment:
       KAFKA_ENABLE_KRAFT: "yes"
       KAFKA_CFG_PROCESS_ROLES: "broker,controller"
@@ -268,3 +268,7 @@ services:
       options:
         max-size: "10m"
         max-file: "5"
+
+volumes:
+  kafka:
+    {}
10 changes: 10 additions & 0 deletions sample.env
@@ -116,6 +116,16 @@ RDE_TIM_GROUP=group_rde_tim
## Required if using SDX depositor module (REST interface)
SDW_API_KEY=

# Required MONGODB Variables
MONGO_IP=${DOCKER_HOST_IP}
MONGO_DB_NAME=ode
MONGO_ADMIN_DB_USER=admin
MONGO_ADMIN_DB_PASS=password
MONGO_ODE_DB_USER=ode
MONGO_ODE_DB_PASS=password
MONGO_URI=mongodb://${MONGO_ODE_DB_USER}:${MONGO_ODE_DB_PASS}@${MONGO_IP}:27017/?directConnection=true
MONGO_COLLECTION_TTL=7 # days

## Optional overrides
#SDW_DESTINATION_URL=
#SDW_GROUP_ID=