This repo demonstrates how to use Debezium to capture changes on tables in MySQL and PostgreSQL and replicate them to Snowflake in near real time. The approach extends to other databases, and the repo covers several common points about CDC, Kafka, Kafka Connect, and Snowflake tools.
Miguel García and I worked together on the DZone article Data Platform: Building an Enterprise CDC Solution. As a next step, I am publishing this repo as HOWTO: Building an Enterprise CDC Solution.
To make the howto easier to run, the services are deployed with docker-compose, which depends on Docker Engine. For better compatibility, we use version 2 of the docker-compose file format, so Docker Engine 1.10.0 or later should work.
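If you want to confirm that your engine meets that minimum before starting, a small check like the following may help. It is only a sketch and not part of this repo: the function names are hypothetical, and it assumes you pass in the version string reported by your engine (for example, the output of `docker version --format '{{.Server.Version}}'`).

```python
import re

# Minimum Docker Engine version for compose file format 2, as stated above.
MIN_ENGINE = (1, 10, 0)


def parse_engine_version(raw: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '24.0.7' or '19.03.5-ce'."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", raw)
    if match is None:
        raise ValueError(f"unrecognized Docker version string: {raw!r}")
    return tuple(int(part) for part in match.groups())


def engine_is_supported(raw: str) -> bool:
    """Return True when the reported engine version is at least MIN_ENGINE."""
    return parse_engine_version(raw) >= MIN_ENGINE
```

Any suffix after the three numeric components (such as `-ce`) is ignored in the comparison.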
As part of the howto, you will create a Snowflake account, and the howto guides you through creating a key pair for authentication. To perform these actions, you need the OpenSSL toolkit. It is commonly available in Linux distributions and can be installed on Windows or Mac. If needed, you can run it inside a Docker image (this is covered in the howto).
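Once OpenSSL has produced the key pair, Snowflake expects the public key to be registered on the user with `ALTER USER ... SET RSA_PUBLIC_KEY`, without the `-----BEGIN/END PUBLIC KEY-----` delimiters. The sketch below shows one way to prepare that value from a PEM file's contents. It is an illustration rather than the script used in this repo: the function names and the `cdc_user` user name are hypothetical.

```python
def pem_body(pem_text: str) -> str:
    """Return the base64 body of a PEM block, without delimiter lines or newlines."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    body = [line for line in lines if line and not line.startswith("-----")]
    if not body:
        raise ValueError("no key material found in PEM text")
    return "".join(body)


def alter_user_statement(user: str, pem_text: str) -> str:
    """Build the SQL statement that registers the public key on a Snowflake user.

    The user name is inserted as-is, so only pass trusted identifiers.
    """
    return f"ALTER USER {user} SET RSA_PUBLIC_KEY='{pem_body(pem_text)}';"
```

For example, reading the public key file generated in the howto and calling `alter_user_statement("cdc_user", pem_text)` yields a statement you can paste into a Snowflake worksheet.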
For hardware requirements, review the Docker Engine requirements.
This demo has several parts. To keep things simple, it is split into several folders in this repo. Each folder contains a README file with explanations:
- services: Docker images and services
- database: SQL statements and scripts to run inside the local databases
- debezium: configuration and scripts to start and check the status of Debezium connectors
- snowflake: Snowflake scripts, and configuration of the Snowflake sink connector
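As an illustration of what checking the status of a connector involves, the Kafka Connect REST API exposes `GET /connectors/<name>/status`, which reports a state for the connector and for each of its tasks. The sketch below is not the script shipped in the debezium folder. It assumes Kafka Connect is reachable at `http://localhost:8083` (the default REST port, which may differ in your setup), and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_status(connector: str, base_url: str = "http://localhost:8083") -> dict:
    """Fetch the status document of one connector from the Kafka Connect REST API."""
    url = f"{base_url}/connectors/{quote(connector, safe='')}/status"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def unhealthy_parts(status: dict) -> list:
    """List the connector and task entries whose state is not RUNNING."""
    problems = []
    if status.get("connector", {}).get("state") != "RUNNING":
        problems.append("connector")
    for task in status.get("tasks", []):
        if task.get("state") != "RUNNING":
            problems.append(f"task {task.get('id')}")
    return problems
```

An empty list from `unhealthy_parts(fetch_status("<connector name>"))` means the connector and all of its tasks report `RUNNING`.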
You can find a detailed howto in the DZone article HOWTO: Building an Enterprise CDC Solution, which follows these steps:
In this flow:
- Gray: local services
- Yellow: external resources
Check the README available in each folder. It includes some detail about the folder's components and some additional scripts or functions that you can use to explore this solution.
I hope this tutorial has been helpful and that you have enjoyed it.