DSIN-INSA-Strasbourg/Hermes

Hermes

Code style: black, flake8
Tested with Python 3.10, 3.11 and 3.12


Change Data Capture (CDC) tool from any source(s) to any target.

Caution

⚠️ The code is considered stable enough for evaluation, but it needs more testing before its stability can be relied on ⚠️

Features

  • Does not require any change to the source data model (e.g. no need to add a last_updated column)
  • Multi-source, with ability to set merge/aggregation constraints
  • Able to handle several data types, with links (foreign keys) between them, and to enforce integrity constraints
  • Able to transform data with Jinja filters in configuration files: no need to edit any Python code
  • Clean error handling, to avoid synchronization problems, and an optional error remediation mechanism
  • Offers a trashbin on clients for removed data
  • Tolerates unavailability and errors on every link (source, message bus, target)
  • Easy to extend by design. All of the following items are implemented as plugins:
    • Datasources
    • Attribute filters (data filters)
    • Clients (targets)
    • Message bus
  • Changes to the datamodel are easy and safe to integrate and propagate, whether on the server or on the clients
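
To illustrate how changes can be captured without a last_updated column, here is a minimal sketch of snapshot diffing: the current state of a source is compared with a cached copy of its previous state, keyed by primary key. This is only an illustration of the general technique, not Hermes' actual implementation. The names Event and diff_snapshots, and the sample attributes, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]
Snapshot = dict[Any, Row]  # primary key -> row attributes


@dataclass
class Event:
    """Hypothetical change event, not Hermes' real event class."""

    kind: str  # "added", "modified" or "removed"
    pkey: Any
    attrs: Row


def diff_snapshots(previous: Snapshot, current: Snapshot) -> list[Event]:
    """Compare a cached snapshot with a fresh one and list the changes."""
    events: list[Event] = []
    for pkey, row in current.items():
        old = previous.get(pkey)
        if old is None:
            # Primary key unknown in the cache: the object is new.
            events.append(Event("added", pkey, dict(row)))
        elif row != old:
            # Keep only the attributes whose value differs from the cache.
            changed = {
                key: row.get(key)
                for key in row.keys() | old.keys()
                if row.get(key) != old.get(key)
            }
            events.append(Event("modified", pkey, changed))
    for pkey in previous:
        if pkey not in current:
            # Primary key vanished from the source: the object was removed.
            events.append(Event("removed", pkey, {}))
    return events


previous = {
    1: {"login": "alice", "mail": "alice@example.org"},
    2: {"login": "bob", "mail": "bob@example.org"},
}
current = {
    1: {"login": "alice", "mail": "a.liddell@example.org"},
    3: {"login": "carol", "mail": "carol@example.org"},
}
events = diff_snapshots(previous, current)
for event in events:
    print(event.kind, event.pkey, event.attrs)
```

In this sketch the cache is the only state needed to detect changes, which is why the source data model can stay untouched.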

Roadmap

  • Allow changing primary key values safely (server and clients)
  • Add an optional setting to remediate errors by merging added/modified events of the same object in errorqueue (clients)
  • Write documentation for
    • installing
    • using
    • examples
    • developing a plugin
    • contributing to core
  • Write functional tests
  • Write more tests
  • (Maybe) Force remote primary keys in client datamodel. Requires a lot of troubleshooting to safely update "internal" attrnames and values on Dataschema primary key change: in Datasources and Errorqueue
  • (Maybe) Provide information allowing client plugins to determine whether a handler is called following the reception of an event or for a retry after an error
  • (Maybe) Reduce RAM consumption by storing the cache in a SQLite database, which would allow loading objects only on demand, and no longer systematically
  • (Maybe) Design a generic way to handle adding a client whose target already contains data
  • (Maybe) Implement data consistency check when initsync sequence is met on an already initialized client (clients)
  • (Maybe) Implement a check to ensure that the types and attributes required by client subclasses are set in the datamodel

Contributing

Contributions are always welcome, but may take some time to be merged.

Documentation

Documentation

License

GNU GPLv3

Authors