
MambaETL v1.0.0

@smallgod smallgod released this 21 Dec 09:40
· 360 commits to main since this release

Dear community,

OpenMRS MambaETL core Release v1.0.0 is now out and out of stealth mode 🎉 A big thanks to all the contributors: Derrick Baluku (UgandaEMR), and the team at UCSF (Laureen Omare, Amoso Laboso, Arthur D. Mugume, Eudson Bambo).

The project is currently found here: https://github.com/UCSF-IGHS/openmrs-module-mamba-core

MambaETL has undergone rigorous testing and feature enhancements over the past several months; however, we still need your help, so please try it out yourself.

How you can help: 
As we work out the necessary steps to move this work to the OpenMRS community repositories, we invite the community, especially those interested in reporting and ETL, to help us test it.

A quick start reference module has been developed to show how to use MambaETL in your environment. Fork it and, in less than 15 minutes, get Mamba up and running on your dataset.

Please find the quick start reference module here: https://github.com/UCSF-IGHS/openmrs-module-ohri-mamba

What is MambaETL?

MambaETL (or simply Mamba) is an OpenMRS (Open Medical Record System) implementation of Extract, Transform, Load (ETL): it extracts OpenMRS data and transforms and loads it into a more denormalised format for faster data retrieval and analysis.

OpenMRS stores patient observational data in a long format. Essentially, for each encounter of a given patient, multiple rows are saved into the OpenMRS Obs table, sometimes 50 or more rows for a single encounter. This means that the Obs table quickly grows to millions of records in moderately sized facilities, making reporting and any analysis on such data incredibly slow and difficult.
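
To make the long-versus-wide distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The three-column `obs` table, the concept names and the `flatten` helper are simplified assumptions for illustration only; this is not the real OpenMRS schema, nor the SQL that MambaETL generates.

```python
import sqlite3

# Hypothetical, simplified stand-in for the OpenMRS Obs table (long format):
# one row per observation, many rows per encounter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (encounter_id INTEGER, concept TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [
        (1, "weight_kg", "62"),
        (1, "height_cm", "170"),
        (1, "hiv_status", "negative"),
        (2, "weight_kg", "75"),
        (2, "hiv_status", "positive"),
    ],
)


def flatten(conn, concepts):
    """Pivot long obs rows into one wide row per encounter.

    `concepts` must be a trusted, hard-coded list, because the names are
    interpolated into the SQL text as column aliases.
    """
    cols = ", ".join(
        f"MAX(CASE WHEN concept = '{c}' THEN value END) AS {c}" for c in concepts
    )
    sql = (
        f"SELECT encounter_id, {cols} FROM obs "
        "GROUP BY encounter_id ORDER BY encounter_id"
    )
    return conn.execute(sql).fetchall()


wide_rows = flatten(conn, ["weight_kg", "height_cm", "hiv_status"])
for row in wide_rows:
    print(row)
```

Each encounter collapses to a single row with one column per concept, and a concept that was never recorded for an encounter comes back as `None`. A report can then read one wide row instead of scanning many long ones.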

You can check out the [readme file](https://github.com/UCSF-IGHS/openmrs-module-mamba-core#) of the project for more details.

Also check out the documentation linked under Links at the end of this post.

What does MambaETL offer, and what is new in Release v1.0.0?

The list below is just a sneak peek at the main highlights and doesn't include everything MambaETL v1.0.0 offers.

  1. Automated flattening of the OpenMRS obs/encounter data

    Out of the box, this version lets you flatten all or some of your obs/encounter data into a wider data format that is easier to report on.

    The figure below depicts this concept.

    [Figure: obs/encounter data flattened from a long format into a wide format]

  2. Automated flattening of the OpenMRS obs groups data

    Just like the obs data, obs groups are also automatically flattened into a reporting-friendly data format.

  3. Separate reporting data from transactional data

    You can specify whether the reporting (MambaETL-generated) data is stored within the same schema as the OpenMRS transactional data or in a separate schema. It is just a configuration away.

    The figure below shows an excerpt of the pom.xml configuration file where the transactional and reporting schema names can be set.

    [Figure: pom.xml excerpt configuring the reporting schema]

  4. Support for a MambaETL reporting API

    This version adds an extra channel for fetching your MambaETL reports out of the reporting schema: the MambaETL reporting REST API.

    All reporting data can be fetched out of your Mamba setup via a configurable endpoint.

    The figure below shows a Postman view of one of the TB reports being fetched from MambaETL.

    [Figure: Postman view of a TB report fetched from MambaETL]

  5. Support for MySQL v5.7 and above, and MariaDB

    This version of MambaETL has been widely tested on MySQL v5.7 and above and on MariaDB, and these are the databases currently supported. Only SQL-compliant reporting queries are supported.

  6. Build faster

    A [template or reference module](https://github.com/UCSF-IGHS/openmrs-module-ohri-mamba) is provided. It pulls in the latest dependencies of MambaETL core and demonstrates how to set up your reporting module, or add MambaETL support to your existing infrastructure, and be up and running with Mamba in less than 15 minutes.

    You can now concentrate on just adding your SQL-compliant queries and Mamba will pick them up and run them.

    The figure below shows a MambaETL core Maven dependency entry.

    [Figure: MambaETL core Maven dependency entry]

  7. Deployment of the Mamba build script through Liquibase

    By running a single Maven command, `mvn clean install`, Mamba can now prepare an SQL-compliant build script that contains ALL reporting data ready for deployment on your target reporting schema.

    The script includes stored procedures and functions that have been automatically generated from user-supplied SQL queries and from MambaETL core.

    In this version, you will be able to deploy the script through a Liquibase changeset.

    The figures below show the build folder mamba and the Liquibase changeset.

    [Figure: the mamba build folder]

    [Figure: the Liquibase changeset]

  8. Improved performance and faster ETL times

    Performance improvement was a key focus of this version. MambaETL has been benchmarked on fairly large datasets (up to 11 million obs) and shows faster ETL times with good resource usage.

    On a machine with 16 GB of RAM and a roughly 3 GHz processor, ETL times of 30 minutes to prepare all reporting data for retrieval have been recorded. This brings reporting times down to seconds, as opposed to the multiple hours seen in some implementations without MambaETL.
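
The obs group flattening in item 2 can be sketched in the same spirit as the obs flattening in item 1. The example below uses Python's built-in `sqlite3` module and relies on the OpenMRS convention that a member obs points at its parent obs through `obs_group_id`. The reduced table layout, the concept names and the sample values are assumptions for illustration, and the query is not the one MambaETL generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the Obs table: obs_group_id references the parent
# (group) obs; it is NULL for the parent row itself.
conn.execute(
    "CREATE TABLE obs (obs_id INTEGER PRIMARY KEY, obs_group_id INTEGER, "
    "encounter_id INTEGER, concept TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?, ?, ?)",
    [
        (10, None, 1, "family_member", None),  # group parent
        (11, 10, 1, "relationship", "sibling"),
        (12, 10, 1, "age_years", "12"),
        (20, None, 1, "family_member", None),  # second group, same encounter
        (21, 20, 1, "relationship", "parent"),
        (22, 20, 1, "age_years", "41"),
    ],
)

# One wide row per obs group: group members become columns.
group_rows = conn.execute(
    """
    SELECT encounter_id,
           obs_group_id,
           MAX(CASE WHEN concept = 'relationship' THEN value END) AS relationship,
           MAX(CASE WHEN concept = 'age_years' THEN value END) AS age_years
    FROM obs
    WHERE obs_group_id IS NOT NULL
    GROUP BY encounter_id, obs_group_id
    ORDER BY obs_group_id
    """
).fetchall()

for row in group_rows:
    print(row)
```

Grouping by `obs_group_id` rather than by encounter alone keeps repeated groups within one encounter (here, two family members) as separate rows instead of merging them.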

Links:

  1. There is a detailed [readme file here](https://github.com/UCSF-IGHS/openmrs-module-mamba-core#) for implementers on how to use MambaETL
  2. There is also detailed [MambaETL documentation here](https://www.notion.so/MambaETL-Documentation-v1-0-3f0467b435744e34a261049383c5e4ef?pvs=21)