Designed by Agile Lab, Witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. It enables businesses to discover, enhance, and productize their data, fostering the creation of automated data platforms that adhere to the highest standards of data governance. Want to know more about Witboost? Check it out here or contact us!
This repository is part of our Starter Kit meant to showcase Witboost's integration capabilities and provide a "batteries-included" product.
Use this template to automatically create a DAG in an Apache Airflow instance which helps in orchestrating the data flow through various other components. This DAG is uploaded to an object storage (the current provided implementation is based on AWS S3 Storage) from which the Airflow instance can retrieve it.
The component from this template must be created last since it will depend on some components that had already been created.
Refer to the Witboost Starter Kit repository for information on the Tech Adapter that can be used to deploy components created with this template.
A Template is a tool that helps create components inside a Data Mesh. Templates help establish a standard across the organization. This standard leads to easier understanding, management and maintenance of components. Templates provide a predefined structure so that developers don't have to start from scratch each time, which leads to faster development and allows them to focus on other aspects, such as testing and business logic.
For more information, please refer to the official documentation.
Workload refers to any data processing step (ETL, job, transformation etc.) that is applied to data in a Data Product. Workloads can pull data from sources external to the Data Mesh or from an Output Port of a different Data Product or from Storage Areas inside the same Data Product, and persist it for further processing or serving.
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology. A web interface helps manage the state of your workflows. Airflow is deployable in many ways, varying from a single process on your laptop to a distributed setup to support even the biggest workflows.
The main characteristic of Airflow workflows is that all workflows are defined in Python code. “Workflows as code” serves several purposes:
- Dynamic: Airflow pipelines are configured as Python code, allowing for dynamic pipeline generation.
- Extensible: The Airflow framework contains operators to connect with numerous technologies. All Airflow components are extensible to easily adjust to your environment.
- Flexible: Workflow parameterization is built-in leveraging the Jinja templating engine.
Learn more about it on the official website.
To get information on how to use this template, refer to this document.
To verify the component before deploying it along with the Data Product, the component needs to be tested against a CUE Policy defined for a Airflow Workload. This policy needs to be defined inside the Governance section of the Witboost Platform.
For more information, please refer to the official documentation.
This project is available under the Apache License, Version 2.0; see LICENSE for full details.
Witboost is a cutting-edge Data Experience platform, that streamlines complex data projects across various platforms, enabling seamless data production and consumption. This unified approach empowers you to fully utilize your data without platform-specific hurdles, fostering smoother collaboration across teams.
It seamlessly blends business-relevant information, data governance processes, and IT delivery, ensuring technically sound data projects aligned with strategic objectives. Witboost facilitates data-driven decision-making while maintaining data security, ethics, and regulatory compliance.
Moreover, Witboost maximizes data potential through automation, freeing resources for strategic initiatives. Apply your data for growth, innovation and competitive advantage.
Contact us or follow us on: