Home
The increasing need for insights from vast data sources has given rise to data-driven business intelligence products that build and execute complex data workflows.
A data workflow is a set of inter-dependent, data-driven tasks. Simple solutions use a `cron`-based approach, which works well for simple workflows with few or no task dependencies. However, `cron` fails when there are complex dependencies between tasks.
At Cognitree, we build and execute complex data workflows for our customers to gather data insights. For our data pipelines we built Kronos, an effective scheduling tool that adds features on top of `cron`.
Kronos is a Java-based replacement for `cron` to build, run, and monitor complex data pipelines, with flexible deployment options including an embedded mode. It handles dependency resolution, workflow management, and failures. Kronos is built on top of Quartz and uses a DAG (Directed Acyclic Graph) to manage the tasks within a workflow.
Examples of data pipelines include batch jobs, chained tasks, machine learning jobs, and so on.
Kronos can be compared with Oozie and Azkaban, which are targeted specifically at Hadoop workflows, whereas Kronos is flexible and can run any workflow, including big data pipelines.
The architecture is flexible and extensible, with each component of Kronos designed to be pluggable.
- Dependency Management: Define/manage dependency among tasks in a workflow.
- Dynamic: Define/modify workflow at runtime.
- Extensible: Define custom task handlers and persistence store (see the sketch after this list).
- Fault Tolerant: Handle system/process faults.
- Flexible deployment model: Embed as a library or deploy in standalone or distributed mode.
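As a rough illustration of the extensibility point above, a pluggable task handler could look like the sketch below. The interface and type names (`TaskHandler`, `Task`, `TaskResult`) are hypothetical assumptions for illustration only, not the actual Kronos extension interfaces; refer to the extension documentation for the real contracts.

```java
import java.util.Map;

// Hypothetical sketch only: TaskHandler, Task and TaskResult are illustrative
// names, not the real Kronos extension interfaces.
interface TaskHandler {
    void init(Map<String, Object> config);   // one-time setup with handler config
    TaskResult handle(Task task);            // run one task, report success/failure
}

// Minimal stand-ins for the task model, just enough to make the sketch compile.
record Task(String name, Map<String, Object> properties) {}
record TaskResult(boolean success, String message) {
    static TaskResult success() { return new TaskResult(true, "ok"); }
    static TaskResult failure(String msg) { return new TaskResult(false, msg); }
}

// Example handler that shells out to an external command, a common way to
// wrap existing batch jobs in a workflow scheduler.
class ShellTaskHandler implements TaskHandler {
    @Override
    public void init(Map<String, Object> config) { /* read defaults, e.g. working directory */ }

    @Override
    public TaskResult handle(Task task) {
        try {
            Process p = new ProcessBuilder("bash", "-c", (String) task.properties().get("command"))
                    .inheritIO()
                    .start();
            return p.waitFor() == 0 ? TaskResult.success() : TaskResult.failure("non-zero exit code");
        } catch (Exception e) {
            return TaskResult.failure(e.getMessage());
        }
    }
}
```

A handler like this reports success or failure back to the scheduler, which can then decide whether dependent tasks may run or whether fault-handling should kick in.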
What next? Head over to the Overview section to learn more about Kronos.