Skip to content

High-performance and efficient Framework and Agent for creating data pipelines. The core of pipeline descriptions is based on the Configuration As Code concept and the Pkl configuration language by Apple.

License

Notifications You must be signed in to change notification settings

pipelane/pipelaner

Repository files navigation

Pipelaner

Pipelaner is a high-performance and efficient Framework and Agent for creating data pipelines. The core of pipeline descriptions is based on the Configuration As Code concept and the Pkl configuration language by Apple.

Pipelaner manages data streams through three key entities: Generator, Transform and Sink.


📖 Contents


📌 Core Entities

Generator

The component responsible for creating or retrieving source data for the pipeline. Generators can produce messages, events, or retrieve data from various sources such as files, databases, or APIs.

  • Example use case:
    Reading data from a file or receiving events via webhooks.

Transform

The component that processes data within the pipeline. Transforms perform operations such as filtering, aggregation, data transformation, or cleaning to prepare it for further processing.

  • Example use case:
    Filtering records based on specific conditions or converting data format from JSON to CSV.

Sink

The final destination for the data stream. Sinks send processed data to a target system, such as a database, API, or message queue.

  • Example use case:
    Saving data to PostgreSQL or sending it to a Kafka topic.

Basic Parameters

Parameter Type Description
name String Unique name of the pipeline element.
threads Int Number of threads for processing messages. Defaults to the value of GOMAXPROC.
outputBufferSize Int Size of the output buffer. Not applicable to Sink components.

📦 Built-in Pipeline Elements

Generators

Name Description
cmd Reads the output of a command, e.g., "/usr/bin/log" "stream --style ndjson".
kafka Apache Kafka consumer that streams Value into the pipeline.
pipelaner GRPC server that streams values via gRPC.

Transforms

Name Description
batch Forms batches of data with a specified size.
chunks Splits incoming data into chunks.
debounce Eliminates "bounce" (frequent repeats) in data.
filter Filters data based on specified conditions.
remap Reassigns fields or transforms the data structure.
throttling Limits data processing rate.

Sinks

Name Description
clickhouse Sends data to a ClickHouse database.
console Outputs data to the console.
http Sends data to a specified HTTP endpoint.
kafka Publishes data to Apache Kafka.
pipelaner Streams data via gRPC to other Pipelaner nodes.

🌐 Scalability

Single-Node Deployment

For operation on a single host:
Single Node


Multi-Node Deployment

For distributed data processing across multiple hosts:
Multi-Node

For distributed interaction between nodes, you can use:

  1. gRPC — via generators and sinks with the parameter sourceName: "pipelaner".
  2. Apache Kafka — for reading/writing data via topics.

Example configuration using Kafka:

new Inputs.Kafka {
    ...
    common {
        ...
        topics {
            "kafka-topic"
        }         
    }
}

new Sinks.Kafka {
    ...
    common {
        ...
        topics {
            "kafka-topic"
        }         
    }
}

🚀 Examples

Examples Description
Basic Pipeline A simple example illustrating the creation of a basic pipeline with prebuilt components.
Custom Components An advanced example showing how to create and integrate custom Generators, Transforms, and Sinks.

Overview

  1. 🌟 Basic Pipeline
    Learn the fundamentals of creating a pipeline with minimal configuration using ready-to-use components.

  2. 🛠 Custom Components
    Extend Pipelaner’s functionality by developing your own Generators, Transforms, and Sinks.


Each example includes clear configuration files and explanations to help you get started quickly.

💡 Tip: Use these examples as templates to customize and build your own pipelines efficiently.

🤝 Support

If you have questions, suggestions, or encounter any issues, please create an Issue in the repository.
You can also participate in discussions in the Discussions section.


📜 License

This project is licensed under the Apache 2.0 license.
You are free to use, modify, and distribute the code under the terms of the license.

About

High-performance and efficient Framework and Agent for creating data pipelines. The core of pipeline descriptions is based on the Configuration As Code concept and the Pkl configuration language by Apple.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published