This repository contains the code that I developed as part of my master’s thesis “Design of a Benchmark Concept for Data Stream Management Systems (DSMS) in the Context of Smart Factories”. The following sections give a rough overview. If you are interested in the design decisions or in more details on the queries and performance results, send me a message on LinkedIn.
“Currently, there does not exist a satisfying application benchmark for distributed DSMSs in the area of smart factories.”
- Definition of a set of queries to be executed by the System Under Test (SUT).
- Design and setup of the benchmark architecture.
- Definition of a set of benchmark metrics to evaluate the SUT’s performance.
- Provision of a basic toolkit including a data sender, validator and system setup scripts.
- Provision of a prototypical reference implementation for a subset of the queries.
- Correctness
- Response Time (90th-percentile)
- Single Stream Throughput (in records/s)
- Number of Streams
- Number of input data streams (scale factor)
- Record frequency per data stream
- Benchmark duration
- Queries to be executed on each data stream
These parameters should be set in `tools/commons/commons.conf` (see the sketch below for how they might be loaded).
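The `.conf` suffix suggests the Typesafe Config (HOCON) format. The following minimal Scala sketch shows how such parameters could be read; all key names are illustrative assumptions, not the repository’s actual keys.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Sketch only: the key names below are hypothetical placeholders.
object BenchmarkParameters {
  private val config: Config =
    ConfigFactory.parseFile(new java.io.File("tools/commons/commons.conf"))

  val numberOfStreams: Int  = config.getInt("benchmark.number-of-streams")  // scale factor
  val recordFrequency: Int  = config.getInt("benchmark.record-frequency")   // records/s per stream
  val durationSeconds: Long = config.getLong("benchmark.duration-seconds")
  val queries: java.util.List[String] = config.getStringList("benchmark.queries")
}
```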
Contains code that is used by multiple modules as well as the file `commons.conf`, in which the main benchmark parameters are set.
Contains the data sender. Kafka-specific configurations can be made in `tools/datasender/datasender.conf`.
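As a rough illustration of what such a sender does, here is a minimal Scala sketch that reads records from a file and produces them to Kafka at a fixed rate. The broker address, topic name, file path and frequency are placeholder assumptions, not the repository’s actual values.

```scala
import java.util.Properties
import java.util.concurrent.locks.LockSupport
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

object DataSenderSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    val recordsPerSecond = 1000 // assumed frequency
    val pauseNanos = 1000000000L / recordsPerSecond

    val source = Source.fromFile("data/input-records.csv") // assumed input file
    try {
      for (line <- source.getLines()) {
        producer.send(new ProducerRecord[String, String]("input-0", line))
        // Crude pacing; a real sender would compensate for send latency.
        LockSupport.parkNanos(pauseNanos)
      }
    } finally {
      source.close()
      producer.close()
    }
  }
}
```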
Contains the validator, a streaming application that makes use of the Akka Stream Kafka library.
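A minimal sketch of the consuming side of such a validator, assuming Akka 2.5-era APIs; the topic and group names are placeholders, and the actual validation logic is only hinted at in the comments.

```scala
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object ValidatorSketch extends App {
  implicit val system: ActorSystem = ActorSystem("validator")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  val settings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092") // assumed broker
    .withGroupId("validator")               // assumed group id

  // Stream the SUT's output records; a real validator would compare each
  // record against independently computed gold-standard results.
  Consumer
    .plainSource(settings, Subscriptions.topics("identity-output-0"))
    .runWith(Sink.foreach { record =>
      // record.timestamp() is the Kafka message timestamp used later
      // for the response-time calculation.
      println(s"offset=${record.offset()} ts=${record.timestamp()} value=${record.value()}")
    })
}
```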
Contains setup and configuration scripts and a benchmark runner. All of them are defined with Ansible.
Contains utility functions to create/delete/redistribute Kafka topics and to get current offsets in topics.
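A hedged sketch of what such utilities might look like when built directly on the Kafka clients; the broker address and all parameter values are placeholders.

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import scala.collection.JavaConverters._

object KafkaTopicUtilsSketch {
  private def adminClient(): AdminClient = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker
    AdminClient.create(props)
  }

  def createTopic(name: String, partitions: Int, replication: Short): Unit = {
    val admin = adminClient()
    try admin.createTopics(Collections.singleton(new NewTopic(name, partitions, replication))).all().get()
    finally admin.close()
  }

  def deleteTopic(name: String): Unit = {
    val admin = adminClient()
    try admin.deleteTopics(Collections.singleton(name)).all().get()
    finally admin.close()
  }

  // Current end offsets per partition, read with a throwaway consumer.
  def endOffsets(topic: String, partitions: Int): Map[TopicPartition, Long] = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    val consumer = new KafkaConsumer(props, new StringDeserializer, new StringDeserializer)
    try {
      val tps = (0 until partitions).map(new TopicPartition(topic, _)).asJava
      consumer.endOffsets(tps).asScala.map { case (tp, off) => tp -> off.longValue() }.toMap
    } finally consumer.close()
  }
}
```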
Partial benchmark implementation with Apache Flink for the Identity Query (incoming events are written back to Kafka without modification) and the Statistics Query (min, max, mean, sum and count over a tumbling window of 1 second). Each query runs in a separate job so that queries can be executed in parallel while the order of records is still preserved.
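The following Scala sketch illustrates the two queries with Flink’s DataStream API. It is not the repository’s actual code: the topic names are placeholders, records are assumed to be single numeric values, and both queries are shown in one program for brevity even though the reference implementation runs each query as a separate job. Depending on the Flink version, the Kafka connector classes may be versioned variants such as `FlinkKafkaConsumer011`.

```scala
import java.util.Properties
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer, FlinkKafkaProducer}

object QueriesSketch {
  case class Stats(min: Double, max: Double, sum: Double, count: Long) {
    def mean: Double = if (count == 0) 0.0 else sum / count
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // assumed broker
    props.setProperty("group.id", "flink-benchmark")

    val input: DataStream[String] =
      env.addSource(new FlinkKafkaConsumer[String]("input-0", new SimpleStringSchema(), props))

    // Identity Query: write records back unchanged.
    input.addSink(new FlinkKafkaProducer[String]("identity-output-0", new SimpleStringSchema(), props))

    // Statistics Query: min, max, mean, sum and count per 1 s tumbling window.
    input
      .map(_.toDouble)
      .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(1)))
      .aggregate(new AggregateFunction[Double, Stats, String] {
        override def createAccumulator(): Stats = Stats(Double.MaxValue, Double.MinValue, 0.0, 0L)
        override def add(v: Double, acc: Stats): Stats =
          Stats(math.min(acc.min, v), math.max(acc.max, v), acc.sum + v, acc.count + 1)
        override def getResult(acc: Stats): String =
          s"${acc.min},${acc.max},${acc.mean},${acc.sum},${acc.count}"
        override def merge(a: Stats, b: Stats): Stats =
          Stats(math.min(a.min, b.min), math.max(a.max, b.max), a.sum + b.sum, a.count + b.count)
      })
      .addSink(new FlinkKafkaProducer[String]("statistics-output-0", new SimpleStringSchema(), props))

    env.execute("benchmark-queries-sketch")
  }
}
```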
In the case of a single data stream, the data sender reads the data records from a provided file (e.g. taken from here) and sends them to the Kafka input topic at the configured frequency. The SUT consumes the records, runs the configured queries and writes the results to the Kafka output topics (one dedicated Kafka topic per query). Afterwards, the validator can read the same records from the input topics, create gold-standard results and compare them to the results created by the SUT to check for correctness. Furthermore, the 90th percentile of response times is calculated based on the Kafka message timestamps.
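As a hedged illustration of that metric: a record’s response time can be taken as the difference between its output and input Kafka message timestamps, and the 90th percentile can then be computed by nearest rank (the exact method used in the thesis may differ).

```scala
object ResponseTimeSketch {
  /** Nearest-rank 90th percentile; `responseTimesMs` are assumed to be
    * per-record (output timestamp - input timestamp) differences. */
  def percentile90(responseTimesMs: Seq[Long]): Long = {
    require(responseTimesMs.nonEmpty, "no response times recorded")
    val sorted = responseTimesMs.sorted
    val rank = math.ceil(0.9 * sorted.length).toInt
    sorted(rank - 1)
  }
}
```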
In the case of multiple data streams, the setup is similar. Each data stream is sent to a dedicated Kafka input topic. The SUT is required to run all configured queries on all data streams and to write the results to the dedicated output topics. The following image shows how the Data Stream Management System executes the Identity and Statistics Query on each data stream.
To run the benchmark on a cluster, it is advisable to install Ansible. The provided scripts allow installing the necessary software and running the benchmark (`tools/configuration/plays/benchmark-runner.yml`).
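For example, assuming a hypothetical inventory file named `inventory` that describes the cluster hosts, the runner could be invoked with `ansible-playbook -i inventory tools/configuration/plays/benchmark-runner.yml`.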
If you prefer running the modules without Ansible, you can compile the whole project with `sbt assembly` or a specific module with `sbt "project [module]" assembly`. The created jars can be run with `java -jar /path/to/jar.jar`.