A sample application for ingesting data through TAP and Kafka

This repository contains a sample application to show how TAP services can be connected. The following figure shows the flow for a data ingestion example using Apache Kafka.

Implementation summary

An external device/system pushes data through a WebSockets (WS) application hosted in TAP. Let's call this application ws2kafka.
ws2kafka pushes received data into Kafka. The Kafka topic is chosen based on the WS URL to which the connection is made, so calls look like: wss://ws2kafka.some_domain.com/topic1. For this to work, Kafka must be configured to create topics automatically. ws2kafka is horizontally scalable, so you can run multiple instances.
kafka2hdfs is an application that is started with a predefined list of Kafka topics that it should track. For each topic, it ensures that a corresponding file exists and appends newly added data to it. (Since it is not safe to do multiple concurrent appends to a single HDFS file, you should avoid having multiple instances listening on the same topic.)
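
The two rules described above (topic taken from the WS URL, one append-only file per topic) can be sketched roughly as follows. This is an illustration only, not the code of ws2kafka or kafka2hdfs: the function names are hypothetical, a local directory stands in for HDFS, and the newline separator between messages is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def topic_from_ws_url(url: str) -> str:
    """Return the Kafka topic named by the path of a WebSocket URL.

    Hypothetical helper: assumes the path holds exactly one segment,
    as in wss://ws2kafka.some_domain.com/topic1.
    """
    path = urlparse(url).path.strip("/")
    if not path or "/" in path:
        raise ValueError(f"expected one path segment naming the topic: {url!r}")
    return path


def append_message(base_dir: Path, topic: str, message: bytes) -> Path:
    """Append one message to the file for `topic`, creating it if missing.

    A local directory stands in for HDFS here, and the newline
    delimiter is an assumption made for this sketch.
    """
    base_dir.mkdir(parents=True, exist_ok=True)
    target = base_dir / topic
    # Mode "ab" creates the file when it is absent and appends otherwise.
    with target.open("ab") as handle:
        handle.write(message + b"\n")
    return target
```

Because every message for a topic is appended to the same file, a single writer per topic keeps this simple, which mirrors the advice above about not running several instances on one topic.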

Instructions on how to deploy each application, with additional details, are included in that application's folder. A small utility script, create_service_instances.sh, creates the service instances. For instructions on creating instances using the TAP console, go here

The provided ws2kafka application is very simple and does not provide authorization. If you need something more advanced, or you just feel adventurous, you may consider using Gateway, but note that it enforces a specific Kafka message format.

A handy feature of this pipeline is that you can replace one part of it with your own. The common part is Kafka and its topics.
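
As a rough illustration of swapping out one stage, the sketch below shows a replacement for the consuming side that only depends on Kafka topics. It is not part of this repository: `consume`, `kafka_records`, and the sink callback are hypothetical names, and the Kafka reader assumes the third-party kafka-python package is installed.

```python
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, bytes]  # (topic, payload)


def consume(records: Iterable[Record], sink: Callable[[str, bytes], None]) -> int:
    """Pass every (topic, payload) pair to `sink`; return how many were handled."""
    handled = 0
    for topic, payload in records:
        sink(topic, payload)
        handled += 1
    return handled


def kafka_records(topics: Iterable[str], bootstrap_servers: str) -> Iterator[Record]:
    """Yield (topic, payload) pairs read from Kafka.

    Assumption: kafka-python is available. It is imported lazily so the
    rest of this sketch can be used and tested without a broker.
    """
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(*topics, bootstrap_servers=bootstrap_servers)
    for record in consumer:
        yield record.topic, record.value
```

A custom stage would then be wired up with something like `consume(kafka_records(["topic1"], "localhost:9092"), my_sink)`, where the broker address and `my_sink` are placeholders for your own setup.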
