Structured Streaming is a reference application showing how to integrate Apache Spark Structured Streaming, Apache Cassandra, and Apache Kafka for fast streaming computations on data.

NashTech-Labs/structured-streaming-application

Structured Streaming Application

This is a reference application (which we will keep improving) showing how to integrate Spark Structured Streaming, Apache Cassandra, and Apache Kafka for streaming computations.

Sample Use Case

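We need to compute a streaming word count: words arrive on a Kafka topic, Spark counts them as they stream in, and the counts are stored in Cassandra.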
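To make the idea of a running count concrete, here is a minimal conceptual sketch in plain shell with awk. It is not the application's Spark code, and `running_word_count` is a hypothetical name chosen for illustration. It reads lines one at a time and, for every word it sees, prints the word with its count so far.

```shell
# Conceptual sketch only: a running word count over lines read from stdin.
# For each word encountered, print the word and its updated total.
running_word_count() {
  awk '{
    for (i = 1; i <= NF; i++) {
      count[$i]++          # bump the total for this word
      print $i, count[$i]  # emit the updated count
    }
  }'
}
```

For example, piping two lines such as `spark kafka` and `kafka` through the function emits one line per word occurrence, with the count for `kafka` rising from 1 to 2 on the second line.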

Clone the repo

git clone https://github.com/knoldus/structured-streaming-application.git
cd structured-streaming-application

Build the code

If this is your first time running SBT, the first build will take a while because SBT downloads all of its dependencies.

sbt clean compile

Setup - 4 Steps

1. Download the latest Cassandra release and extract the archive.

2. Start Cassandra. You may need to prepend sudo, or chown /var/lib/cassandra. On the command line:

./apache-cassandra-{version}/bin/cassandra -f

3. Download Kafka 0.10.2.1.

4. Start ZooKeeper and then the Kafka server, each in its own shell:

cd kafka_2.11-0.10.2.1
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
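Before moving on, it can help to confirm that each service is accepting connections. The following is a minimal sketch, assuming `nc` (netcat) is installed. The helper names `host_of`, `port_of`, `port_open`, and `wait_for` are hypothetical and not part of this repository.

```shell
# Split a "host:port" string (the format used by BOOTSTRAP_SERVERS_CONFIG).
host_of() { printf '%s\n' "${1%%:*}"; }
port_of() { printf '%s\n' "${1##*:}"; }

# Return 0 if something accepts TCP connections on host:port.
# Assumption: nc is available; -z probes the port without sending data.
port_open() {
  nc -z "$(host_of "$1")" "$(port_of "$1")" >/dev/null 2>&1
}

# Probe the endpoint once per second until it answers or the tries run out.
# Usage: wait_for host:port [tries]   (tries defaults to 30)
wait_for() {
  endpoint=$1
  tries=${2:-30}
  while :; do
    port_open "$endpoint" && return 0
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || return 1
    sleep 1
  done
}
```

With the default ports, `wait_for localhost:2181` checks ZooKeeper, `wait_for localhost:9092` checks Kafka, and `wait_for localhost:9042` checks Cassandra's native transport. Adjust the ports if your configuration differs.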

Run

From Command Line

1. Set the environment variables, e.g.:

export BOOTSTRAP_SERVERS_CONFIG="localhost:9092"
export TOPIC="knolx"
export CASSANDRA_HOSTS="localhost"
export CASSANDRA_KEYSPACE="knolx"
export SPARK_MASTER="local"
export SPARK_APP_NAME="knolx"
export CHECKPOINT_DIR="/tmp/knolx"

2. Start the Structured Streaming application:

cd /path/to/structured-streaming-application
sbt run
Multiple main classes detected, select one to run:
    
 [1] knolx.kafka.DataStreamer
 [2] knolx.spark.StructuredStreamingWordCount
    
Enter number: 2

3. Start the Kafka data feed. In a second shell, run:

cd /path/to/structured-streaming-application
sbt run
Multiple main classes detected, select one to run:

 [1] knolx.kafka.DataStreamer
 [2] knolx.spark.StructuredStreamingWordCount

Enter number: 1

After a few seconds, you should see data when you run this query in cqlsh:

cqlsh> select * from wordcount;

This confirms that the app has published data to Kafka and that Spark is streaming the results to Cassandra.
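The run steps above can also be scripted so that a missing variable is caught early and the interactive main-class prompt is skipped. This is a minimal sketch, not part of the repository. The function names and the role names `wordcount` and `feed` are hypothetical. It assumes your sbt version provides the `runMain` task, and it checks the full variable list for both programs even though the README does not say which variables each one reads.

```shell
# Variable names taken from the export list in step 1.
REQUIRED_VARS="BOOTSTRAP_SERVERS_CONFIG TOPIC CASSANDRA_HOSTS CASSANDRA_KEYSPACE SPARK_MASTER SPARK_APP_NAME CHECKPOINT_DIR"

# Print the name of every required variable that is unset or empty.
missing_vars() {
  for name in $REQUIRED_VARS; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Map a short role name (hypothetical) to a main class from the sbt prompt.
main_class_for() {
  case "$1" in
    wordcount) echo "knolx.spark.StructuredStreamingWordCount" ;;
    feed)      echo "knolx.kafka.DataStreamer" ;;
    *)         return 1 ;;
  esac
}

# Validate the environment, then run the chosen class without the prompt.
launch() {
  class=$(main_class_for "$1") || {
    echo "usage: launch wordcount|feed" >&2
    return 2
  }
  missing=$(missing_vars)
  if [ -n "$missing" ]; then
    echo "missing environment variables:" $missing >&2
    return 1
  fi
  sbt "runMain $class"
}
```

From the project directory, `launch wordcount` in one shell and `launch feed` in a second shell mirror steps 2 and 3.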
