This is a reference application (which we will continuously improve) showing how to integrate Spark Structured Streaming, Apache Cassandra, and Apache Kafka for streaming computations. The example computes a streaming word count.
git clone https://github.com/knoldus/structured-streaming-application.git
cd structured-streaming-application
If this is your first time running sbt, it will fetch a large number of dependencies, so the first build may take a while.
cd structured-streaming-application
sbt clean compile
1. Download the latest Apache Cassandra release and extract the archive.
2. Start Cassandra (you may need to prefix the command with sudo, or chown /var/lib/cassandra first). On the command line:
./apache-cassandra-{version}/bin/cassandra -f
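To verify that Cassandra is up before moving on, you can run a quick query through the cqlsh tool shipped in the same distribution (the {version} placeholder matches the one above):

./apache-cassandra-{version}/bin/cqlsh -e "DESCRIBE KEYSPACES"

If this prints the list of keyspaces, the node is accepting connections.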
3. Download Kafka 2.11-0.10.2.1, extract it, and start ZooKeeper and the Kafka server:
cd kafka_2.11-0.10.2.1
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
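Depending on the broker's auto-create setting, you may also need to create the knolx topic yourself (the topic name comes from the TOPIC environment variable below; single-node replication and partition settings are assumed):

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic knolx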
1. Set the environment variables, e.g.:
export BOOTSTRAP_SERVERS_CONFIG="localhost:9092"
export TOPIC="knolx"
export CASSANDRA_HOSTS="localhost"
export CASSANDRA_KEYSPACE="knolx"
export SPARK_MASTER="local"
export SPARK_APP_NAME="knolx"
export CHECKPOINT_DIR="/tmp/knolx"
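Since both the streaming application and the data feed run in separate shells (see below), it can be handy to keep these settings in a small script and source it in each shell; a sketch (the file name is illustrative):

```shell
# Save the settings to a file so every shell can share them
cat > /tmp/knolx-env.sh <<'EOF'
export BOOTSTRAP_SERVERS_CONFIG="localhost:9092"
export TOPIC="knolx"
export CASSANDRA_HOSTS="localhost"
export CASSANDRA_KEYSPACE="knolx"
export SPARK_MASTER="local"
export SPARK_APP_NAME="knolx"
export CHECKPOINT_DIR="/tmp/knolx"
EOF

# Load the settings into the current shell before running sbt
. /tmp/knolx-env.sh
echo "$TOPIC"   # prints knolx
```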
2. Start the Structured Streaming application:
cd /path/to/structured-streaming-application
sbt run
Multiple main classes detected, select one to run:
[1] knolx.kafka.DataStreamer
[2] knolx.spark.StructuredStreamingWordCount
Enter number: 2
3. Start the Kafka data feed. In a second shell, run:
cd /path/to/structured-streaming-application
sbt run
Multiple main classes detected, select one to run:
[1] knolx.kafka.DataStreamer
[2] knolx.spark.StructuredStreamingWordCount
Enter number: 1
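To check that the data feed is actually publishing to Kafka, independently of Spark, you can tail the topic with the console consumer shipped with Kafka (a localhost broker is assumed):

cd kafka_2.11-0.10.2.1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic knolx

You should see the messages produced by DataStreamer scroll past as they arrive.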
After a few seconds, you should see data arriving in Cassandra. Enter this in the cqlsh shell:
cqlsh> select * from wordcount;
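If cqlsh reports that the table does not exist, you are probably not in the right keyspace; qualify the table name with the keyspace set in CASSANDRA_KEYSPACE above (assuming the table lives in that keyspace):

cqlsh> select * from knolx.wordcount;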
This confirms that the app has published data to Kafka and that Spark is streaming the results into Cassandra.