In this lab module, we will learn to publish and consume events from Azure Event Hub with Spark Structured Streaming. The source is the curated crimes dataset in DBFS, and the sink is DBFS in Delta format.
Capture the Event Hub connection string; it is needed to configure the Spark connector.
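For reference, an Event Hub connection string (taken from a shared access policy) typically has the following shape; the angle-bracket values are placeholders for your own namespace, policy, key, and hub name:

```
Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy-name>;SharedAccessKey=<key>;EntityPath=<event-hub-name>
```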
This step is performed on the Databricks cluster.
The Maven coordinates are:
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.6
Be sure to get the latest version from here: https://docs.databricks.com/spark/latest/structured-streaming/streaming-event-hubs.html#requirements
Refer to the notebook for instructions.
Unit 3. Read crime data from DBFS as a stream and publish events to Azure Event Hub with Spark Structured Streaming
We will read the curated Chicago crimes dataset in DBFS as a stream and publish it to Azure Event Hub using Structured Streaming. Follow instructions in the notebook and execute step by step.
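The publish step can be sketched as below. This is a minimal sketch, not the notebook's exact code: the source path, checkpoint directory, and function names are hypothetical, and it assumes the curated dataset is stored in Delta format. The `eventhubs` sink format and the `eventhubs.connectionString` option come from the azure-eventhubs-spark connector, which expects the event payload in a column named `body` (newer connector versions also expect the connection string to be encrypted via the connector's `EventHubsUtils.encrypt` helper; see its docs).

```python
def build_eh_conf(connection_string):
    # Minimal connector configuration keyed by the option name the
    # azure-eventhubs-spark connector reads. Pure Python, so it can be
    # built and inspected without a Spark installation.
    return {"eventhubs.connectionString": connection_string}

def publish_crimes_stream(spark, source_path, eh_conf, checkpoint_dir):
    """Read the curated crimes dataset as a stream and publish each
    row as a JSON event to Azure Event Hub."""
    # pyspark imports are local so the helper above stays importable
    # without pyspark on the path.
    from pyspark.sql.functions import to_json, struct

    df = (spark.readStream
          .format("delta")   # assumption: curated data is in Delta format
          .load(source_path))

    # The connector expects the payload in a column named "body";
    # serialize each row to a JSON string.
    events = df.select(to_json(struct(*df.columns)).alias("body"))

    return (events.writeStream
            .format("eventhubs")
            .options(**eh_conf)
            .option("checkpointLocation", checkpoint_dir)
            .start())
```

A hypothetical invocation from a notebook cell might look like `publish_crimes_stream(spark, "/mnt/data/crimes/curated", build_eh_conf(conn_str), "/mnt/checkpoints/crimes-pub")`.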
We will consume events from Azure Event Hub using Structured Streaming and sink to Databricks Delta. Follow instructions in the notebook and execute step by step.
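The consume step can be sketched as follows; again a minimal sketch with hypothetical names, not the notebook's exact code. The connector's source schema exposes the payload as a binary `body` column alongside metadata such as `enqueuedTime`, so the sketch casts `body` to a string and sinks it to Delta; parsing into typed columns with `from_json` can happen downstream once a schema is fixed.

```python
import json

def decode_body(raw):
    # Event Hub delivers the payload as bytes in the "body" column;
    # decode a single payload back into a Python dict. Pure Python,
    # usable without Spark (e.g. for inspecting one event).
    if isinstance(raw, (bytes, bytearray)):
        raw = raw.decode("utf-8")
    return json.loads(raw)

def consume_to_delta(spark, eh_conf, delta_path, checkpoint_dir):
    """Consume events from Azure Event Hub and sink them to Delta."""
    from pyspark.sql.functions import col

    raw = (spark.readStream
           .format("eventhubs")
           .options(**eh_conf)
           .load())

    # Keep the JSON payload as a string plus the enqueue timestamp.
    events = raw.select(col("body").cast("string").alias("crime_json"),
                        col("enqueuedTime"))

    return (events.writeStream
            .format("delta")
            .option("checkpointLocation", checkpoint_dir)
            .start(delta_path))
```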
We will create an external table on the streaming events and run queries on it. Follow instructions in the notebook and execute step by step.
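Creating the external table can be sketched as below; the table name and Delta location are hypothetical. Because the table is external (created over an existing `LOCATION`), dropping it does not delete the underlying streamed data, and queries against it see new rows as the streaming sink appends them.

```python
def external_table_ddl(table_name, delta_path):
    # Build the DDL for an external Delta table over an existing path.
    # Pure Python, so the statement can be inspected without Spark.
    return (f"CREATE TABLE IF NOT EXISTS {table_name} "
            f"USING DELTA LOCATION '{delta_path}'")

def create_crimes_table(spark, table_name, delta_path):
    # Register the external table in the metastore.
    spark.sql(external_table_ddl(table_name, delta_path))

def sample_query(spark, table_name):
    # Example query; the count grows as new events land in Delta.
    return spark.sql(f"SELECT COUNT(*) AS event_count FROM {table_name}")
```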