In this code pattern, we will build a Scala app that uses Akka to implement a WebSockets endpoint which streams data to a Db2 Event Store database. For our data, we'll use online retail order details in CSV format. We'll use Jupyter notebooks with Scala and Brunel to visualize the Event Store data.
When the reader has completed this code pattern, they will understand how to:
- Implement WebSockets with Scala and Akka
- Use Alpakka Flows for CSV parsing and inserts into Event Store
- Use Jupyter notebooks and Scala to interact with Event Store
- Use Spark SQL and Brunel visualizations to analyze the data
- Set up the database with a Jupyter notebook.
- Submit CSV data via WebSockets.
- Use Akka and Alpakka to transform the data and feed it into Event Store.
- Present the data with Brunel visualizations in a Jupyter notebook.
- Clone the repo
- Install IBM Db2 Event Store Developer Edition
- Run the database setup notebook
- Run the Scala app
- Feed in data
- Visualize the data
Clone the db2-event-store-akka-streams repo locally. In a terminal, run:
git clone https://github.com/IBM/db2-event-store-akka-streams
Install IBM® Db2® Event Store Developer Edition on Mac, Linux, or Windows by following the instructions here.
Note: This code pattern was developed with EventStore-DeveloperEdition 1.1.4
The git repo includes a Jupyter notebook that is used to set up the database for this code pattern. The notebook also demonstrates some basics of using Scala with Event Store, such as:
- Create a database
- Create a table
- Query a table
Use the Db2 Event Store UI to create and run the notebook.
- From the upper-left corner ☰ drop-down menu, select My Notebooks.
- Click on add notebooks.
- Select the From File tab.
- Provide a name.
- Click Choose File and navigate to the notebooks directory in your cloned repo. Open the file online_retail_database_setup.ipynb.
- Scroll down and click on Create Notebook.
The new notebook is now open and ready for execution.
- Edit the host constant in the first code cell. You will need to enter your host's IP address here.
- Run the notebook using the menu Cell ▷ Run all or run the cells individually with the play button.
The repo contains an sbt project to run the Scala/Akka app.
- Install sbt from here.
- Run the app:
cd db2-event-store-akka-streams
sbt run
The app creates a WebSockets endpoint at ws://localhost:8080/websocket/orderitem.
Using Akka HTTP, very little code is needed to set up WebSockets. We also put a simple message at / for anyone who hits the server URL in a browser.
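To give a feel for how little code that is, here is a minimal sketch of an Akka HTTP server with a plain message at / and a WebSockets handler at /websocket/orderitem. It is not the code from the repo: the object name `WebSocketSketch`, the `ack` helper, and both reply texts are hypothetical. It assumes Akka HTTP 10.1.x on Akka 2.5.x; newer releases prefer `Http().newServerAt(...)` over `bindAndHandle`.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.ws.{Message, TextMessage}
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Flow

object WebSocketSketch {
  // Hypothetical acknowledgement text; the real app's reply may differ.
  def ack(text: String): String = {
    val rows = text.split("\n").count(_.trim.nonEmpty)
    s"Received $rows line(s)"
  }

  // Replies to each complete text frame with an acknowledgement.
  // Streamed (non-strict) and binary messages are ignored in this sketch.
  val orderItemFlow: Flow[Message, Message, Any] =
    Flow[Message].collect {
      case TextMessage.Strict(text) => TextMessage(ack(text))
    }

  val route: Route =
    pathSingleSlash {
      // Simple message for anyone who opens the server URL in a browser.
      get { complete("WebSockets endpoint: ws://localhost:8080/websocket/orderitem") }
    } ~
    path("websocket" / "orderitem") {
      handleWebSocketMessages(orderItemFlow)
    }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("websocket-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    Http().bindAndHandle(route, "localhost", 8080)
    println("Listening on http://localhost:8080/ (Ctrl+C to stop)")
  }
}
```

In the real app the handler flow feeds Event Store rather than just acknowledging the text.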
With Akka Streams and Alpakka, we have an elegant way to:
- Parse the CSV data
- Map the CSV data using headers
- Transform the data into Spark SQL Rows
- Divert negative quantity order items into our cancellations EventStoreSink
- Feed remaining orders into our EventStoreSink table OnlineRetailOrderDetails
Note: EventStoreSink and EventStoreFlow are not currently part of the Alpakka distribution. They are provided in this code pattern's GitHub repo.
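The steps listed above can be sketched roughly as follows, using the Alpakka CSV connector and the Akka Streams `divertTo` operator. This is an illustration under assumptions, not the repo's code: `Sink.foreach` stands in for the two EventStoreSink instances (whose signatures are not shown here), the `OrderItem` case class stands in for a Spark SQL Row, and the negative-quantity sample row is made up. It assumes Akka 2.5.10 or later and an Alpakka CSV release that provides `CsvToMap.toMapAsStrings`.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.alpakka.csv.scaladsl.{CsvParsing, CsvToMap}
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString

object CsvPipelineSketch {
  // Hypothetical stand-in for the Spark SQL Row that the real app builds.
  final case class OrderItem(invoiceNo: String, stockCode: String, quantity: Int, unitPrice: Double)

  // Picks typed fields out of a header-keyed CSV record.
  def toOrderItem(m: Map[String, String]): OrderItem =
    OrderItem(m("InvoiceNo"), m("StockCode"), m("Quantity").toInt, m("UnitPrice").toDouble)

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("csv-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    import system.dispatcher

    // The last row is a made-up cancellation (negative quantity) for illustration.
    val csv =
      """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
        |536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
        |C000001,TEST1,HYPOTHETICAL RETURNED ITEM,-2,2010-12-01 09:00:00,1.50,17850,United Kingdom
        |""".stripMargin

    val done = Source.single(ByteString(csv))
      .via(CsvParsing.lineScanner())     // parse CSV lines into fields
      .via(CsvToMap.toMapAsStrings())    // map fields using the header row
      .map(toOrderItem)                  // transform into a typed record
      // Negative quantities go to the stand-in cancellations sink.
      .divertTo(Sink.foreach[OrderItem](o => println(s"cancellation: $o")), _.quantity < 0)
      // Everything else goes to the stand-in orders sink.
      .runWith(Sink.foreach[OrderItem](o => println(s"order: $o")))

    done.onComplete(_ => system.terminate())
  }
}
```

Because `divertTo` removes matching elements from the main stream, each record reaches exactly one of the two sinks.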
Now that the app is running, data can be fed in with WebSockets.
For example data, use the file data/online_retail_2010_12.csv from your cloned repo. This CSV file contains a header that is used for mapping the columns. You can feed in one or more rows as CSV text (no header) or feed in an entire CSV file (with header). Example header and order details:
InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850,United Kingdom
You should be able to use your favorite WebSockets test client (or write one). For example, Advanced REST Client (ARC) supports testing WebSockets with text or file input.
- Install ARC from here.
- Run the ARC app.
- Select Socket in the left sidebar menu (if the menu is hidden, use the upper-left corner ☰ drop-down menu).
- Enter ws://localhost:8080/websocket/orderitem as the Socket URL and hit CONNECT.
- To send text:
  - Click on the Text tab.
  - Paste one or more lines of text from the example CSV file into Message to send.
  - Press SEND.
- To send a file:
  - Click on the File tab.
  - Click SELECT FILE or use drag-and-drop to select an example CSV file.
  - Press SEND.
- Direction out will show when/what you sent, and direction in will show the response from your server.
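If you would rather write a small client than use ARC, one option is the WebSocket client built into the JDK. The sketch below is only an illustration: it assumes JDK 11 or later and Scala 2.12 or later, the object name `OrderItemClientSketch` and the `csvMessage` helper are hypothetical, and the five-second wait for a reply is an arbitrary choice.

```scala
import java.net.URI
import java.net.http.{HttpClient, WebSocket}
import java.util.concurrent.{CompletionStage, CountDownLatch, TimeUnit}

object OrderItemClientSketch {
  val header = "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country"

  // Joins rows into one text message, optionally preceded by the CSV header.
  def csvMessage(rows: Seq[String], includeHeader: Boolean = false): String =
    (if (includeHeader) header +: rows else rows).mkString("\n")

  def main(args: Array[String]): Unit = {
    val replied = new CountDownLatch(1)

    // Prints each text frame from the server (the "in" direction).
    val listener = new WebSocket.Listener {
      override def onText(ws: WebSocket, data: CharSequence, last: Boolean): CompletionStage[_] = {
        println(s"in: $data")
        replied.countDown()
        ws.request(1) // ask for the next message
        null          // the data may be reclaimed immediately
      }
    }

    val ws = HttpClient.newHttpClient()
      .newWebSocketBuilder()
      .buildAsync(URI.create("ws://localhost:8080/websocket/orderitem"), listener)
      .join()

    val message = csvMessage(Seq(
      "536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom"))
    println(s"out: $message")
    ws.sendText(message, true).join()

    // Wait briefly for a reply, then close the connection.
    replied.await(5, TimeUnit.SECONDS)
    ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join()
  }
}
```

Run it while the app is listening on port 8080. Pass `includeHeader = true` to `csvMessage` to mimic sending a whole CSV file.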
The git repo includes a Jupyter notebook that uses Spark SQL and Brunel visualizations to manipulate and present the data.
Use the Db2 Event Store UI to create and run the notebook.
- From the upper-left corner ☰ drop-down menu, select My Notebooks.
- Click on add notebooks.
- Select the From File tab.
- Provide a name.
- Click Choose File and navigate to the notebooks directory in your cloned repo. Open the file online_retail_orders.ipynb.
- Scroll down and click on Create Notebook.
The new notebook is now open and ready for execution.
- Edit the host constant in the first code cell. You will need to enter your host's IP address here.
- Run the notebook using the menu Cell ▷ Run all or run the cells individually with the play button.
Near the top of the notebook we use simple Spark SQL output to help you verify that your data got into Event Store. As you feed in data you can run these cells over and over if you'd like to see the current count or your most recent events.
Next we do some aggregation and manipulation. For example, the notebook:
- Calculates gross sales (UnitPrice * Quantity)
- Uses number and date formatting
- Extracts hour-of-the-day from timestamps
- Aggregates the data using these calculated attributes
- Aggregates to get Invoice counts (multiple events per invoice)
- Uses various group-by summaries
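As a rough idea of what such aggregations look like, the sketch below computes gross sales, extracts the hour of the day, and summarizes invoices, units, and sales per hour with the Spark DataFrame API. It is not taken from the notebook: it assumes a local SparkSession with a few in-memory rows instead of an Event Store table, and the names `RetailAggregationSketch`, `sampleOrders`, and `salesByHour` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, countDistinct, hour, round, sum}

object RetailAggregationSketch {
  // A few rows from the example CSV, loaded in memory for illustration.
  def sampleOrders(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("536365", "85123A", 6, "2010-12-01 08:26:00", 2.55),
      ("536365", "71053", 6, "2010-12-01 08:26:00", 3.39),
      ("536365", "84406B", 8, "2010-12-01 08:26:00", 2.75)
    ).toDF("InvoiceNo", "StockCode", "Quantity", "InvoiceDate", "UnitPrice")
  }

  // Adds GrossSales and Hour columns, then aggregates per hour of the day.
  def salesByHour(orders: DataFrame): DataFrame =
    orders
      .withColumn("GrossSales", col("UnitPrice") * col("Quantity"))
      .withColumn("Hour", hour(col("InvoiceDate").cast("timestamp")))
      .groupBy("Hour")
      .agg(
        countDistinct("InvoiceNo").as("Invoices"), // multiple events per invoice
        sum("Quantity").as("Units"),
        round(sum("GrossSales"), 2).as("GrossSales"))
      .orderBy("Hour")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("retail-sketch").getOrCreate()
    salesByHour(sampleOrders(spark)).show()
    spark.stop()
  }
}
```

Counting distinct InvoiceNo values rather than rows matters here because each invoice produces several order-item events.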
Spark SQL and the %%dataframe cell magic make it easy to show a nice table for Top-10 (by units or gross sales). Brunel makes it easy to show the data graphically with interactive charts and maps. Here are some examples:
Run the code pattern (or cheat and look at the sample output notebook) to see these examples live -- and more.
See the notebook with example output and interactive charts here.
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.