Consider a live event like an NBA game. The video is captured by cameras installed in the basketball arena and makes its way to viewers' OTT devices (TVs, computers, and smartphones) through a live video streaming system. Such a system implements services like ingest, transcoding, packaging, distribution, and server-side ad insertion. Each of these services delivers its logs to the video analytics platform.
Diagram 1: Video Streaming Flow
Video streaming analytics platforms help overcome streaming problems, understand the audience better, and improve viewer satisfaction by ensuring higher video streaming quality. Video streaming platforms differ from one another; in this project, we build analytics for platforms that have the following components in common: Video Processing, Media Packaging, Media Storage, and Video Distribution.
The main objective of this project is to create a reusable component that collects and analyzes video streaming data in near real time, with an end-to-end latency of up to 2 minutes, and reports the following video KPIs (see the sketch after this list for how one of them might be computed):
- Views - how many times the video has been watched
- Unique views - the number of distinct viewers who watched the video or live stream
- Session duration - the average time a viewer spent watching the video
- Start-up time - the time it takes for video playback to begin
- Video buffering - the time spent (pre-)loading the data needed to play the video
- Geolocation - the geographical popularity of the video
- Device data - the OS, browsers, and video players used by the audience
- CDN QoS - the quality of service of the CDN
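To make these definitions concrete, here is a minimal Spark sketch of how one KPI, unique views, might be computed over hourly windows. The event schema (eventTime, videoId, userId) and the function name are illustrative assumptions, not the project's actual code:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical event schema: one row per playback event with columns
// eventTime (timestamp), videoId (string), and userId (string).
def uniqueViewsPerHour(events: DataFrame): DataFrame =
  events
    // Tolerate events that arrive up to 10 minutes late
    .withWatermark("eventTime", "10 minutes")
    // Count distinct viewers per video in hourly windows;
    // approx_count_distinct keeps the aggregation state bounded on a stream
    .groupBy(window(col("eventTime"), "1 hour"), col("videoId"))
    .agg(approx_count_distinct("userId").as("unique_views"))
```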
The project runs on top of Apache Spark Streaming, with Apache Kafka serving as the data source. The processed data is written to a file sink in append mode.
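As a rough illustration of this flow, the sketch below reads events from Kafka and appends the output to a file sink with Spark Structured Streaming. The broker address, topic name, and paths are placeholders, and the spark-sql-kafka connector must be on the classpath; this is not the project's actual job:

```scala
import org.apache.spark.sql.SparkSession

object VideoAnalyticsFlowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("video-analytics-sketch").getOrCreate()

    // Read raw playback events from Kafka; the broker and topic are placeholders
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "video-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Write the stream to a file sink in append mode; the paths are placeholders
    events.writeStream
      .format("parquet")
      .outputMode("append")
      .option("path", "/tmp/video-analytics/output")
      .option("checkpointLocation", "/tmp/video-analytics/checkpoint")
      .start()
      .awaitTermination()
  }
}
```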
Diagram 2: Video Analytics Flow
For more information, see the design document.
To consume this project from another Apache Spark application, build it with sbt:
```bash
mkdir video-analytics
cd video-analytics
git clone git@github.com:dimastatz/video-streaming-analytics.git
cd video-streaming-analytics/data-process/processing/flumenz
sbt assembly
```
The `sbt assembly` command builds an uber JAR that can be used from another Spark application. You can also run `sbt testCoverage` to execute all unit tests at once. The code coverage of this project is above 90%, so the unit tests demonstrate almost all of the functionality.
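For example, the assembled JAR could be attached to another Spark application through spark-submit. The JAR path, Scala version, and class name below are placeholders that depend on your build:

```bash
# Illustrative only: adjust the JAR path and class names to your build output
spark-submit \
  --jars target/scala-2.12/flumenz-assembly-0.1.jar \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```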
As prerequisites, install Docker Desktop and Docker Compose. After installing them, run docker-compose-process.yml. This Docker Compose file starts the Apache Kafka and Apache Spark Streaming containers. You can then feed data into Apache Kafka to see how the pipeline works.
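For example (assuming the compose file sits at the repository root; the container and topic names are placeholders that depend on the compose configuration):

```bash
# Start the Apache Kafka and Apache Spark Streaming containers
docker compose -f docker-compose-process.yml up -d

# Example only: produce test events to Kafka; <kafka-container> and the
# topic name are placeholders defined by the compose configuration
docker exec -it <kafka-container> \
  kafka-console-producer --bootstrap-server localhost:9092 --topic video-events
```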