Video Streaming Analytics with Apache Spark

Introduction

Consider a live event like an NBA game. The video is captured by cameras installed in a basketball arena and makes its way to viewers' OTT devices (TVs, computers, smartphones) through a live video streaming system. A live video streaming system implements services such as ingest, transcoding, packaging, distribution, and server-side ad insertion. Each of these services delivers logs to the video analytics platform.

Diagram 1: Video Streaming Flow

Video streaming analytics platforms help overcome streaming problems, understand the audience better, and improve viewer satisfaction by ensuring higher video streaming quality. Video streaming platforms differ from one another; in this project, we build analytics for platforms that have the following components in common: Video Processing, Media Packaging, Media Storage, and Video Distribution.

Objectives

The main objective of this project is to create a reusable component that collects and analyzes video streaming data in real time, with a latency of 2 minutes. The project reports the following video KPIs (a sketch of computing two of them follows the list):

  • Views - how many times the video has been consumed
  • Unique views - the actual number of people who watched the video or live stream
  • Session duration - the average time a user spent watching the video
  • Start-up time - the time it takes for the video to start playing
  • Video buffering - the time it takes to (pre-)load the data needed to play the video
  • Geolocation - the geographical popularity of the video
  • Device data - the OSes, browsers, and video players used by the audience
  • CDN QoS - the quality of service of the CDN
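As an illustration, here is a minimal sketch of how the first two KPIs could be computed with Spark SQL. The event schema (videoId, userId, eventType) is a hypothetical example for this sketch, not the project's actual log format:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Assumes one row per playback event with videoId, userId and eventType columns.
def viewKpis(events: DataFrame): DataFrame =
  events
    .filter(col("eventType") === "play")
    .groupBy(col("videoId"))
    .agg(
      count(lit(1)).as("views"),                              // total plays
      approx_count_distinct(col("userId")).as("uniqueViews")  // distinct viewers
    )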

Design

The project runs on top of Apache Spark Streaming. Apache Kafka serves as the data source, and the processed data is written to a file sink using append mode.
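The following is a minimal sketch of this flow using the Spark Structured Streaming API. The Kafka topic name, broker address, and output paths are assumptions for illustration, not the project's actual configuration:

import org.apache.spark.sql.SparkSession

object VideoAnalyticsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("video-streaming-analytics")
      .getOrCreate()

    // Read raw streaming logs from Kafka (topic and servers are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "video-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS raw", "timestamp")

    // Write the stream to a file sink in append mode, as described above.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/tmp/video-analytics/output")
      .option("checkpointLocation", "/tmp/video-analytics/checkpoint")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}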

Diagram 2: Video Analytics Flow

For more information, see the design document.

How-To

Scala Project

To consume this project from another Apache Spark application, build it with sbt:

mkdir video-analytics
cd video-analytics
git clone git@github.com:dimastatz/video-streaming-analytics.git
cd video-streaming-analytics/data-process/processing/flumenz
sbt assembly

The sbt assembly command builds an uber JAR, which can be used from another Spark application. You can also run sbt testCoverage to execute all unit tests at once. The code coverage of this project is above 90%, so almost all of the functionality can be explored through the unit tests.

Docker Compose

As a prerequisite, install Docker Desktop and Docker Compose. After installing the Docker software, run docker-compose-process.yml. This docker-compose file starts Apache Kafka and Apache Spark Streaming containers. You can feed data to Apache Kafka to see how the pipeline works, as in the sketch below.
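Here is a minimal sketch of producing a sample event to Kafka in Scala. The broker address, topic name, and event payload are assumptions; adjust them to whatever docker-compose-process.yml actually exposes:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SampleEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker address is a placeholder for the one exposed by the compose file.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // A hypothetical playback event; the real schema may differ.
    val event = """{"videoId": "nba-game-1", "userId": "u-42", "eventType": "play"}"""
    producer.send(new ProducerRecord[String, String]("video-events", event))
    producer.flush()
    producer.close()
  }
}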
