Video Streaming Analytics with Apache Spark

Introduction

Consider a live event like an NBA game. The video is captured by cameras installed in a basketball arena and makes its way to viewers' OTT devices (TVs, computers, smartphones) through a live video streaming system. A live video streaming system implements services such as ingest, transcoding, packaging, distribution, and server-side ad insertion. Each of these services delivers logs to the video analytics platform.

Diagram 1: Video Streaming Flow

Video streaming analytics platforms help overcome streaming problems, understand the audience better, and improve viewer satisfaction by ensuring higher video streaming quality. Video streaming platforms differ from one another; in this project, we build analytics for platforms that have the following components in common: Video Processing, Media Packaging, Media Storage, and Video Distribution.

Objectives

The main objective of this project is to create a reusable component that collects and analyzes video streaming data in real time, with a latency of 2 minutes. The project reports the following video KPIs (a sketch of computing two of them follows the list):

  • Views - how many times the video has been consumed
  • Unique views - the actual number of people who watched the video or live stream
  • Session duration - the average time a user spent watching the video
  • Start-up time - the time it takes for the video to start playing
  • Video buffering - the time it takes to (pre-)load the data needed to play the video
  • Geolocation - the geographical popularity of the video
  • Device data - the OSes, browsers, and video players used by the audience
  • CDN QoS - the quality of service of the CDN
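As an illustration, here is a minimal sketch of how the first two KPIs could be computed with Spark SQL. The event schema (videoId, userId, eventType) is a hypothetical example for this sketch, not the project's actual log format:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Assumes one row per playback event with videoId, userId and eventType columns.
def viewKpis(events: DataFrame): DataFrame =
  events
    .filter(col("eventType") === "play")
    .groupBy(col("videoId"))
    .agg(
      count(lit(1)).as("views"),                              // total plays
      approx_count_distinct(col("userId")).as("uniqueViews")  // distinct viewers
    )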

Design

The project runs on top of Apache Spark Streaming. Apache Kafka serves as the data source, and the processed data is written to a file sink using append mode.
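The following is a minimal sketch of this flow using the Spark Structured Streaming API. The Kafka topic name, broker address, and output paths are assumptions for illustration, not the project's actual configuration:

import org.apache.spark.sql.SparkSession

object VideoAnalyticsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("video-streaming-analytics")
      .getOrCreate()

    // Read raw streaming logs from Kafka (topic and servers are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "video-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS raw", "timestamp")

    // Write the stream to a file sink in append mode, as described above.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/tmp/video-analytics/output")
      .option("checkpointLocation", "/tmp/video-analytics/checkpoint")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}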

Diagram 2: Video Analytics Flow

For more information, see the design document.

How-To

Scala Project

To consume this project from another Apache Spark application, build it with sbt:

mkdir video-analytics
cd video-analytics
git clone git@github.com:dimastatz/video-streaming-analytics.git
cd video-streaming-analytics/data-process/processing/flumenz
sbt assembly

The sbt assembly command builds an uber JAR, which can be used from another Spark application. You can also run sbt testCoverage to execute all unit tests at once. The code coverage of this project is above 90%, so almost all of the functionality can be explored through the unit tests.

Docker Compose

As a prerequisite, install Docker Desktop and Docker Compose. After installing the Docker software, run docker-compose-process.yml. This docker-compose file starts Apache Kafka and Apache Spark Streaming containers. You can feed data to Apache Kafka to see how the pipeline works, as in the sketch below.
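Here is a minimal sketch of producing a sample event to Kafka in Scala. The broker address, topic name, and event payload are assumptions; adjust them to whatever docker-compose-process.yml actually exposes:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SampleEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker address is a placeholder for the one exposed by the compose file.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // A hypothetical playback event; the real schema may differ.
    val event = """{"videoId": "nba-game-1", "userId": "u-42", "eventType": "play"}"""
    producer.send(new ProducerRecord[String, String]("video-events", event))
    producer.flush()
    producer.close()
  }
}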
