identification-of-the-dominant-speaker-in-short-videos

This project identifies the dominant speaker in YouTube videos. The videos are classified into 7 categories: 6 personalities and 1 noise class (frames in which the speaker is absent). The 6 personalities are -

1. Data

Videos of these 6 personalities are taken from YouTube in 720p quality. Frames are extracted from these videos at 1 fps using ffmpeg.
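
The frame extraction step can be sketched as a small Python wrapper around ffmpeg. This is a minimal sketch, not the project's actual script: the function names, the output file pattern `frame_%05d.jpg`, and the JPEG output format are assumptions made for illustration. It relies on ffmpeg's `fps` video filter and needs the `ffmpeg` binary on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, fps=1):
    """Build an ffmpeg command that writes `fps` frames per second as JPEGs."""
    # Hypothetical naming scheme: frame_00001.jpg, frame_00002.jpg, ...
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video
        "-vf", f"fps={fps}",     # resample to the requested frame rate
        pattern,                 # numbered output frames
    ]


def extract_frames(video_path, out_dir, fps=1):
    """Create the output directory and run ffmpeg on one video."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)
```

Calling `extract_frames("talk.mp4", "frames/talk")` would then write one image per second of video into `frames/talk`; both paths are placeholders.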

2. Approaches

2.1 Using face embeddings

Since every video has a unique speaker, we first tried to solve the problem with face recognition. We used the OpenFace library to compute face embeddings.
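
How the embeddings are turned into a class label is not described here. One simple possibility is a nearest-centroid rule: average the embeddings of each personality, assign a new face to the closest average, and fall back to the noise class when no face is found or nothing is close enough. The sketch below assumes the embeddings have already been computed (OpenFace produces 128-dimensional vectors); the function names and the `max_distance` threshold are illustrative assumptions, not the project's method.

```python
import math


def centroid(embeddings):
    """Average a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]


def euclidean(a, b):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_face(embedding, centroids, max_distance=1.0):
    """Return the closest known speaker, or "noise" if nothing is close enough.

    `embedding` is None when no face was detected in the frame.
    `max_distance` is an illustrative threshold, not a tuned value.
    """
    if embedding is None:
        return "noise"
    name, dist = min(
        ((label, euclidean(embedding, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_distance else "noise"
```

In practice `centroids` would map each of the 6 personalities to the centroid of its training embeddings.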

2.2 Using Spatial Models

Since the problem is essentially object detection, we tried transfer learning with a CNN pre-trained on ImageNet. We applied two types of fine-tuning to the CNN -

2.2.1 Tuning of all layers of the CNN

The weights of the pre-trained CNN are used for initialization, and the parameters of all layers are updated during training.

2.2.2 Tuning of only the final layer

Only the parameters of the last layer of the CNN are updated, while the rest of the layers are frozen.
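
The difference between the two modes comes down to which parameters receive gradient updates. The toy sketch below shows this in plain Python with a hypothetical two-parameter model `y = w2 * (w1 * x)` instead of a real CNN: `w1` stands in for the pre-trained layers and `w2` for the final layer. The model, the learning rate, and the `tune_all_layers` flag are assumptions made for illustration only.

```python
def train(w1, w2, samples, tune_all_layers, lr=0.01, epochs=200):
    """Fit y = w2 * (w1 * x) by gradient descent on squared error.

    w1 stands in for the pre-trained layers, w2 for the final layer.
    """
    for _ in range(epochs):
        for x, target in samples:
            hidden = w1 * x                  # output of the "pre-trained" part
            error = w2 * hidden - target
            grad_w2 = 2 * error * hidden     # gradient for the final layer
            grad_w1 = 2 * error * w2 * x     # gradient for the earlier layer
            w2 -= lr * grad_w2               # the final layer is always updated
            if tune_all_layers:
                w1 -= lr * grad_w1           # earlier layers only in full tuning
    return w1, w2
```

With `tune_all_layers=False` the value of `w1` is returned unchanged, which mirrors freezing every layer except the last. With `tune_all_layers=True` both parameters move away from their initial values.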

3. Data Augmentation

We used data augmentation to reduce over-fitting of the models. Frames were randomly cropped and flipped horizontally. We also added face images of these personalities to the training data so that the CNN does not memorize the background of the frames. These faces were extracted using the OpenFace library.
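
The two frame-level augmentations, random cropping and horizontal flipping, can be sketched in plain Python on an image stored as a list of pixel rows. This is a minimal illustration under assumptions: the function names, the crop size arguments, and the 0.5 flip probability are not taken from the project, and a real pipeline would normally use an image library instead of nested lists.

```python
import random


def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a crop_h x crop_w window at a random position.

    `image` is a list of rows, each row a list of pixel values.
    """
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def augment(image, crop_h, crop_w, flip_prob=0.5, rng=random):
    """Random crop followed by a horizontal flip with probability flip_prob."""
    out = random_crop(image, crop_h, crop_w, rng)
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which helps when debugging a training run.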

4. Performance

4.1 Performance of fine-tuning the last layer

Technique	Accuracy
With data augmentation	62.92%
Without data augmentation	53.20%

4.2 Performance of fine-tuning all layers

Technique	Accuracy
With data augmentation	69.46%
Without data augmentation	62.27%
