Please cite the paper *Speaker-Independent Speech Recognition using Visual Features* if you use this work in your research.
Link to the dataset: https://sites.google.com/site/achrafbenhamadou/-datasets/miracl-vc1.
- Fifteen speakers (ten women and five men)
- Ten words and ten phrases, each uttered ten times
- Both depth and color images
Set up a Python environment (Python 3.6 at the time of writing) with the packages listed in the requirements.txt file:
$ pip install -r requirements.txt
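The pinned versions in the repository's requirements.txt are authoritative; as a rough sketch, the dependencies for a pipeline like this one typically include:

```text
# Illustrative only; defer to the repository's requirements.txt
numpy
opencv-python     # video/image I/O and processing
dlib              # face detection and facial landmarks
imutils
tensorflow
keras
scikit-learn
```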
To extract lip regions, first combine the color images of each utterance into a video to feed into VisualizeLip.py, using the preprocess/make_videos.py script. It expects one command-line argument: the path to the data (a sketch of this step follows the command).
Code/preprocess$ python make_videos.py VSR/data
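The script itself is not reproduced here; the following is a minimal sketch of the idea, assuming the MIRACL-VC1 layout (speaker/words/word/instance) and `color_*.jpg` frame names, both of which may differ in your copy:

```python
# Sketch: combine each utterance's color frames into one video with OpenCV.
import glob
import os
import sys

import cv2

def images_to_video(image_dir, out_path, fps=15):
    """Write the sorted color_*.jpg frames in image_dir as a single video."""
    frames = sorted(glob.glob(os.path.join(image_dir, "color_*.jpg")))
    if not frames:
        return
    h, w = cv2.imread(frames[0]).shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for f in frames:
        writer.write(cv2.imread(f))
    writer.release()

if __name__ == "__main__":
    data_root = sys.argv[1]  # e.g. VSR/data
    # One video per utterance folder, e.g. F01/words/01/01 -> F01_words_01_01.mp4
    for utterance in glob.glob(os.path.join(data_root, "*", "words", "*", "*")):
        name = "_".join(utterance.split(os.sep)[-4:]) + ".mp4"
        images_to_video(utterance, os.path.join(data_root, name))
```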
The dataset is now a set of videos, which must be converted into sets of lip-region images. The models/VisualizeLip.py script does this. It takes one mandatory argument, --input, the path to the video files (VSR/data). The output path is hard-coded so that it is compatible with the main script. Refer to the example below:
Code/models$ python VisualizeLip.py --input "VSR/data"
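A common way to implement this step, and a minimal sketch of what VisualizeLip.py likely does, is to locate the mouth with dlib's 68-point landmark model (points 48-67) and crop it from each frame; the landmark model file and video name below are placeholders:

```python
# Sketch: crop the lip region from every frame of a video using dlib landmarks.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Pretrained 68-point model, downloadable from the dlib model zoo.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_lip(frame, pad=10):
    """Return the padded mouth crop of the first detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(48, 68)]
    xs, ys = zip(*pts)
    y1, x1 = max(min(ys) - pad, 0), max(min(xs) - pad, 0)
    return frame[y1:max(ys) + pad, x1:max(xs) + pad]

cap = cv2.VideoCapture("F01_words_01_01.mp4")  # placeholder file name
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    lip = extract_lip(frame)
    if lip is not None:
        cv2.imwrite(f"lip_{idx:03d}.jpg", lip)
        idx += 1
cap.release()
```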
Make sure that the directory structure under VSR looks something like this (omitting Phrases from the data):
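The tree below is a sketch based on the MIRACL-VC1 naming (F01-F10 for the female speakers, M01-M05 for the male speakers); adjust it to your local copy:

```text
VSR
├── data
│   ├── F01
│   │   └── words
│   │       ├── 01
│   │       │   ├── 01   # color_*.jpg / depth_*.png frames for one utterance
│   │       │   └── ...  # instances 02-10
│   │       └── ...      # words 02-10
│   ├── ...              # remaining speakers F02-F10, M01-M05
│   └── *.mp4            # videos written by make_videos.py
└── Code
    ├── preprocess
    └── models
```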
Run the lip_reading.py script to predict words from the lip movements. See the example below:
Code$ python lip_reading.py
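Conceptually, the prediction step looks roughly like the sketch below, assuming a Keras model over fixed-size grayscale lip crops; the model file name, input directory, and input shape are placeholders, while the word list is the ten MIRACL-VC1 words:

```python
# Sketch: classify one utterance from its extracted lip crops with a Keras model.
import glob

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# The ten words in MIRACL-VC1.
WORDS = ["begin", "choose", "connection", "navigation", "next",
         "previous", "start", "stop", "hello", "web"]

model = load_model("lip_reading_model.h5")  # placeholder file name

# Stack one utterance's lip crops into a (1, frames, 64, 64, 1) tensor.
frames = [cv2.resize(cv2.imread(f, cv2.IMREAD_GRAYSCALE), (64, 64))
          for f in sorted(glob.glob("lips/F01_words_01_01/*.jpg"))]
x = np.stack(frames)[np.newaxis, ..., np.newaxis] / 255.0

probs = model.predict(x)[0]
print("Predicted word:", WORDS[int(np.argmax(probs))])
```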