
Running Visual Speech Recognition using 3D-CNN

Please cite the paper "Speaker-Independent Speech Recognition using Visual Features" if you use this work in your research.

Dataset: MIRACL-VC1

Link to the dataset: https://sites.google.com/site/achrafbenhamadou/-datasets/miracl-vc1.

About the Dataset

  • Fifteen speakers (ten women, five men)
  • Ten words and ten phrases, each uttered ten times
  • Both depth and color images

Setup environment

Set up a Python environment (Python 3.6 at the time of writing) with the packages listed in the requirements.txt file.

$ pip install -r requirements.txt
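An isolated environment keeps the pinned package versions from clashing with system packages; a minimal sketch (the environment name vsr-env is arbitrary):

$ python3.6 -m venv vsr-env
$ source vsr-env/bin/activate
$ pip install -r requirements.txt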

Pre-process Data

Create videos

Before lip regions can be extracted, each utterance's color images must be combined into a video that can be fed into VisualizeLip.py. The preprocess/make_videos.py script does this; it expects one command-line argument, the path to the data directory.

Code/preprocess$ python make_videos.py VSR/data
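The actual logic lives in make_videos.py; the sketch below only illustrates the core idea with OpenCV, assuming the color frames of one utterance sit in a single folder named color_001.jpg, color_002.jpg, and so on (the naming pattern, frame rate, and paths are assumptions, not the repository's configuration):

import glob
import cv2

def frames_to_video(frame_dir, out_path, fps=15):
    # Gather this utterance's color frames in order (naming pattern is an assumption)
    frames = sorted(glob.glob(frame_dir + "/color_*.jpg"))
    height, width = cv2.imread(frames[0]).shape[:2]
    # MJPG in an .avi container writes without extra codecs on most OpenCV builds
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"MJPG"),
                             fps, (width, height))
    for path in frames:
        writer.write(cv2.imread(path))
    writer.release()

# Hypothetical utterance folder: speaker F01, word 01, instance 01
frames_to_video("VSR/data/F01/words/01/01", "F01_words_01_01.avi")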

Convert video to images

The dataset is now a set of videos, so each video must be converted back into a set of images of the lip region. The models/VisualizeLip.py file does this. It takes one mandatory argument, --input, the path to the video data (VSR/data). The output path is hard-coded so that it is compatible with the main script. Refer to the example below:

Code/models$ python VisualizeLip.py --input "VSR/data"
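The exact cropping logic lives in VisualizeLip.py; the sketch below shows one common way to do it, using dlib's 68-point face landmarks, where points 48–67 outline the mouth (the landmark-model path, margin, and file names are assumptions):

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# 68-point landmark model from dlib.net; the local path is an assumption
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_lips(frame, margin=10):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None  # no face found in this frame
    landmarks = predictor(gray, faces[0])
    # Points 48-67 of the 68-point scheme outline the mouth
    xs = [landmarks.part(i).x for i in range(48, 68)]
    ys = [landmarks.part(i).y for i in range(48, 68)]
    return frame[min(ys) - margin:max(ys) + margin,
                 min(xs) - margin:max(xs) + margin]

cap = cv2.VideoCapture("F01_words_01_01.avi")  # hypothetical input video
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    lips = crop_lips(frame)
    if lips is not None:
        cv2.imwrite("lips_%03d.jpg" % count, lips)
        count += 1
cap.release()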

Make sure that the directory structure of VSR looks something like this (the Phrases part of the data is omitted):

Directory Structure
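For orientation, MIRACL-VC1 organizes recordings by speaker, word, and utterance instance; the layout below is illustrative (speaker folders follow the dataset's F01–F10/M01–M05 scheme, but the exact tree may differ):

VSR
└── data
    ├── F01
    │   └── words
    │       ├── 01
    │       │   ├── 01
    │       │   │   ├── color_001.jpg
    │       │   │   ├── depth_001.jpg
    │       │   │   └── ...
    │       │   └── ...
    │       └── ...
    └── ...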

Predict output

Run the lip_reading.py file to predict words from lip movements. See the example below:

Code$ python lip_reading.py
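The network itself is defined inside lip_reading.py; as an illustration of the 3D-CNN approach named in the title, here is a minimal Keras sketch (every layer size, the 15×32×32 input of stacked lip frames, and the 10-class output are assumptions, not the repository's actual architecture):

from tensorflow.keras import layers, models

# A stack of lip crops: 15 frames of 32x32 RGB (shape is an assumption)
model = models.Sequential([
    layers.Conv3D(32, (3, 3, 3), activation="relu",
                  input_shape=(15, 32, 32, 3)),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),  # pool space, keep time
    layers.Conv3D(64, (3, 3, 3), activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # ten word classes in MIRACL-VC1
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

The Conv3D kernels convolve over time as well as height and width, which is what lets the network model lip motion across frames rather than single-frame appearance.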