Voice Door-Lock
Voice Fingerprint Door-Lock is a digital signal processing (DSP) web app for speaker identification and sentence verification. It applies machine learning to audio features extracted from voice biometrics.
- Voice Fingerprint Principles
- Project full Demo
- Dynamic E-Poster Graphs
- Project Structure
- Run The Project
- Team Members
Voice Fingerprint Principles
Voice fingerprinting is a DSP application that relies on audio feature extraction and machine-learning model training.
The audio features are extracted from the audio signal using the Fourier transform, Mel-Frequency Cepstral Coefficients (MFCC), and their deltas (Delta and Delta-Delta).
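For reference, here are two standard definitions behind these features. The first is one common form of the mel scale. The second is the regression formula commonly used for delta features, as implemented for example by python_speech_features' `delta`. The window half-width N (often 2) is an assumption, because the project's exact settings are not stated here.

```latex
m(f) = 2595 \, \log_{10}\!\left(1 + \frac{f}{700}\right)

d_t = \frac{\sum_{n=1}^{N} n \, (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{N} n^2}
```

Here f is a frequency in Hz, c_t is the cepstral (MFCC) vector of frame t, and d_t is its delta. Delta-Delta features apply the same formula to d_t.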
What is MFCC?
- A set of features widely used in speech recognition and audio information retrieval.
- Represent the short-term power spectrum of a sound, pooled over adjacent frequency bands spaced on the perceptual mel scale.
- Capture the spectral envelope (the overall spectral shape) of a sound in a small number of coefficients.
- Calculation Steps:
  - Split the signal into short overlapping frames and compute the power spectrum of each frame with the Fourier transform (FFT).
  - Apply the mel filterbank to the power spectra and sum the energy in each band.
  - Take the log of all filterbank energies, then take the Discrete Cosine Transform (DCT).
  - Keep DCT coefficients 2-13 and discard the rest.
  - Apply liftering to the kept coefficients. Delta and Delta-Delta features are usually appended as well.
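The steps above can be sketched end to end with NumPy and SciPy. This is a from-scratch illustration and not the project's code. The project lists python_speech_features and librosa, which provide ready-made MFCC functions. The frame length (25 ms), step (10 ms), FFT size (512), filter count (26), Hamming window, and lifter value (22) are assumed common defaults, and pre-emphasis is left out.

```python
import numpy as np
from scipy.fft import dct


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def frame_signal(signal, frame_len, frame_step):
    """Split the signal into overlapping frames, zero-padding the tail."""
    n_frames = 1 + max(0, int(np.ceil((len(signal) - frame_len) / frame_step)))
    padded = np.append(signal, np.zeros((n_frames - 1) * frame_step + frame_len - len(signal)))
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    return padded[idx]


def mfcc(signal, sample_rate=16000, frame_ms=25, step_ms=10, n_fft=512,
         n_filters=26, n_ceps=12, lifter=22):
    """Return an (n_frames, n_ceps) matrix of MFCCs (coefficients 2-13 by default)."""
    frame_len = int(round(sample_rate * frame_ms / 1000))
    frame_step = int(round(sample_rate * step_ms / 1000))
    frames = frame_signal(np.asarray(signal, dtype=float), frame_len, frame_step)
    frames = frames * np.hamming(frame_len)
    # 1. Power spectrum of each frame (FFT, zero-padded to n_fft points)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 2. Energy in each mel band
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    energies = np.maximum(energies, np.finfo(float).eps)  # avoid log(0)
    # 3. Log energies, then DCT-II along the filter axis
    ceps = dct(np.log(energies), type=2, axis=1, norm="ortho")
    # 4. Keep coefficients 2-13 (indices 1..12), discard the rest
    ceps = ceps[:, 1:1 + n_ceps]
    # 5. Sinusoidal liftering, indexed by each kept coefficient's position in the cepstrum
    n = np.arange(1, n_ceps + 1)
    return ceps * (1 + (lifter / 2) * np.sin(np.pi * n / lifter))


def delta(feat, N=2):
    """Regression deltas over +/-N frames; edge frames are repeated as padding."""
    T = len(feat)
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    num = sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T]) for n in range(1, N + 1))
    return num / (2 * sum(n * n for n in range(1, N + 1)))


if __name__ == "__main__":
    sr = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s synthetic stand-in for a recording
    c = mfcc(tone, sr)
    features = np.hstack([c, delta(c), delta(delta(c))])
    print(features.shape)  # (99, 36)
```

Stacking the MFCCs with their Delta and Delta-Delta features produces one 36-dimensional vector per 10 ms frame. This is the kind of input a GMM is trained on.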
You can read more about MFCC here
Gaussian Mixture Model (GMM)
- GMM is an unsupervised clustering model.
- GMM is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
- In voice identification, GMM identifies the speaker by modeling the spectral characteristics of the voice.
- GMM models these spectral characteristics with a set of Gaussian distributions (components).
- Each component is characterized by its mean, its variance (a covariance for multi-dimensional features such as MFCC vectors), and a mixture weight.
- GMM uses the Expectation-Maximization (EM) algorithm to estimate the parameters of the Gaussian distributions.
- EM alternates between two steps. The E-step computes how likely each component is to have generated each data point. The M-step re-estimates the parameters from those probabilities. The likelihood of the observed data never decreases from one iteration to the next, and the process repeats until it converges.
- The trained GMM is then used to classify the speaker. The features of the input voice are scored (as a likelihood) against the estimated parameters, typically with one model per speaker, and the speaker whose model gives the best score is selected.
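Below is a minimal NumPy sketch of fitting a GMM with EM and using it for speaker identification. It assumes diagonal covariances (a common choice for MFCC features) and one model per speaker. The class name `DiagGMM`, its parameters, and the synthetic "speaker" data are hypothetical. In practice, scikit-learn's `GaussianMixture` (listed among the project's libraries) provides the same model via `covariance_type="diag"`, and its `score` method likewise returns the average per-sample log-likelihood.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """log N(x_i | mu_k, diag(var_k)) for every frame i and component k, shape (n, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)


class DiagGMM:
    """Gaussian mixture with diagonal covariances, fitted by Expectation-Maximization."""

    def __init__(self, n_components=4, n_iter=200, tol=1e-5, reg=1e-6, seed=0):
        self.K, self.n_iter, self.tol, self.reg, self.seed = n_components, n_iter, tol, reg, seed

    def _log_joint(self, X):
        # log(weight_k) + log N(x_i | component k)
        return np.log(self.weights)[None, :] + log_gauss_diag(X, self.means, self.variances)

    def fit(self, X):
        n, _ = X.shape
        rng = np.random.default_rng(self.seed)
        # Initialise: equal weights, K random frames as means, global variance
        self.weights = np.full(self.K, 1.0 / self.K)
        self.means = X[rng.choice(n, self.K, replace=False)]
        self.variances = np.tile(X.var(axis=0) + self.reg, (self.K, 1))
        prev = -np.inf
        for _ in range(self.n_iter):
            # E-step: responsibilities resp[i, k] = P(component k | x_i)
            log_joint = self._log_joint(X)
            log_lik = logsumexp(log_joint, axis=1)
            resp = np.exp(log_joint - log_lik[:, None])
            # M-step: re-estimate weights, means and variances from the responsibilities
            nk = np.maximum(resp.sum(axis=0), 1e-10)
            self.weights = nk / nk.sum()
            self.means = (resp.T @ X) / nk[:, None]
            self.variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - self.means ** 2, 0.0) + self.reg
            # Stop once the average log-likelihood no longer improves noticeably
            if log_lik.mean() - prev < self.tol:
                break
            prev = log_lik.mean()
        return self

    def score(self, X):
        """Average per-frame log-likelihood of X under the fitted model."""
        return float(logsumexp(self._log_joint(X), axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical stand-ins for each speaker's MFCC frames
    train = {"alice": rng.normal(0.0, 1.0, (500, 12)), "bob": rng.normal(3.0, 1.0, (500, 12))}
    models = {name: DiagGMM(n_components=2).fit(X) for name, X in train.items()}
    probe = rng.normal(3.0, 1.0, (200, 12))  # unknown voice resembling "bob"
    scores = {name: m.score(probe) for name, m in models.items()}
    print(max(scores, key=scores.get))  # bob
```

Scoring with the average log-likelihood per frame keeps scores comparable between recordings of different lengths.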
You can read more about GMM here
Project full Demo
[Demo video]
Dynamic E-Poster Graphs
- Spectrogram: shows the Mel-Frequency Cepstral Coefficients of the user's audio.
- Distribution graph: shows the normal distribution of an MFCC feature for each team member and for the input voice. This indicates which team member's fingerprint is closest to the input audio, following the principles of the GMM.
- Bar chart: shows the score of each team member's GMM for the input audio. This indicates which member the input is closest to and compares the scores with the dissimilarity threshold.
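The following is a sketch of the decision the bar chart illustrates. It assumes that each bar is the average log-likelihood of the input audio under one team member's GMM (higher means more similar). It also assumes that a voice is accepted only when the best score reaches a fixed threshold. The function name, user names, scores, and threshold values are hypothetical, and the project's actual dissimilarity threshold is not given here.

```python
from typing import Dict, Optional, Tuple


def best_match(scores: Dict[str, float], threshold: float) -> Tuple[Optional[str], float]:
    """Pick the team member whose GMM scored highest; reject if the score is below threshold.

    `scores` maps user name -> average log-likelihood of the input under that user's model.
    `threshold` is a hypothetical tuning value.
    """
    user = max(scores, key=scores.get)
    best = scores[user]
    return (user if best >= threshold else None), best


if __name__ == "__main__":
    bar_chart = {"user_a": -48.2, "user_b": -39.5, "user_c": -51.0}  # made-up scores
    print(best_match(bar_chart, threshold=-42.0))  # ('user_b', -39.5)
    print(best_match(bar_chart, threshold=-35.0))  # (None, -39.5)
```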
- The frontend takes the user's audio and sends it to the backend.
- The backend extracts the audio features and passes them to the machine-learning model.
- In the team verification step, the model compares the input audio features with the team members' audio features.
- If the voice fingerprint is verified (it belongs to a registered team member), the model then compares the input audio features with that user's audio features in the sentence verification step.
- The door opens only if both the voice fingerprint (a team member) and the sentence ("Open The Door") are verified.
- The model returns the result to the backend, which returns it to the frontend.
- The frontend displays the result to the user and opens the door if the result is verified.
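The two-stage decision described above can be sketched in plain Python. The function `verify`, the `Decision` record, the per-user scoring callables, and both thresholds are hypothetical. In the project, the scores would come from the trained models, and the exact method of the sentence check is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Scorer = Callable[[Sequence[float]], float]  # features -> similarity score (higher is closer)


@dataclass
class Decision:
    speaker: Optional[str]
    sentence_ok: bool
    door_open: bool


def verify(features, speaker_models: Dict[str, Scorer], sentence_models: Dict[str, Scorer],
           speaker_threshold: float, sentence_threshold: float) -> Decision:
    # Step 1: team verification, find the closest team member's voice model
    scores = {user: model(features) for user, model in speaker_models.items()}
    speaker = max(scores, key=scores.get)
    if scores[speaker] < speaker_threshold:
        return Decision(speaker=None, sentence_ok=False, door_open=False)  # not a team member
    # Step 2: sentence verification against that user's "Open The Door" model
    sentence_ok = sentence_models[speaker](features) >= sentence_threshold
    # The door opens only when both checks pass
    return Decision(speaker=speaker, sentence_ok=sentence_ok, door_open=sentence_ok)


if __name__ == "__main__":
    feats = [0.0]  # placeholder features; real input would be MFCC frames
    speakers = {"user_a": lambda f: -50.0, "user_b": lambda f: -38.0}
    sentences = {"user_a": lambda f: -60.0, "user_b": lambda f: -41.0}
    print(verify(feats, speakers, sentences, speaker_threshold=-42.0, sentence_threshold=-45.0))
    # Decision(speaker='user_b', sentence_ok=True, door_open=True)
```

Returning early when the voice is rejected means the sentence check runs only for registered team members, which matches the order of the steps listed above.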
Project Structure
- Frontend:
  - HTML
  - CSS
  - JavaScript
- Backend:
  - Flask (Python)
- Machine Learning Model Training:
  - GMM Model (Python)
- Used Libraries:
  - python_speech_features
  - librosa
  - scikit-learn (sklearn)
  - NumPy
  - SciPy
Run The Project
- Clone the project.
- Open a terminal and run:
cd src
pip install -r requirements.txt
flask run --reload
- Open the server link http://127.0.0.1:5000/ in a browser.