Communication is important for everyone, but people in the Deaf and Mute (D&M) community often struggle to communicate because many others do not know sign language. This project helps bridge that gap with a real-time Sign Language Interpreter that turns American Sign Language (ASL) gestures into text and speech, making it easier for D&M individuals to connect with others.
- Abstract
- Project Description
- Project Structure
- Installation
- Dataset Collection
- Dataset Creation
- Model Training
- Real-Time Interpretation
- Results
- Project Poster
- Future Work
- Contributing
- License
- Contact
- Credits
Sign language is a crucial communication tool for the Deaf and Mute (D&M) community. However, since most people do not understand sign language and interpreters are not always available, there is a need for a reliable method to translate sign language into text and speech. This project presents a real-time system that uses computer vision and machine learning techniques to interpret American Sign Language (ASL) alphabets and numbers. By leveraging MediaPipe for hand landmark detection and a Random Forest classifier for gesture recognition, the system achieves high accuracy and provides real-time feedback, including audio output corresponding to the recognised gesture.
American Sign Language (ASL) is widely used within the Deaf and Mute community as a means of communication. Given the challenges faced by these individuals in communicating with those who do not understand sign language, this project aims to bridge the communication gap by translating ASL gestures into text and speech in real-time.
- Sign Language Detection: Uses a webcam to capture hand gestures and identifies ASL letters and numbers.
- Hand Landmark Detection: Utilises MediaPipe to detect hand landmarks.
- Classification: A Random Forest classifier trained on self-collected data.
- Real-Time Inference: Predicts the gesture and provides text and audio feedback.
An additional feature of this project is the ability to play an audio file corresponding to the recognised gesture. For example, when the model predicts the letter "A," the system will play an audio file that says "A." This feature enhances the accessibility of the system by providing an audible output, making it useful in educational environments and communication tools.
The audio files are stored in the `audios/` directory, with each file named after the corresponding letter or number (e.g., `A.wav`, `One.wav`).
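As an illustration, the snippet below is a minimal sketch of how a predicted label can be mapped to its audio clip; it is not the project's exact code. It assumes the `audios/<label>.wav` naming scheme above and uses the third-party `playsound` package, and the `AUDIO_DIR` constant and `play_label_audio` helper are illustrative names.

```python
# Hedged sketch: play the audio clip that matches a predicted label.
# Assumes the audios/<label>.wav naming scheme described above.
from pathlib import Path

from playsound import playsound  # third-party package: pip install playsound

AUDIO_DIR = Path("audios")  # illustrative constant; the real path may live in src/config.py


def play_label_audio(label: str) -> None:
    """Play the .wav file for a recognised gesture, if one exists."""
    audio_path = AUDIO_DIR / f"{label}.wav"
    if audio_path.exists():
        playsound(str(audio_path))
    else:
        print(f"No audio file found for label '{label}'")


# Example: after the classifier predicts "A"
play_label_audio("A")
```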
- To create an accessible tool for real-time sign language recognition.
- To allow anyone, including those unfamiliar with ASL, to understand and communicate with D&M individuals.
- To provide a flexible, modular system that can be expanded with additional gestures and languages.
The system is trained to recognise the following ASL gestures:
Alphabets:
- A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y
Numbers:
- One, Two, Three, Four, Five, Six, Seven, Eight, Nine
- Real-time Gesture Recognition: Detects and interprets ASL gestures using a webcam.
- Easy Dataset Collection: Includes scripts for capturing and labeling gesture images.
- Customisable Model: Users can extend the model to recognise additional gestures.
- Performance Visualisation: Displays metrics such as confusion matrices, ROC curves, and Precision-Recall curves.
sign_language_interpreter/
├── audios/ # Directory containing audio files for each gesture
├── dataset/ # Directory for captured gesture data
├── artifacts/ # Directory for saved models and data artifacts
├── src/ # Source code for the project
│ ├── config.py # Configuration file with paths and constants
│ ├── data_collection.py # Script for capturing gesture images
│ ├── data_creation.py # Script for creating a dataset from images
│ ├── model_training.py # Script for training the model
│ ├── app.py # Script for running real-time inference
│ ├── utils.py # Utility functions
├── labels.txt # File containing gesture labels
├── requirements.txt # Python dependencies
├── .gitignore # Files and directories to ignore in git
└── README.md # Project documentation
The installation process involves setting up a Python environment and installing the required dependencies. The instructions below provide steps for macOS, Windows and Linux systems.
Ensure you have the following installed:
- Python 3.10+
- pip (Python package installer)
- git
- Clone the repository:
git clone https://github.com/ACM40960/project-bhupendrachaudhary08.git
cd project-bhupendrachaudhary08
- Create a virtual environment:
python -m venv venv
- On macOS/Linux: `source venv/bin/activate`
- On Windows: `venv\Scripts\activate`
- Install the dependencies:
pip install -r requirements.txt
- macOS/Linux: Ensure that you have the necessary permissions and use the `source` command to activate the virtual environment. For some Linux distributions, you may need to install additional libraries (e.g., `sudo apt-get install python3-venv`).
- Windows: Make sure to use the correct path to activate the virtual environment. You may need to enable script execution by running `Set-ExecutionPolicy RemoteSigned -Scope Process` in PowerShell.
The `labels.txt` file contains the ASL letters and numbers that the model will recognise. If you need to add or remove gestures, you can edit this file. You can also comment out any line by placing a `#` in front of it, and that line will be ignored during data collection and processing.
Current Labels:
# Alphabets
A
B
C
D
E
F
G
H
I
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
# Numbers
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
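The `#` comment convention above can be honoured with a small loader along these lines. This is a hedged sketch; the `load_labels` helper is an assumed name rather than the project's actual utility in `utils.py`.

```python
# Hedged sketch: read labels.txt, skipping blank lines and '#' comments.
def load_labels(path: str = "labels.txt") -> list[str]:
    labels = []
    with open(path, encoding="utf-8") as f:
        for raw_line in f:
            line = raw_line.strip()
            if not line or line.startswith("#"):
                continue  # ignore comments and empty lines
            labels.append(line)
    return labels


print(load_labels())  # e.g. ['A', 'B', ..., 'One', ..., 'Nine']
```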
This project involves building a custom dataset using images captured from a webcam. The dataset includes images for both the right and left hands to improve recognition accuracy. Run the following script to capture gesture images:
python src/data_collection.py
- The script will guide you through capturing images for each label.
- Press `SPACE` to start capturing images for a label.
- Switch Hands: After capturing half the images for one hand, the script will prompt you to switch to the other hand.
- Press `ESC` to skip to the next label.
- Press `q` to quit the script.
The captured images will be stored in the `dataset/` directory, with subfolders for each label.
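A minimal sketch of how such a capture loop can work with OpenCV is shown below. It follows the key bindings described above but is an illustration under assumptions, not the project's `data_collection.py`; the `DATASET_DIR` and `IMAGES_PER_LABEL` constants are placeholders.

```python
# Hedged sketch: capture webcam frames for one label into dataset/<label>/.
import os

import cv2

DATASET_DIR = "dataset"       # placeholder; the project may read this from src/config.py
IMAGES_PER_LABEL = 100        # placeholder count


def collect_images(label: str) -> bool:
    """Capture frames for one label. Returns False if the user pressed q to quit."""
    out_dir = os.path.join(DATASET_DIR, label)
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(0)
    capturing, count, quit_requested = False, 0, False
    while count < IMAGES_PER_LABEL:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.putText(frame, f"{label}: {count}/{IMAGES_PER_LABEL}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Data Collection", frame)
        key = cv2.waitKey(25) & 0xFF
        if key == ord(" "):       # SPACE starts capturing for this label
            capturing = True
        elif key == 27:           # ESC skips to the next label
            break
        elif key == ord("q"):     # q quits the script
            quit_requested = True
            break
        if capturing:
            cv2.imwrite(os.path.join(out_dir, f"{count}.jpg"), frame)
            count += 1
    cap.release()
    cv2.destroyAllWindows()
    return not quit_requested


for lbl in ["A", "B", "One"]:     # in practice, iterate over the labels from labels.txt
    if not collect_images(lbl):
        break
```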
After collecting the images, run the dataset creation script to extract hand landmarks:
python src/data_creation.py
This script processes the images using MediaPipe, extracts hand landmarks, and saves the processed data as a pickle file in the `artifacts/` directory.
Check the `artifacts/` directory for the `data.pickle` file, which contains the processed dataset.
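The sketch below shows one way this step can be implemented with MediaPipe Hands. It is not the project's exact `data_creation.py`: the feature layout (21 landmarks as (x, y) offsets relative to the hand's bounding box) and the `data`/`labels` keys in the pickle are assumptions.

```python
# Hedged sketch: extract 21 MediaPipe hand landmarks per image and pickle the dataset.
import os
import pickle

import cv2
import mediapipe as mp

DATASET_DIR = "dataset"
OUTPUT_PATH = os.path.join("artifacts", "data.pickle")

hands = mp.solutions.hands.Hands(static_image_mode=True, min_detection_confidence=0.3)

data, labels = [], []
for label in os.listdir(DATASET_DIR):
    label_dir = os.path.join(DATASET_DIR, label)
    if not os.path.isdir(label_dir):
        continue
    for img_name in os.listdir(label_dir):
        img = cv2.imread(os.path.join(label_dir, img_name))
        if img is None:
            continue
        results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            continue  # skip images where no hand was detected
        lm = results.multi_hand_landmarks[0].landmark
        xs, ys = [p.x for p in lm], [p.y for p in lm]
        features = []
        for p in lm:
            # offset by the hand's minimum x/y so features are roughly translation-invariant
            features.extend([p.x - min(xs), p.y - min(ys)])
        data.append(features)
        labels.append(label)

os.makedirs("artifacts", exist_ok=True)
with open(OUTPUT_PATH, "wb") as f:
    pickle.dump({"data": data, "labels": labels}, f)
```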
To train the Random Forest model on the processed dataset, run the following script:
python src/model_training.py
The script performs the following steps:
- Splits the Data: Separates the dataset into training and testing subsets.
- Model Training: Trains a Random Forest classifier.
- Model Evaluation: Evaluates the model using metrics such as accuracy, confusion matrices, ROC curves, and Precision-Recall curves.
- Model Saving: Saves the trained model to the `artifacts/` directory.
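As a hedged illustration of these steps (not the project's actual `model_training.py`), the sketch below loads the pickled dataset, fits a Random Forest, reports test accuracy, and saves the model. The `data`/`labels` keys and the `model.pickle` filename are assumptions carried over from the dataset sketch above.

```python
# Hedged sketch: load the pickled dataset, train a Random Forest, evaluate, and save.
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

with open("artifacts/data.pickle", "rb") as f:
    dataset = pickle.load(f)

X = np.asarray(dataset["data"])
y = np.asarray(dataset["labels"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.4f}")

with open("artifacts/model.pickle", "wb") as f:
    pickle.dump(model, f)
```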
During training, plots such as the confusion matrix, ROC curves, and Precision-Recall curves are generated to assess the model's performance.
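One way to produce these plots with scikit-learn and matplotlib is sketched below. The `plot_evaluation` helper is an illustrative name, and the final call assumes the `model`, `X_test`, `y_test`, and `y_pred` variables from the training sketch above.

```python
# Hedged sketch of the evaluation plots: confusion matrix plus one-vs-rest ROC and
# Precision-Recall curves for every gesture class.
import matplotlib.pyplot as plt
from sklearn.metrics import (ConfusionMatrixDisplay, auc, precision_recall_curve,
                             roc_curve)
from sklearn.preprocessing import label_binarize


def plot_evaluation(model, X_test, y_test, y_pred) -> None:
    classes = model.classes_
    y_true_bin = label_binarize(y_test, classes=classes)  # one-hot encoded true labels
    y_score = model.predict_proba(X_test)                 # per-class probabilities

    # Confusion matrix
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred, xticks_rotation="vertical")
    plt.title("Confusion Matrix")

    # One-vs-rest ROC and Precision-Recall curves
    fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(12, 5))
    for i, cls in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_true_bin[:, i], y_score[:, i])
        ax_roc.plot(fpr, tpr, label=f"{cls} (AUC={auc(fpr, tpr):.2f})")
        prec, rec, _ = precision_recall_curve(y_true_bin[:, i], y_score[:, i])
        ax_pr.plot(rec, prec)
    ax_roc.set(title="ROC Curves", xlabel="False Positive Rate", ylabel="True Positive Rate")
    ax_pr.set(title="Precision-Recall Curves", xlabel="Recall", ylabel="Precision")
    plt.show()


plot_evaluation(model, X_test, y_test, y_pred)
```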
Once the model is trained, run the following script to start real-time gesture recognition:
python src/app.py
- The script uses your webcam to detect hand gestures in real-time.
- Confirm Letters: Press the `spacebar` to confirm a detected letter and add it to the sentence.
- Create Sentences: The system allows you to construct sentences by confirming individual letters.
- Delete the Last Confirmed Letter: If you make a mistake, you can delete the last confirmed letter by pressing the `B` key.
- Add Space: Press the `S` key to add a space between words.
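Putting the pieces together, a real-time loop along the following lines is one way this can work. It is a hedged sketch rather than the project's `app.py`: it assumes the feature layout and the `artifacts/model.pickle` filename from the earlier sketches, and it omits the audio playback shown earlier for brevity.

```python
# Hedged sketch: webcam loop that predicts a gesture per frame and builds a sentence.
import pickle

import cv2
import mediapipe as mp

with open("artifacts/model.pickle", "rb") as f:
    model = pickle.load(f)

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)
sentence = ""

while True:
    ok, frame = cap.read()
    if not ok:
        break
    prediction = None
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        xs, ys = [p.x for p in lm], [p.y for p in lm]
        features = []
        for p in lm:
            features.extend([p.x - min(xs), p.y - min(ys)])
        prediction = model.predict([features])[0]
        cv2.putText(frame, str(prediction), (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.putText(frame, sentence, (10, frame.shape[0] - 20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
    cv2.imshow("Sign Language Interpreter", frame)

    key = cv2.waitKey(1) & 0xFF
    if key == ord(" ") and prediction is not None:
        sentence += str(prediction)   # spacebar confirms the detected letter
    elif key == ord("b"):
        sentence = sentence[:-1]      # B deletes the last confirmed letter
    elif key == ord("s"):
        sentence += " "               # S adds a space between words
    elif key == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```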
The trained model successfully recognises the following ASL gestures:
- Alphabets: A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y
- Numbers: One, Two, Three, Four, Five, Six, Seven, Eight, Nine
- Accuracy: 100% on the test set.
- AUC: 1.00 for all gestures.
- Precision-Recall: 1.00 for all gestures.
For a detailed visual overview of the project, you can view the project poster, which summarises the methodology, results, and future scope.
Future improvements to this project include:
- Expanding the Gesture Set: Adding support for more complex gestures, two-handed gestures, and dynamic gestures involving motion.
- Improving Generalisation: Collecting a larger, more diverse dataset to improve model robustness in different lighting conditions and environments.
- Integrating with Other Applications: Developing a mobile or web application to make the system more accessible in real-world scenarios.
Contributions are welcome! If you'd like to improve this project, please fork the repository and submit a pull request. Your contributions could include adding new features, improving documentation, or fixing bugs.
- Fork the repository.
- Create a new branch.
- Make your changes.
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.
For any questions or suggestions, please open an issue or contact me at sahil.chalkhure@ucdconnect.ie.
This project was developed in collaboration with Bhupendra Singh Chaudhary.