We have used Google’s high-fidelity hand and finger tracking API, MediaPipe Hands, as a feature extractor for hand skeletal features. The images used in this project are collected from HAGRID, a recently published dataset for hand gesture detection.
We have trained multiple machine learning models on the extracted features to recognize hand gestures.
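A minimal sketch of how landmark features can be extracted with MediaPipe Hands is shown below; the file path and the `extract_landmarks` helper name are hypothetical and only illustrate the general approach.

```python
# Sketch: extract the 21 hand landmarks from a single image with MediaPipe Hands.
# The path and helper name are hypothetical, not taken from this project's code.
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(image_path):
    """Return a flat (63,) array of (x, y, z) for 21 landmarks, or None if no hand is found."""
    image = cv2.imread(image_path)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # MediaPipe did not detect a hand in this image
    hand = results.multi_hand_landmarks[0]
    return np.array([[lm.x, lm.y, lm.z] for lm in hand.landmark]).flatten()

# Example usage (hypothetical path):
# features = extract_landmarks("subsample/call/0001.jpg")
```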
The first model consists of 3 fully connected layers of 256, 128, and 128 neurons, each followed by ReLU activation, and is trained with the Adam optimizer at a learning rate of 0.0005 over categorical cross-entropy loss.
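A hedged Keras sketch of this fully connected model follows. The input dimension of 63 (21 landmarks x (x, y, z)) and the 6-class softmax output layer are assumptions not stated above.

```python
# Sketch of the fully connected model (assumed 63-dim landmark input, 6 output classes).
import tensorflow as tf

mlp = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(63,)),              # 21 landmarks x (x, y, z) -- assumed
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(6, activation="softmax"),  # one class per gesture -- assumed
])
mlp.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```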
The second model has 2 convolutional layers followed by ReLU activations and 50% dropout. The output is then fed to 3 fully connected layers of 128, 64, and 6 neurons, where the first two FC layers are followed by ReLU activation and the last one by softmax activation.
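A hedged Keras sketch of this convolutional model follows. The use of 1-D convolutions over a (21, 3) landmark array, the filter counts and kernel sizes, the placement of a single dropout layer, and the training setup (assumed to mirror the fully connected model's) are all assumptions.

```python
# Sketch of the convolutional model; conv hyperparameters and input shape are assumed.
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(21, 3)),              # 21 landmarks x (x, y, z) -- assumed
    tf.keras.layers.Conv1D(32, 3, activation="relu"),  # filter counts/kernel sizes assumed
    tf.keras.layers.Conv1D(64, 3, activation="relu"),
    tf.keras.layers.Dropout(0.5),                      # 50% dropout
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(6, activation="softmax"),
])
cnn.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),  # assumed to match the FC model
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```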
Subsample has 100 items per gesture.
| Subsample | Archives | Size |
|---|---|---|
| images | subsample | 2.5 GB |
| annotations | ann_subsample | 1.2 MB |
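A short sketch of how the landmark features and gesture labels might be assembled from the image subsample follows; the per-gesture folder layout and the `extract_landmarks` helper from the earlier sketch are assumptions, not this project's actual code.

```python
# Sketch: build a feature matrix and label vector from the image subsample.
# Assumes one folder per gesture under `subsample/` and the extract_landmarks
# helper from the MediaPipe sketch above.
from pathlib import Path
import numpy as np

GESTURES = ["call", "peace", "dislike", "like", "fist", "rock"]

def build_dataset(root="subsample"):
    features, labels = [], []
    for label, gesture in enumerate(GESTURES):
        for image_path in sorted(Path(root, gesture).glob("*.jpg")):
            landmarks = extract_landmarks(str(image_path))
            if landmarks is not None:  # skip images where no hand was detected
                features.append(landmarks)
                labels.append(label)
    return np.stack(features), np.array(labels)
```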
Hand landmarks - gesture labeled
| Gesture | Size | Gesture | Size |
|---|---|---|---|
| call | 39.1 GB | peace | 38.6 GB |
| dislike | 38.7 GB | like | 38.3 GB |
| fist | 38.0 GB | rock | 38.9 GB |
The report can be found here: report.docx