This mini project fine-tunes OpenAI's Whisper-Tiny model on the German subset of the Common Voice 11 dataset. Before fine-tuning, the model achieved a 43.488% Word Error Rate (WER) on the test set. Due to computational limitations, I experimented with two variants of the model:
Model V1 (100k training samples):
- Trained for 5 hours (4,000 steps)
- Achieved a WER of 31% on the test set
Model V2 (200k training samples):
- Trained for 10 hours (8,000 steps)
- Achieved a WER of 32% on the test set
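For reference, the WER figures above measure the word-level edit distance between the model's transcription and the reference transcript, divided by the number of reference words. Below is a minimal self-contained sketch of that computation; in practice a library such as Hugging Face `evaluate` (with its `wer` metric) is typically used instead.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words -> WER of 0.25
print(wer("ich gehe nach hause", "ich gehe nach haus"))  # 0.25
```

A 31% WER therefore means roughly one word in three is inserted, deleted, or substituted relative to the reference transcript.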
Computer spec: AMD Ryzen 9 5900HS, 8 GB of available RAM, RTX 3060 Laptop GPU with 6 GB of VRAM
You can try the model in this Google Colab: Demo
You can find the pre-trained models on Hugging Face:
*This is a school project
Created and trained by Han, 2024
With help from my project group