🤗 HF Repo • 🐱 GitHub Repo
Switch to the docker folder and build the GPU Docker image for training:
```bash
cd docker
docker compose build
```
Once the build completes, run the following commands to start a Docker container and attach to it:
```bash
docker compose up -d
docker exec -it asr bash
```
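Once inside the container, you can quickly confirm that the GPU is visible to PyTorch (a minimal check; it assumes PyTorch is installed in the image, which the training scripts require anyway):

```python
import torch

# Expect True and the name of the attached GPU (e.g. an RTX 3090).
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```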
For dataset preparation details, see the dataset_scripts folder.
```bash
# Finetuning
python finetune.py --model_id base --streaming True --train_batch_size 64 --gradient_accumulation_steps 2 --fp16 True

# LoRA Finetuning
python finetune_lora.py --model_id large-v2 --streaming True --train_batch_size 64 --gradient_accumulation_steps 2

# Evaluation
python eval.py --model_name_or_path Oblivion208/whisper-tiny-cantonese --streaming True --batch_size 64

# LoRA Evaluation
python eval_lora.py --peft_model_id Oblivion208/whisper-large-v2-lora-mix --streaming True --batch_size 64
```
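For reference, below is a minimal sketch of how a published LoRA adapter can be attached to its base model for inference using the standard PEFT and Transformers APIs; eval_lora.py may load things differently, and the audio file name is only illustrative:

```python
import librosa
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Base checkpoint the adapter was trained on, plus the LoRA weights from the Hub.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Oblivion208/whisper-large-v2-lora-mix")
model.eval()

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# Illustrative audio file; Whisper expects 16 kHz input.
audio, _ = librosa.load("sample.wav", sr=16000)
features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
features = features.to(base.device, dtype=torch.float16)

with torch.no_grad():
    generated_ids = model.generate(input_features=features)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```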
Note: Setting `--streaming` to `False` caches acoustic features on the local disk, which speeds up finetuning but increases disk usage dramatically (to almost three times the size of the raw audio files).
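To illustrate what the flag controls, this is how a dataset is typically opened in streaming mode with 🤗 Datasets (the dataset name below is only an example, not necessarily the corpus used by the scripts):

```python
from datasets import load_dataset

# streaming=True iterates examples on the fly instead of materialising
# a (much larger) local cache on disk. Common Voice requires accepting
# its terms on the Hub and may need a Hugging Face auth token.
ds = load_dataset(
    "mozilla-foundation/common_voice_11_0", "zh-HK", split="train", streaming=True
)
print(next(iter(ds))["sentence"])
```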
The following models were all trained and evaluated on a single RTX 3090 GPU rented via Vast.ai.
Model name | Parameters | Finetune Steps | Time Spent | Training Loss | Validation Loss | CER % | Finetuned Model |
---|---|---|---|---|---|---|---|
whisper-tiny-cantonese | 39 M | 3200 | 4h 34m | 0.0485 | 0.771 | 11.10 | Link |
whisper-base-cantonese | 74 M | 7200 | 13h 32m | 0.0186 | 0.477 | 7.66 | Link |
whisper-small-cantonese | 244 M | 3600 | 6h 38m | 0.0266 | 0.137 | 6.16 | Link |
whisper-small-lora-cantonese | 3.5 M | 8000 | 21h 27m | 0.0687 | 0.382 | 7.40 | Link |
whisper-large-v2-lora-cantonese | 15 M | 10000 | 33h 40m | 0.0046 | 0.277 | 3.77 | Link |
Model name | Original CER % | w/o Finetuning CER % | Jointly Finetuned CER % |
---|---|---|---|
whisper-tiny-cantonese | 124.03 | 66.85 | 35.87 |
whisper-base-cantonese | 78.24 | 61.42 | 16.73 |
whisper-small-cantonese | 52.83 | 31.23 | / |
whisper-small-lora-cantonese | 37.53 | 19.38 | 14.73 |
whisper-large-v2-lora-cantonese | 37.53 | 19.38 | 9.63 |
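The finetuned checkpoints listed above can also be used directly for transcription. Here is a minimal sketch with the Transformers ASR pipeline, using the whisper-tiny-cantonese checkpoint from the first table (the audio file name is illustrative):

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Oblivion208/whisper-tiny-cantonese",
    device=0,  # set to -1 to run on CPU
)
# chunk_length_s lets the pipeline handle audio longer than Whisper's 30 s window.
print(asr("sample.wav", chunk_length_s=30)["text"])
```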
The training and evaluation scripts rely on the following libraries:

- Transformers
- Accelerate
- Datasets
- PEFT
- bitsandbytes
- librosa