This project implements a small-scale speech recognition system using a Residual Convolutional Neural Network (CNN) - BiGRU acoustic model, a Connectionist Temporal Classification (CTC) decoder, and a KenLM language model for improved accuracy.
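To illustrate what the CTC decoding stage does, here is a minimal sketch of greedy CTC decoding: take the most likely label per frame, collapse consecutive repeats, then drop the blank symbol. This is a simplification for illustration only. The project's own decoder is combined with a KenLM language model and may work differently, and the vocabulary and blank index below are hypothetical.

```python
from typing import List, Sequence

# Hypothetical vocabulary: index 0 is assumed to be the CTC blank symbol.
VOCAB = ["<blank>", "a", "c", "t"]


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse repeated frame labels, then remove blanks."""
    decoded = []
    previous = None
    for idx in frame_ids:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != previous and idx != blank:
            decoded.append(idx)
        previous = idx
    return decoded


def ids_to_text(ids: Sequence[int], vocab: Sequence[str] = VOCAB) -> str:
    """Map label indices back to characters."""
    return "".join(vocab[i] for i in ids)


# Per-frame argmax labels as they might come out of an acoustic model.
frames = [2, 2, 0, 1, 1, 0, 3]
text = ids_to_text(ctc_greedy_decode(frames))
```

A blank between two identical labels keeps them separate, which is how CTC can emit doubled letters.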
- Clone the repository:
git clone --recursive https://github.com/LuluW8071/Automatic-Speech-Recognition-with-PyTorch.git
- Install PyTorch and the required dependencies inside a virtual environment:
pip install -r requirements.txt
Ensure you have PyTorch and Lightning AI installed.
Important
Before training, make sure you have placed your Comet ML API key and project name in the environment variable file .env.
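As a quick sanity check before launching training, the following sketch parses a .env file and reports which required entries are missing. The key names `API_KEY` and `PROJECT_NAME` are hypothetical placeholders, since this README does not state the exact variable names the training script expects; adjust them to match your setup.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Hypothetical key names; replace with the ones train.py actually reads.
REQUIRED_KEYS = ("API_KEY", "PROJECT_NAME")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Read and parse a .env file from disk."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_keys(values: Dict[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(read_env_file())` from the repository root returns an empty list when both entries are present.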
py train.py
Customize the PyTorch training parameters by passing arguments to train.py to suit your needs. Refer to the table below to change hyperparameters and training configurations.
| Args | Description | Default Value |
|---|---|---|
| `-g, --gpus` | Number of GPUs per node | 1 |
| `-w, --num_workers` | Number of CPU workers | 8 |
| `-db, --dist_backend` | Distributed backend to use for training | ddp_find_unused_parameters_true |
| `--epochs` | Number of total epochs to run | 50 |
| `--batch_size` | Size of the batch | 32 |
| `-lr, --learning_rate` | Learning rate | 1e-5 (0.00001) |
| `--checkpoint_path` | Checkpoint path to resume training from | None |
| `--precision` | Precision of the training | 16-mixed |
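For reference, the options in the table could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the table, not a copy of train.py, so the actual script may differ. The short flag `-w` for `--num_workers` is taken from the example command in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults follow the table above."""
    parser = argparse.ArgumentParser(description="ASR training options (sketch)")
    parser.add_argument("-g", "--gpus", type=int, default=1,
                        help="Number of GPUs per node")
    parser.add_argument("-w", "--num_workers", type=int, default=8,
                        help="Number of CPU workers")
    parser.add_argument("-db", "--dist_backend", type=str,
                        default="ddp_find_unused_parameters_true",
                        help="Distributed backend to use for training")
    parser.add_argument("--epochs", type=int, default=50,
                        help="Number of total epochs to run")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Size of the batch")
    parser.add_argument("-lr", "--learning_rate", type=float, default=1e-5,
                        help="Learning rate")
    parser.add_argument("--checkpoint_path", type=str, default=None,
                        help="Checkpoint path to resume training from")
    parser.add_argument("--precision", type=str, default="16-mixed",
                        help="Precision of the training")
    return parser


# Parsing an empty argument list yields the defaults from the table.
defaults = build_parser().parse_args([])
```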
py train.py
-g 4 # Number of GPUs per node for multi-GPU training
-w 8 # Number of CPU workers for parallel data loading
--epochs 10 # Number of total epochs to run
--batch_size 64 # Size of the batch
-lr 2e-5 # Learning rate
--precision 16-mixed # Precision of the training
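The flags in this command end up in different parts of a Lightning training setup. The sketch below shows one plausible grouping, assuming the script forwards them to a Lightning `Trainer` (`devices`, `strategy`, `max_epochs`, `precision`), a PyTorch `DataLoader` (`batch_size`, `num_workers`), and an optimizer (`lr`). The function name `split_settings` is hypothetical, and the real train.py may wire things differently.

```python
from argparse import Namespace
from typing import Any, Dict, Tuple


def split_settings(args: Namespace) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Group parsed options by the component that would consume them."""
    # Keyword arguments accepted by lightning.Trainer.
    trainer_kwargs = {
        "devices": args.gpus,
        "strategy": args.dist_backend,
        "max_epochs": args.epochs,
        "precision": args.precision,
    }
    # Keyword arguments accepted by torch.utils.data.DataLoader.
    loader_kwargs = {
        "batch_size": args.batch_size,
        "num_workers": args.num_workers,
    }
    # Keyword arguments accepted by torch.optim optimizers.
    optimizer_kwargs = {"lr": args.learning_rate}
    return trainer_kwargs, loader_kwargs, optimizer_kwargs


# Values from the example command; dist_backend is left at its default.
example_args = Namespace(
    gpus=4, num_workers=8, dist_backend="ddp_find_unused_parameters_true",
    epochs=10, batch_size=64, learning_rate=2e-5, precision="16-mixed",
)
trainer_kwargs, loader_kwargs, optimizer_kwargs = split_settings(example_args)
```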
Note
To resume training from a saved checkpoint, use:
py train.py --checkpoint_path path_to_checkpoint.ckpt
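A mistyped checkpoint path is easier to catch before training starts. The helper below is a hypothetical sketch, not part of the repository: it returns `None` when no checkpoint is requested and raises early when the file does not exist. With Lightning, the returned value could then be passed as `ckpt_path` to `Trainer.fit`.

```python
from pathlib import Path
from typing import Optional


def resolve_checkpoint(path: Optional[str]) -> Optional[str]:
    """Validate an optional checkpoint path before training starts."""
    if path is None:
        # No checkpoint requested: training starts from scratch.
        return None
    checkpoint = Path(path).expanduser()
    if not checkpoint.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint}")
    return str(checkpoint)
```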
For pre-trained models and other resources, refer to the provided links. Click here to download the pre-trained model.
This guide should help you set up and use the speech recognition system. If you encounter any issues or have questions, feel free to reach out!