To use this conda environment, you need to install Miniconda, then run this command:

```
conda env create -f environment.yml
```
If you would like to use OpenVINO for inference, please check the official OpenVINO documentation for installation instructions.
Simply run the code in the `Part 1 - Training.ipynb` notebook and you are good to go.
Here is the training process if you want to learn more:
- **Dataset Preparation**

  Structure your data in this format:

  ```
  data
  ├── train
  │   ├── speaker_1
  │   │   ├── audio_1.wav
  │   │   └── audio_2.wav
  │   └── speaker_2
  │       ├── audio_1.wav
  │       └── audio_2.wav
  ├── valid
  │   ├── speaker_1
  │   │   ├── audio_1.wav
  │   │   └── audio_2.wav
  │   └── speaker_2
  │       ├── audio_1.wav
  │       └── audio_2.wav
  └── test
      └── both_speaker_speaking.wav
  ```

  You can use this sample dataset on Kaggle to help you understand how to structure your own data.
- **Dataset & Dataloader**

  I have created an `AudioFolder` PyTorch Dataset object to automatically utilize the dataset. Here is a quick look:

  ```python
  AudioFolder(folder_path, n_data=500, slice_dur=5, sr=16000, n_mel=128, n_fft=2048, win_length=2048, hop_length=1024, freq_range=(0, 8000))
  ```

  In a nutshell, here is what it does:

  - Randomly choose an audio file from `folder_path`, then sample it at the `sr` rate
  - Randomly slice a chunk of the audio `slice_dur` seconds long, quantile normalize it, then convert it to a log mel spectrogram using the specified parameters (`n_mel`, `n_fft`, and so on)
  - Use the zero-crossing rate as a VAD to label the spectrogram frames; each frame is labeled with its subfolder name, or `<NONE>` if it is silent

  `AudioFolder` repeats these steps `n_data` times to create the dataset; a rough sketch of this preprocessing follows the list.
- **Architecture & Config**

  Simply create the model architecture you would like to use. I have made one sample for you in `src/model.py`. You can use it directly by simply changing the hyperparameters in the config. Please follow the notebook if you are confused.

  ❗ Note: If you would like to use a custom architecture with OpenVINO, make sure it is compatible with the OpenVINO Model Optimizer and Inference Engine.
- **Training Preparation**

  Set up the `model`, `criterion`, `optimizer`, and `callback` here; a minimal setup sketch follows this list.

- **Training**

  As the name says, running the code will train the model.
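As referenced in the Dataset & Dataloader step, here is a minimal sketch of the per-sample preprocessing, assuming `librosa` for feature extraction. The exact quantile-normalization scheme and VAD threshold in `AudioFolder` may differ, and `make_sample` is a hypothetical helper, not part of the repo:

```python
import numpy as np
import librosa

def make_sample(wav_path, speaker, sr=16000, slice_dur=5, n_mel=128,
                n_fft=2048, win_length=2048, hop_length=1024,
                freq_range=(0, 8000)):
    # Load and resample the audio at the `sr` rate
    y, _ = librosa.load(wav_path, sr=sr)

    # Randomly slice a `slice_dur`-second chunk
    chunk = slice_dur * sr
    start = np.random.randint(0, max(1, len(y) - chunk + 1))
    y = y[start:start + chunk]

    # Quantile normalization (assumed: scale by a high quantile of |y|)
    y = y / (np.quantile(np.abs(y), 0.999) + 1e-8)

    # Log mel spectrogram with the configured parameters
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, win_length=win_length,
        hop_length=hop_length, n_mels=n_mel,
        fmin=freq_range[0], fmax=freq_range[1])
    log_mel = librosa.power_to_db(mel, ref=np.max)

    # Zero-crossing rate as a crude VAD: frames with (near-)zero ZCR are
    # treated as silence and labeled <NONE> (the threshold is an assumption)
    zcr = librosa.feature.zero_crossing_rate(
        y, frame_length=n_fft, hop_length=hop_length)[0]
    labels = np.where(zcr > 1e-3, speaker, "<NONE>")
    return log_mel, labels
```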
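And here is a minimal sketch of the training preparation and loop, assuming batches come out shaped `(batch, n_mel, time)` for spectrograms and `(batch, time)` for frame-label class ids. The import path, toy architecture, and shapes are assumptions; the repo's own sample model lives in `src/model.py`, and the callback setup is shown in the notebook:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from src.dataset import AudioFolder  # assumed import path

# Build the dataset from the folder layout shown earlier
train_set = AudioFolder("data/train", n_data=500, slice_dur=5, sr=16000)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

class FrameClassifier(nn.Module):
    """Toy per-frame classifier: 2 speakers + <NONE> = 3 classes."""
    def __init__(self, n_mel=128, n_classes=3):
        super().__init__()
        self.rnn = nn.GRU(n_mel, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                     # x: (batch, n_mel, time)
        out, _ = self.rnn(x.transpose(1, 2))  # (batch, time, 128)
        return self.fc(out)                   # (batch, time, n_classes)

model = FrameClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One minimal training pass: flatten frames so each one is classified
for spec, label in train_loader:
    optimizer.zero_grad()
    logits = model(spec)
    loss = criterion(logits.reshape(-1, logits.size(-1)), label.reshape(-1))
    loss.backward()
    optimizer.step()
```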
I have prepared `PyTorchPredictor` and `OpenVINOPredictor` objects. Here is a sneak peek:

```python
PyTorchPredictor(weights_path, config_path, chunk_dur)
OpenVINOPredictor(model_xml, model_bin, config_path, chunk_dur, CPU_EXT)
```
To use them, simply pass in the arguments and call `.predict(wav_path)`; it will return the diarization timestamps. For example:
```python
p = PyTorchPredictor("model/weights_best.pth", "model/configs.pth", chunk_dur=2)
timestamp = p.predict("dataset/test/test.wav", plot=True)
```
Setting `plot=True` provides you with a visualization of the diarization.
If you would like to use OpenVINO, use `.to_onnx(fname, outdir)` to convert the model into ONNX format:

```python
p.to_onnx("speaker_diarization.onnx", "model")
```
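Once the ONNX file has been further converted to OpenVINO IR (an `.xml`/`.bin` pair) with the Model Optimizer mentioned above, you can load it back with `OpenVINOPredictor`. The file names below are illustrative, not fixed outputs of the conversion:

```python
p = OpenVINOPredictor("model/speaker_diarization.xml",
                      "model/speaker_diarization.bin",
                      "model/configs.pth", chunk_dur=2,
                      CPU_EXT=None)  # path to a CPU extension library, if required
timestamp = p.predict("dataset/test/test.wav")
```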
I hope you like the project. I made this repo for educational purposes, so it might need further tweaking to reach production level. Feel free to open an issue if you need help, and I hope I'll have the time to help you. Thank you.