NSF-iSAT/Compositional-embedding-for-speaker-diarization

Repository for the paper "Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers" (https://arxiv.org/abs/2010.11803).

Requirements

Install PyTorch following the instructions at https://pytorch.org.
Install pyannote-audio following the instructions at https://github.com/pyannote/pyannote-audio.

Evaluation

Prepare test data

Download the AMI Headset-mix dataset using the script from https://github.com/pyannote/pyannote-audio/tree/master/tutorials/data_preparation.

Run the test script for all experiments in the paper

Running the command

python diarization_pipeline.py [YOUR_AMI_DATA_PATH]

will generate RTTM-formatted diarization results for all experiments.
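The generated RTTM files use the standard SPEAKER-line layout (file ID, onset, and duration in fixed field positions). A minimal sketch of reading them back into Python, assuming only that layout (the `load_rttm_lines` helper and the example line are illustrative, not part of this repository):

```python
def load_rttm_lines(lines):
    """Parse RTTM SPEAKER lines into (file_id, start, end, speaker) tuples.

    Assumes the standard RTTM field order:
    SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    segments = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and any non-SPEAKER record types.
        if not fields or fields[0] != "SPEAKER":
            continue
        file_id = fields[1]
        onset = float(fields[3])
        duration = float(fields[4])
        speaker = fields[7]
        segments.append((file_id, onset, onset + duration, speaker))
    return segments

# Hypothetical example line in RTTM SPEAKER format:
example = ["SPEAKER EN2002a 1 12.30 4.50 <NA> <NA> spk0 <NA> <NA>"]
print(load_rttm_lines(example))  # [('EN2002a', 12.3, 16.8, 'spk0')]
```

pyannote also ships RTTM loading utilities, so in practice you may prefer those over hand-rolled parsing.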

Diarization for a single wav file (for ISAT)

Use the command

python isat_diarization.py [WAV_PATH] [OUTPUT_DIR]

to generate RTTM-formatted VAD and diarization results for a 16 kHz wav file.
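Since the script expects 16 kHz audio, it can be worth checking a file's sample rate before running it. A small sketch using only the standard-library wave module (the `check_wav_rate` helper is a hypothetical name, not part of this repository):

```python
import wave

def check_wav_rate(path, expected_rate=16000):
    """Return True if the PCM wav file at `path` has the expected sample rate."""
    with wave.open(path, "rb") as w:
        return w.getframerate() == expected_rate
```

If this returns False, resample the file (e.g. with sox or ffmpeg) before passing it to isat_diarization.py.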

Training

Prepare training data

Train the model
