Repository for the paper "Compositional Embedding Models for Speaker Identification and Diarization with Simultaneous Speech from 2+ Speakers".
https://arxiv.org/abs/2010.11803
Install PyTorch using the instructions from https://pytorch.org.
Install pyannote-audio using the instructions from https://github.com/pyannote/pyannote-audio.
Download the AMI Headset-mix dataset using the script from https://github.com/pyannote/pyannote-audio/tree/master/tutorials/data_preparation.
Running the command
python diarization_pipeline.py [YOUR_AMI_DATA_PATH]
will generate RTTM-formatted diarization results for all experiments.
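The generated RTTM files follow the standard NIST Rich Transcription format, where each SPEAKER record carries a file ID, channel, onset, duration, and speaker label. A minimal sketch of reading such a file into (speaker, start, end) segments — the helper name `parse_rttm` is ours, not part of this repository:

```python
def parse_rttm(path):
    """Parse SPEAKER records from an RTTM file into (speaker, start, end) tuples.

    RTTM field layout: type file channel onset duration <NA> <NA> speaker <NA> <NA>
    """
    segments = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Skip blank lines and non-SPEAKER records (e.g. SPKR-INFO headers).
            if not fields or fields[0] != "SPEAKER":
                continue
            onset, duration = float(fields[3]), float(fields[4])
            segments.append((fields[7], onset, onset + duration))
    return segments
```

Segments can then be sorted by onset or grouped by speaker for scoring or visualization.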
Use the command
python isat_diarization.py [WAV_PATH] [OUTPUT_DIR]
to generate both RTTM-formatted VAD and diarization results for a 16 kHz wav file.
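Since the script expects 16 kHz audio, it can be worth verifying the sample rate before running it. A quick sanity check using Python's standard wave module (the helper name `is_16khz` is ours):

```python
import wave

def is_16khz(wav_path):
    """Return True if the WAV file's sample rate is 16 kHz."""
    with wave.open(wav_path, "rb") as wf:
        return wf.getframerate() == 16000
```

If the check fails, the file can be resampled to 16 kHz with a standard audio tool before running the diarization script.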