
Increasing embedding batch size in pyannote/speaker-diarization-3.0 #1486

Closed
landEpita opened this issue Oct 4, 2023 · 6 comments

Comments

@landEpita

Hello,
I noticed that in version 3.0, the batch size for embeddings is set to 1. Is it possible to increase this batch size to speed up the inference? What other measures can be taken to accelerate the inference process? Thank you.

@github-actions

github-actions bot commented Oct 4, 2023

Thank you for your issue.
We found the following entries in the FAQ which you may find helpful:

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory.

@hbredin
Member

hbredin commented Oct 4, 2023

The way pyannote/speaker-diarization-3.0 uses hbredin/wespeaker-voxceleb-resnet34-LM speaker embedding does not allow batching this part of the inference for now. So, setting embedding_batch_size to a higher value will not speed things up...

... unless you switch to pyannote/embedding or speechbrain/spkrec-ecapa-voxceleb

... in which case you would get slightly lower accuracy.
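To make the suggested switch concrete, here is a hedged sketch of how an alternative, batchable embedding model might be plugged into the diarization pipeline. It assumes the `SpeakerDiarization` pipeline class in pyannote.audio 3.x accepts `segmentation`, `embedding`, and `embedding_batch_size` arguments (model names on the Hugging Face Hub); it requires model downloads and an access token, so treat it as illustrative rather than copy-paste ready.

```python
# Sketch, assuming pyannote.audio 3.x. The SpeakerDiarization pipeline
# is built directly so the embedding model can be swapped for one that
# supports batching (at the cost of slightly lower accuracy, per above).
from pyannote.audio.pipelines import SpeakerDiarization

pipeline = SpeakerDiarization(
    segmentation="pyannote/segmentation-3.0",
    embedding="speechbrain/spkrec-ecapa-voxceleb",  # batchable alternative
    embedding_batch_size=32,                        # now actually used
)

# The pipeline's hyper-parameters (clustering threshold, etc.) still
# need to be instantiated before running it on audio, e.g.:
# pipeline.instantiate({...})
```

The key point from the comment above: with the default `hbredin/wespeaker-voxceleb-resnet34-LM` embedding in 3.0, raising `embedding_batch_size` has no effect, so swapping the embedding model is what unlocks the batching speed-up.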

@hbredin hbredin changed the title Pyannote diarization 3.0 Increasing embedding batch size in pyannote/speaker-diarization-3.0 Oct 4, 2023
@landEpita
Author

Thank you for your response. Do you think we can deploy the model with TensorRT? I also noticed in the code that we can use Nvidia's embedding with Nemo. What are your thoughts on Nvidia's model?

@hbredin
Member

hbredin commented Oct 5, 2023

Do you think we can deploy the model with TensorRT?

I see no reason why you could not. I'd love to hear back from you when/if you try!

I also noticed in the code that we can use Nvidia's embedding with Nemo.
What are your thoughts on Nvidia's model?

I only tested NeMo speaker embedding models in the past, and have not played with its diarization capabilities. My conclusion at the time was that NeMo TitaNet was almost on par with speechbrain's embedding. I did not try NeMo recently so there might be a better speaker embedding now...

@hbredin
Member

hbredin commented Nov 16, 2023

Closing as it reads like the original question has been answered.

That being said, I recommend switching to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 that do support increasing embedding_batch_size.
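For readers landing here, a minimal sketch of the recommended 3.1 setup follows. It assumes a valid Hugging Face access token (the `"HF_TOKEN"` placeholder is not a real token) and a local `audio.wav`; it downloads models on first run, so it is not runnable offline.

```python
# Sketch: pyannote.audio 3.1 with pyannote/speaker-diarization-3.1,
# which supports batched embedding extraction.
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # replace with your Hugging Face token
)
pipeline.to(torch.device("cuda"))  # GPU strongly recommended for speed

# Raise the embedding batch size to trade GPU memory for throughput.
# (Assumption: the instantiated pipeline exposes this as an attribute.)
pipeline.embedding_batch_size = 32

diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```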

@hbredin hbredin closed this as completed Nov 16, 2023
@v-nhandt21

Dear @hbredin, how can I run inference on multiple audio files in parallel batches with pyannote?

`model.inference({"waveform": audio, "sample_rate": self.sample_rate})`

And to confirm: does pyannote support `embedding_batch_size` in 3.1?
