Shared speaker embeddings across batch of audio streams. #1502

FredHaa · 2023-10-16T09:30:50Z

Is your feature request related to a problem? Please describe.
I have a use case where I need to diarize a file containing multiple tracks, where each track corresponds to a separate microphone which all record concurrently. If I run each track individually through pyannote, I cannot trust that a given speaker ID refers to the same person across tracks.

My current solution is to concatenate the tracks, run them through pyannote as a single stream, and then fix the speaker timings afterwards.
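A minimal sketch of the "fix the timings afterwards" step, assuming the diarization output of the concatenated stream is already available as plain `(start, end, speaker)` tuples in seconds. The function name `split_concatenated` and the tuple layout are hypothetical and are not part of pyannote; the sketch only shows how timestamps on the concatenated timeline could be mapped back to each track.

```python
from bisect import bisect_right


def split_concatenated(segments, track_durations):
    """Map segments from the concatenated timeline back to per-track time.

    segments: list of (start, end, speaker) on the concatenated timeline.
    track_durations: duration in seconds of each track, in concatenation order.
    Returns a list of (track_index, local_start, local_end, speaker).
    Hypothetical helper for illustration; not a pyannote API.
    """
    # offsets[i] is where track i begins in the concatenated stream.
    offsets = [0.0]
    for duration in track_durations:
        offsets.append(offsets[-1] + duration)

    result = []
    for start, end, speaker in segments:
        # Index of the track that contains `start`.
        i = max(0, bisect_right(offsets, start) - 1)
        # A segment may straddle a track boundary, so cut it at each boundary.
        while i < len(track_durations) and start < end:
            piece_end = min(end, offsets[i + 1])
            if piece_end > start:
                result.append((i, start - offsets[i], piece_end - offsets[i], speaker))
            start = piece_end
            i += 1
    return result


if __name__ == "__main__":
    concatenated = [(2.0, 4.0, "SPEAKER_00"), (9.0, 12.0, "SPEAKER_01")]
    for row in split_concatenated(concatenated, [10.0, 10.0]):
        print(row)
```

Because the tracks are synchronized, the local time on each track is also the shared recording time, so the per-track results can be compared directly once they are split.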

Describe the solution you'd like
The possibility to input a batch of streams. Each stream would be segmented individually, and then clustered using the same embedding space.
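To make the request concrete, here is a rough sketch of what "clustered using the same embedding space" could mean. It assumes each stream has already been segmented and that every segment carries a speaker embedding as a list of floats. The function `cluster_across_streams`, the input layout, and the greedy threshold clustering are illustrative assumptions; this is not pyannote's clustering step.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_across_streams(streams, threshold=0.7):
    """Assign speaker labels that are shared across all streams.

    streams: {stream_id: [(start, end, embedding), ...]}
    Returns {stream_id: [(start, end, label), ...]}.
    Greedy stand-in for a real clustering algorithm (assumption).
    """
    centroids = []  # running sum of embeddings per cluster
    labelled = {}
    for stream_id, segments in streams.items():
        labelled[stream_id] = []
        for start, end, embedding in segments:
            # Find the most similar existing cluster at or above the threshold.
            best, best_sim = None, threshold
            for k, centroid in enumerate(centroids):
                sim = cosine(embedding, centroid)
                if sim >= best_sim:
                    best, best_sim = k, sim
            if best is None:
                # No cluster is close enough: start a new speaker.
                centroids.append(list(embedding))
                best = len(centroids) - 1
            else:
                # Fold the embedding into the cluster's running sum.
                centroids[best] = [x + y for x, y in zip(centroids[best], embedding)]
            labelled[stream_id].append((start, end, f"SPEAKER_{best:02d}"))
    return labelled
```

The point of the sketch is only that all streams share one set of centroids, so the same label means the same cluster regardless of which microphone the segment came from.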

github-actions · 2023-10-16T09:31:11Z

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

installation
data preparation
model download
etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

paid scientific consulting around speaker diarization and speech processing in general;
custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory

hbredin · 2023-10-17T17:42:40Z

I have a use case where I need to diarize a file containing multiple tracks, where each track corresponds to a separate microphone which all record concurrently.

Can you please clarify your use case?
Are those microphones in the same room? Are they recording the same conversation? Are they synchronized?

FredHaa · 2023-10-17T18:48:52Z

The microphones are attached to people who move around, so the tracks contain overlapping speech in some periods and separate speech in others.

The microphones are synchronized.

hbredin · 2023-10-24T06:17:42Z

Sorry for the delay.

I think we should sit together and talk for me to really understand and help you with your use case.
I am available for contracting if this is something you'd consider.

FredHaa · 2023-10-24T07:24:55Z

Well, my hack works fine, so there is not much sense in paying for consulting there. This feature request was mostly about the interface: it would be nicer if you could input a batch, and I believe it would be beneficial in a vast number of use cases, not just my own.

However, I am interested in hearing about the premium models, so I'll send an email.

stale · 2024-04-22T03:24:39Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Apr 22, 2024

stale bot closed this as completed May 23, 2024
