Commit

Merge branch 'develop' into feat/option-to-hide-progress-bar
hbredin authored Oct 8, 2024
2 parents 8f7d342 + 5832510 commit 13160e1
Showing 5 changed files with 3,612 additions and 7 deletions.
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -2,8 +2,17 @@

## develop

- - fix: fix support for `numpy==2.x` ([@metal3d](https://github.com/metal3d/))
+ ### Fixes
+
+ - fix: fix clipping issue in speech separation pipeline ([@joonaskalda](https://github.com/joonaskalda/))
+
+
+ ## Version 3.3.2 (2024-09-11)
+
+ ### Fixes
+
+ - fix: (really) fix support for `numpy==2.x` ([@metal3d](https://github.com/metal3d/))
- doc: fix `Pipeline` docstring ([@huisman](https://github.com/huisman/))

## Version 3.3.1 (2024-06-19)

3 changes: 2 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
- Using `pyannote.audio` open-source toolkit in production?
+ Using `pyannote.audio` open-source toolkit in production?
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.

# `pyannote.audio` speaker diarization toolkit
@@ -73,6 +73,7 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
- [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
- Community contributions (not maintained by the core team)
- 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)
+ - 2024-09-24 > [Evaluating `pyannote` pretrained speech separation pipelines](tutorials/community/eval_separation_pipeline.ipynb) by [Clément Pagés](https://github.com/)

## Benchmark

13 changes: 9 additions & 4 deletions pyannote/audio/pipelines/speech_separation.py
@@ -441,7 +441,6 @@ def reconstruct(
clustered_segmentations, segmentations.sliding_window
)
return clustered_segmentations
return self.to_diarization(clustered_segmentations, count)

def apply(
self,
@@ -644,16 +643,22 @@ def apply(
len(speaker_activation), dtype=float
)

- speaker_activation_with_context[np.concatenate(remaining_zeros)] = (
-     0.0
- )
+ speaker_activation_with_context[
+     np.concatenate(remaining_zeros)
+ ] = 0.0

discrete_diarization.data.T[i] = speaker_activation_with_context
num_sources = sources.data.shape[1]
sources.data = (
sources.data * discrete_diarization.align(sources).data[:, :num_sources]
)

+ # separated sources might be scaled up/down due to SI-SDR loss used when training
+ # so we peak-normalize them
+ sources.data = sources.data / np.max(
+     np.abs(sources.data), axis=0, keepdims=True
+ )

# convert to continuous diarization
diarization = self.to_annotation(
discrete_diarization,
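
The peak normalization added in this hunk can be illustrated in isolation. The following is a minimal sketch, not the pipeline's code: it assumes `sources` is a plain `(num_samples, num_sources)` NumPy array, and the helper name `peak_normalize` is hypothetical. The guard for all-zero channels is an addition made for this illustration; the committed code divides by the peak directly.

```python
import numpy as np


def peak_normalize(sources: np.ndarray) -> np.ndarray:
    """Scale each column (one separated source) so its largest absolute sample is 1.

    Hypothetical helper for illustration only; not part of pyannote.audio.
    """
    # Per-source peak, kept as shape (1, num_sources) so it broadcasts over samples
    peaks = np.max(np.abs(sources), axis=0, keepdims=True)
    # Assumption added here: leave silent (all-zero) sources untouched
    # instead of dividing by zero
    safe_peaks = np.where(peaks > 0, peaks, 1.0)
    return sources / safe_peaks


# Three toy sources: one scaled up, one silent, one scaled further up
sources = np.array(
    [
        [0.5, 0.0, -4.0],
        [-2.0, 0.0, 1.0],
        [1.0, 0.0, 2.0],
    ]
)
normalized = peak_normalize(sources)
```

After normalization, every non-silent source peaks at an absolute value of 1, which keeps the output within range regardless of the scale the separation model happened to produce.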