Merge branch 'develop' into feat/option-to-hide-progress-bar

pyannote · Oct 8, 2024 · 13160e1
2 parents 8f7d342 + 5832510

Showing 5 changed files with 3,612 additions and 7 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,8 +2,17 @@
 
 ## develop
 
-- fix: fix support for `numpy==2.x` ([@metal3d](https://github.com/metal3d/))
+### Fixes
+
+- fix: fix clipping issue in speech separation pipeline ([@joonaskalda](https://github.com/joonaskalda/))
+
+
+## Version 3.3.2 (2024-09-11)
+
+### Fixes
 
+- fix: (really) fix support for `numpy==2.x` ([@metal3d](https://github.com/metal3d/))
+- doc: fix `Pipeline` docstring ([@huisman](https://github.com/huisman/))
 
 ## Version 3.3.1 (2024-06-19)
 

diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-Using `pyannote.audio` open-source toolkit in production?  
+Using `pyannote.audio` open-source toolkit in production?
 Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.
 
 # `pyannote.audio` speaker diarization toolkit
@@ -73,6 +73,7 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
   - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
 - Community contributions (not maintained by the core team)
   - 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)
+  - 2024-09-24 > [Evaluating `pyannote` pretrained speech separation pipelines](tutorials/community/eval_separation_pipeline.ipynb) by  [Clément Pagés](https://github.com/)
 
 ## Benchmark
 

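The `speech_separation.py` hunk in this commit peak-normalizes the separated sources because, per its own comment, the SI-SDR training loss can leave them scaled up or down. This is presumably the clipping issue named in the changelog. Below is a minimal NumPy sketch of that reasoning. SI-SDR does not change when an estimate is multiplied by a constant, so the loss puts no constraint on output amplitude. Rescaling each source to a peak of 1 removes that arbitrary gain. The `si_sdr` and `peak_normalize` helpers are hypothetical illustrations, not pyannote.audio functions. The `(num_samples, num_sources)` layout is assumed from `num_sources = sources.data.shape[1]` and the `axis=0` reduction in the diff. The `eps` guard is an addition of this sketch and does not appear in the commit.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant SDR in dB between two 1-D signals."""
    # Optimal gain that projects the estimate onto the reference.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    residual = estimate - target
    return float(10 * np.log10(np.dot(target, target) / np.dot(residual, residual)))


def peak_normalize(sources: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale each column (one separated source) to a peak magnitude of 1."""
    # Per-source peak, shape (1, num_sources), mirroring axis=0 / keepdims=True.
    peaks = np.max(np.abs(sources), axis=0, keepdims=True)
    # Hypothetical eps guard (not in the commit): keeps an all-zero source at 0 instead of NaN.
    return sources / np.maximum(peaks, eps)


rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
estimate = reference + 0.1 * rng.standard_normal(16000)

# Multiplying the estimate by a gain does not change SI-SDR, so a model
# trained with it is free to output the right waveform at any amplitude.
print(np.isclose(si_sdr(estimate, reference), si_sdr(5.0 * estimate, reference)))  # True

# Columns: a too-loud source, a quiet source, and a silent one.
sources = np.stack([5.0 * estimate, 0.2 * reference, np.zeros(16000)], axis=1)
normalized = peak_normalize(sources)
print(np.max(np.abs(normalized), axis=0))  # [1. 1. 0.]
```

Normalizing every source to a peak of 1 is the simplest way to remove the arbitrary gain. It also discards any loudness difference between sources, because quiet sources are scaled up as well.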
diff --git a/pyannote/audio/pipelines/speech_separation.py b/pyannote/audio/pipelines/speech_separation.py
@@ -441,7 +441,6 @@ def reconstruct(
             clustered_segmentations, segmentations.sliding_window
         )
         return clustered_segmentations
-        return self.to_diarization(clustered_segmentations, count)
 
     def apply(
         self,
@@ -644,16 +643,22 @@ def apply(
                         len(speaker_activation), dtype=float
                     )
 
-                    speaker_activation_with_context[np.concatenate(remaining_zeros)] = (
-                        0.0
-                    )
+                    speaker_activation_with_context[
+                        np.concatenate(remaining_zeros)
+                    ] = 0.0
 
                     discrete_diarization.data.T[i] = speaker_activation_with_context
             num_sources = sources.data.shape[1]
             sources.data = (
                 sources.data * discrete_diarization.align(sources).data[:, :num_sources]
             )
 
+        # separated sources might be scaled up/down due to SI-SDR loss used when training
+        # so we peak-normalize them
+        sources.data = sources.data / np.max(
+            np.abs(sources.data), axis=0, keepdims=True
+        )
+
         # convert to continuous diarization
         diarization = self.to_annotation(
             discrete_diarization,