
GPU Memory Leak when Concurrent Process #1510

Closed
zanjabil2502 opened this issue Oct 21, 2023 · 5 comments

Comments

@zanjabil2502

zanjabil2502 commented Oct 21, 2023

First, thank you for updating pyannote-audio, especially for the v3.0 model. But I want to report something I found. I understand that the new speaker embedding model runs on ONNX Runtime, which you chose to make it lighter than the torch model. However, there is a bug in ONNX Runtime, specifically when using the GPU: when I run concurrent processes, GPU memory does not decrease while the model is idle, until I stop my program.

I read some threads in the onnx-runtime repo, and many users have the same problem with ONNX Runtime. Do you have a solution for this problem?

@github-actions

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory

@hbredin
Member

hbredin commented Oct 22, 2023

I am afraid I won't be able to help you, as I have very little experience with actual deployment and even less with concurrent processing.

📣 To people that have successfully deployed pyannote pipelines in production (and version 3.0 in particular), now would be the right time to chime in and help @zanjabil2502

@zanjabil2502
Author

zanjabil2502 commented Oct 22, 2023

Thanks for your response. I do have a tip: to reduce GPU usage, you can change the batch size of self._segmentation to 16 or 8. This reduces GPU usage to under 1 GB (700 MB to 800 MB), and the real-time factor stays at 0.025.

I added this code to speaker_verification.py (in WeSpeakerPretrainedSpeakerEmbedding):

import onnxruntime as ort

# Create the session options first, then disable the memory-arena features
# that keep allocations alive between runs.
sess_options = ort.SessionOptions()
sess_options.enable_mem_pattern = False
sess_options.enable_cpu_mem_arena = False
sess_options.enable_mem_reuse = False

self.session_ = ort.InferenceSession(
    self.embedding, sess_options=sess_options, providers=providers
)
self.session_.disable_fallback()

According to some forum threads, this code prevents the memory leak on CPU when using ONNX Runtime.

Also, in my experience, GPU usage is smaller with onnxruntime-gpu <= 1.12.1 than with onnxruntime-gpu == 1.16.1.
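Beyond session options, a common mitigation for GPU memory growth under concurrency is to share a single inference session across all requests instead of letting each worker create its own. This is a minimal sketch of that pattern, not code from pyannote.audio: `SharedSessionRunner` is a hypothetical name, and the lambda stands in for an `ort.InferenceSession(...).run` call so the example is self-contained.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedSessionRunner:
    """Serialize inference over one shared session so concurrent
    requests do not each allocate their own GPU workspace."""

    def __init__(self, session_run):
        # session_run stands in for ort.InferenceSession.run in this sketch
        self._run = session_run
        self._lock = threading.Lock()

    def infer(self, x):
        # One request on the (GPU-backed) session at a time.
        with self._lock:
            return self._run(x)

# Usage with a dummy "session" in place of a real ONNX model:
runner = SharedSessionRunner(lambda x: x * 2)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(runner.infer, range(100)))
print(results[:5])  # → [0, 2, 4, 6, 8]
```

The lock trades some throughput for a bounded memory footprint; whether that trade-off is acceptable depends on your deployment.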

@hbredin
Member

hbredin commented Nov 9, 2023

FYI: #1537

@hbredin
Member

hbredin commented Nov 16, 2023

Closing, as the latest version no longer relies on ONNX Runtime.
Please update to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 (and open new issues if needed).
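For reference, the upgrade path described above looks roughly like this (a sketch assuming a Hugging Face access token; `"HF_TOKEN"` and `"audio.wav"` are placeholders, not values from this thread):

```python
# pip install -U "pyannote.audio>=3.1"
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder: your Hugging Face token
)
diarization = pipeline("audio.wav")  # placeholder audio file
```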

@hbredin hbredin closed this as completed Nov 16, 2023