GPU Memory Leak when Concurrent Process #1510
Thank you for your issue. You might want to check the FAQ if you haven't done so already. Feel free to close this issue if you found an answer in the FAQ. If your issue is a feature request, please read this first and update your request accordingly, if needed. If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug. Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).
I am afraid I won't be able to help you, as I have very little experience with actual deployment and even less with concurrent processing. 📣 To people who have successfully deployed pyannote pipelines in production (version 3.0 in particular): now would be the right time to chime in and help @zanjabil2502.
Thanks for your response. Maybe I have some tips. To reduce GPU usage, you can change the batch size of `self._segmentation` to 16 or 8; this brings GPU usage down to under 1 GB (700–800 MB) while the realtime factor stays at 0.025. I also added this code in `speaker_verification.py`:

```python
sess_options.enable_mem_pattern = False
sess_options.enable_cpu_mem_arena = False
sess_options.enable_mem_reuse = False
self.session_ = ort.InferenceSession(self.embedding, sess_options=sess_options, providers=providers)
self.session_.disable_fallback()
```

According to some forum threads, these settings prevent a memory leak on CPU when using ONNX Runtime. Also, in my opinion, GPU usage with onnxruntime-gpu <= 1.12.1 is smaller than with onnxruntime-gpu==1.16.1.
FYI: #1537
Closing as the latest version no longer relies on ONNX Runtime.
Firstly, thank you for updating pyannote.audio, especially for the v3.0 model. But I want to report something I found. I know the speaker embedding model is new, and you chose to run it with ONNX Runtime to make it lighter than using a torch model. But there is a bug in ONNX Runtime, specifically when using the GPU: when I run concurrent processes, GPU memory does not decrease while the model is idle, until I stop my program.

I read some threads on the ONNX Runtime repo and many users have the same problem. Do you have a solution for this problem?