How to know how much real GPU memory is used? #3056
-
I know vLLM sets aside a managed region of GPU memory for the KV cache, but much of that region is often not actually in use. Is there any way to measure the real GPU memory usage?
Replies: 2 comments
-
If you are using Nvidia, execute |
-
vLLM records KV cache usage, logs it, and exposes it via Prometheus. The GPU memory in use can also be recalculated from the number of GPU blocks and their usage. However, vLLM does not currently report the GPU memory used by the KV cache directly.
The latest code also logs how much GPU memory is used while loading the model weights.
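The recalculation from block counts mentioned above can be sketched roughly as follows. This is only an illustration, not vLLM's own code: the `KVCacheShape` class and both functions are hypothetical names, and the formula assumes a standard paged KV cache in which each block holds keys and values for `block_size` tokens across all layers. The model dimensions, the GPU block count, and the usage fraction have to be taken from your own model config and vLLM's logs or metrics.

```python
from dataclasses import dataclass


@dataclass
class KVCacheShape:
    """Hypothetical container for the values the estimate needs."""
    num_layers: int    # number of transformer layers
    num_kv_heads: int  # KV heads per layer (after any grouping)
    head_size: int     # dimension of each head
    block_size: int    # tokens stored per cache block
    dtype_bytes: int   # bytes per element, e.g. 2 for fp16/bf16


def bytes_per_block(shape: KVCacheShape) -> int:
    # Factor of 2: each block stores both keys and values.
    return (2 * shape.num_layers * shape.block_size
            * shape.num_kv_heads * shape.head_size * shape.dtype_bytes)


def kv_cache_memory(shape: KVCacheShape, num_gpu_blocks: int,
                    usage_fraction: float) -> tuple[int, int]:
    """Return (reserved_bytes, used_bytes) for the KV cache.

    usage_fraction is the cache usage between 0.0 and 1.0,
    as read from the logs or metrics (an assumption about units).
    """
    reserved = num_gpu_blocks * bytes_per_block(shape)
    used = round(reserved * usage_fraction)
    return reserved, used


# Example with made-up numbers for a 32-layer fp16 model.
shape = KVCacheShape(num_layers=32, num_kv_heads=32, head_size=128,
                     block_size=16, dtype_bytes=2)
reserved, used = kv_cache_memory(shape, num_gpu_blocks=1000,
                                 usage_fraction=0.25)
print(f"reserved: {reserved / 1024**3:.2f} GiB, used: {used / 1024**3:.2f} GiB")
```

Note that this covers only the KV cache; model weights and activation memory come on top of it.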
You can also estimate the GPU memory usage by |