Hi!
I just tested your approach, but I got a worse response time.
I'm opening this issue because there may be something wrong in my code or logic,
or I may be using these tools incorrectly.
Environment:
Ubuntu 18.04
T4
torch == 1.11.0+cu113
optimum == 1.4.0
onnx == 1.12.0
Python 3.8.10
Triton 22.01
I ported stabilityai/stable-diffusion-2-1-base with convert_stable_diffusion_checkpoint_to_onnx.py and used your model directory after fixing some of the pbtxt dimensions,
and added the line noise_pred = noise_pred.to("cuda") at the location in the link.
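For reference, the conversion was run roughly like this (the output path and opset value are placeholders, and the flag names are my recollection of the diffusers script's arguments, so double-check against your copy):

```bash
python convert_stable_diffusion_checkpoint_to_onnx.py \
    --model_path stabilityai/stable-diffusion-2-1-base \
    --output_path ./sd-2-1-base-onnx \
    --opset 14
```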
The Triton server then came up as shown below.
Then I ran inference with these prompts:
prompts = [
"A man standing with a red umbrella",
"A child standing with a green umbrella",
"A woman standing with a yellow umbrella"
]
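For context, the requests were sent and timed with a small Python client along these lines. The model name and the input/output tensor names ("pipeline", "PROMPT", "IMAGES") are placeholders here, since they depend on the config.pbtxt, so this is a sketch rather than my exact code:

```python
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompts = [
    "A man standing with a red umbrella",
    "A child standing with a green umbrella",
    "A woman standing with a yellow umbrella",
]

latencies = []
for prompt in prompts:
    # Single string prompt sent as a BYTES tensor of shape [1]
    text = httpclient.InferInput("PROMPT", [1], "BYTES")
    text.set_data_from_numpy(np.array([prompt.encode("utf-8")], dtype=object))

    start = time.perf_counter()
    result = client.infer(model_name="pipeline", inputs=[text])
    latencies.append(time.perf_counter() - start)

    image = result.as_numpy("IMAGES")  # generated image tensor from the server

print(f"avg latency: {sum(latencies) / len(latencies):.1f} s")
```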
I got a response after about 6.8 s (average over the 3 inferences).
The strange thing is that when I send the same prompts to StableDiffusionPipeline, it takes nearly 5 s.
Of course, this was measured in the same environment, and that pipeline is also served from Triton Inference Server
(though I maximized StableDiffusionPipeline's performance with some tips from the diffusers docs, see link).
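To be concrete, by "tips from the diffusers docs" I mean optimizations of this kind (a rough sketch, not necessarily my exact settings):

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,  # run the pipeline in fp16 on the T4
)
# Faster scheduler so fewer denoising steps are needed
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # trade a bit of speed for lower memory use

image = pipe("A man standing with a red umbrella").images[0]
```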
Is serving the Stable Diffusion model as ONNX actually better than using StableDiffusionPipeline?
I expected better performance, since it takes much more effort to serve it this way.