Hi, thank you for the amazing demo and doc! I have a question regarding this section in zero-inference. It is mentioned that "Thus, our current implementation computes attention scores on CPU". May I ask if there is a detailed comparison of the latency or throughput between GPU-attention and CPU-attention to support this decision? I am also curious about the implementation details of the CPU-attention computation. Thank you!
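For context on why such a comparison matters, here is a rough back-of-envelope sketch of the data-movement side of the trade-off: when the KV cache is offloaded to CPU memory, running attention on the GPU requires shipping the cache over PCIe first, whereas CPU attention avoids that transfer. All shapes, the dtype size, and the PCIe bandwidth below are illustrative assumptions, not measurements from ZeRO-Inference.

```python
def kv_cache_bytes(batch, heads, seq_len, head_dim, dtype_bytes=2):
    """Size of the K and V caches for one layer, in bytes (fp16 assumed)."""
    return 2 * batch * heads * seq_len * head_dim * dtype_bytes

# Hypothetical large-model layer shape, chosen only for illustration.
kv = kv_cache_bytes(batch=64, heads=96, seq_len=2048, head_dim=128)

# Assumed effective host-to-device bandwidth (~PCIe 4.0 x16).
pcie_bytes_per_s = 25e9
transfer_s = kv / pcie_bytes_per_s

print(f"KV cache per layer: {kv / 1e9:.1f} GB")
print(f"PCIe transfer per layer: {transfer_s * 1e3:.1f} ms")
```

Under these assumed numbers the per-layer transfer alone costs hundreds of milliseconds, which is the kind of cost a GPU-attention-with-offload design has to amortize; an actual latency/throughput comparison from the maintainers would of course be more informative than this arithmetic.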