Hi, thank you for the amazing demo and doc! I have a question regarding this section in zero-inference. It is mentioned that "Thus, our current implementation computes attention scores on CPU". May I ask if there is a detailed comparison of the latency or throughput between GPU-attention and CPU-attention to support this decision? I am also curious about the implementation details of the CPU-attention computation. Thank you!
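For context on why such a comparison matters, here is a rough back-of-envelope sketch of the data-movement side of the trade-off: when the KV cache is offloaded to CPU memory, running attention on the GPU requires shipping the cache over PCIe first, whereas CPU attention avoids that transfer. All shapes, the dtype size, and the PCIe bandwidth below are illustrative assumptions, not measurements from ZeRO-Inference.

```python
def kv_cache_bytes(batch, heads, seq_len, head_dim, dtype_bytes=2):
    """Size of the K and V caches for one layer, in bytes (fp16 assumed)."""
    return 2 * batch * heads * seq_len * head_dim * dtype_bytes

# Hypothetical large-model layer shape, chosen only for illustration.
kv = kv_cache_bytes(batch=64, heads=96, seq_len=2048, head_dim=128)

# Assumed effective host-to-device bandwidth (~PCIe 4.0 x16).
pcie_bytes_per_s = 25e9
transfer_s = kv / pcie_bytes_per_s

print(f"KV cache per layer: {kv / 1e9:.1f} GB")
print(f"PCIe transfer per layer: {transfer_s * 1e3:.1f} ms")
```

Under these assumed numbers the per-layer transfer alone costs hundreds of milliseconds, which is the kind of cost a GPU-attention-with-offload design has to amortize; an actual latency/throughput comparison from the maintainers would of course be more informative than this arithmetic.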