Speed / Latency #60
excellent-ai started this conversation in General
Replies: 1 comment
-

Could we share the inference speed (tokens/second) and latency (seconds) for some GPU models here for the Yi-6B-200K model? Any new data points or insights are welcome in this discussion.

-

(Not an official response.) It has the same architecture as LLaMA2, so the speed should be the same as equivalent-size long-context LLaMA2 models.
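For anyone wanting to contribute numbers, a minimal sketch of how one might measure the two quantities asked for (end-to-end latency in seconds and decode throughput in tokens/second). The `generate` callable and its signature here are hypothetical placeholders, not the Yi or transformers API — adapt them to whatever serving stack you actually run:

```python
import time


def measure_generation(generate, prompt, max_new_tokens=128):
    """Benchmark a single generation call.

    `generate` is any callable taking (prompt, max_new_tokens) and
    returning the number of new tokens it actually produced
    (a hypothetical interface -- wire it to your own model call).
    Returns (latency_seconds, tokens_per_second).
    """
    start = time.perf_counter()
    n_new_tokens = generate(prompt, max_new_tokens)
    latency = time.perf_counter() - start
    return latency, n_new_tokens / latency


# Dummy stand-in for a real model call, for illustration only:
# it sleeps briefly and claims to have produced all requested tokens.
def fake_generate(prompt, max_new_tokens):
    time.sleep(0.01)
    return max_new_tokens


latency, tps = measure_generation(fake_generate, "hello", 64)
```

Reporting both numbers separately matters for a 200K-context model: time-to-first-token (dominated by prompt prefill) and steady-state decode tokens/second diverge sharply at long context lengths, so a single averaged figure can be misleading.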