Speed / Latency #60
excellent-ai started this conversation in General
Replies: 1 comment
-

Could we share the inference speed (tokens/second) and latency (seconds) for some GPU models here for the Yi-6B-200K model? Any new data points or insights are welcome in this discussion.

-

(Not an official response.) It has the same architecture as LLaMA2, so the speed should be the same as equivalent-size long-context LLaMA2 models.
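For anyone wanting to contribute numbers, a minimal sketch of how one might measure the two quantities asked for (end-to-end latency in seconds and decode throughput in tokens/second). The `generate` callable and its signature here are hypothetical placeholders, not the Yi or transformers API — adapt them to whatever serving stack you actually run:

```python
import time


def measure_generation(generate, prompt, max_new_tokens=128):
    """Benchmark a single generation call.

    `generate` is any callable taking (prompt, max_new_tokens) and
    returning the number of new tokens it actually produced
    (a hypothetical interface -- wire it to your own model call).
    Returns (latency_seconds, tokens_per_second).
    """
    start = time.perf_counter()
    n_new_tokens = generate(prompt, max_new_tokens)
    latency = time.perf_counter() - start
    return latency, n_new_tokens / latency


# Dummy stand-in for a real model call, for illustration only:
# it sleeps briefly and claims to have produced all requested tokens.
def fake_generate(prompt, max_new_tokens):
    time.sleep(0.01)
    return max_new_tokens


latency, tps = measure_generation(fake_generate, "hello", 64)
```

Reporting both numbers separately matters for a 200K-context model: time-to-first-token (dominated by prompt prefill) and steady-state decode tokens/second diverge sharply at long context lengths, so a single averaged figure can be misleading.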