Improve decode latency #1393

scse-l · 2024-01-03T06:54:02Z


I've done some profiling and found that decoding generated tokens one by one is expensive (20% ~ 30% of the total cost). Is there any work we can do to optimize this?
PS: I tried decoding generated tokens in parallel with a process pool, but it performed worse because of the high overhead of cross-process communication.
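
To make the cost concrete, here is a minimal sketch of one possible direction, assuming the overhead comes from re-decoding the whole generated sequence at every step. It compares that naive approach with decoding only a short trailing window. `VOCAB`, `toy_decode`, `stream_naive`, `stream_windowed`, and the `window` parameter are all hypothetical names for illustration, not part of any library, and the toy decoder only stands in for a real tokenizer.

```python
from typing import Callable, List, Sequence

# Hypothetical toy vocabulary standing in for a real tokenizer (assumption).
VOCAB = ["Hello", ",", " wor", "ld", "!"]


def toy_decode(ids: Sequence[int]) -> str:
    """Toy decoder: concatenates the string piece of each token id."""
    return "".join(VOCAB[i] for i in ids)


def stream_naive(ids: Sequence[int], decode: Callable[[Sequence[int]], str]) -> List[str]:
    """Re-decode the full prefix at every step and emit the new suffix.

    Total work grows quadratically with the number of generated tokens.
    """
    pieces = []
    prev = ""
    for end in range(1, len(ids) + 1):
        text = decode(ids[:end])
        pieces.append(text[len(prev):])
        prev = text
    return pieces


def stream_windowed(
    ids: Sequence[int],
    decode: Callable[[Sequence[int]], str],
    window: int = 4,
) -> List[str]:
    """Decode only a short trailing window with and without the new token.

    Assumes the text of a token depends on at most `window` preceding tokens,
    so per-step work stays bounded regardless of sequence length.
    """
    pieces = []
    for end in range(1, len(ids) + 1):
        start = max(0, end - 1 - window)
        before = decode(ids[start:end - 1])  # context without the new token
        after = decode(ids[start:end])       # context plus the new token
        pieces.append(after[len(before):])
    return pieces


if __name__ == "__main__":
    generated = [0, 1, 2, 3, 4]
    print(stream_naive(generated, toy_decode))
    print(stream_windowed(generated, toy_decode))
```

With a real tokenizer the windowed variant would need extra care, for example when a token ends in the middle of a multi-byte character, so treat this only as an illustration of where the per-token cost can come from.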

