Hello, and thank you for pushing the boundary on speculative generation!
Question 1: In the EAGLE-1 paper, Table 7 reports a 1.97x throughput figure for Vicuna-7B. How exactly was this measured (GPU, batch size, test code)? How would it differ when generating with temperature=1?
Question 2: Is EAGLE-2 compatible with batch generation, specifically including tree attention and temperature=1? If so, please share a code example like the one for EAGLE-1, and benchmarks like those in Section 4.4, Table 7 of the EAGLE-1 paper.
Question 3: Were there any tests of EAGLE/EAGLE-2 in setups with compilation or faster serving frameworks such as TensorRT? How would the speedup figures change versus those reported in the papers, especially in the batched setup?
> Question 1: In the EAGLE-1 paper, Table 7 reports a 1.97x throughput figure for Vicuna-7B. How exactly was this measured (GPU, batch size, test code)? How would it differ when generating with temperature=1?
The specific settings can be found in Section 4.4 of our paper, and the code is on the v1 branch. As with other speculative sampling methods, performance at temperature=1 will be slightly worse than at temperature=0.
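Why temperature matters can be illustrated with the standard speculative-sampling acceptance rule (as in vanilla speculative decoding; this is a toy sketch with made-up probabilities, not EAGLE's implementation): a draft token t is accepted with probability min(1, p(t)/q(t)), so the expected acceptance rate is the overlap between the target distribution p and the draft distribution q. A flatter (temperature=1) target overlaps less with a draft tuned to the model's peaked behavior:

```python
def expected_acceptance_rate(p_target, q_draft):
    """Expected per-token acceptance rate of standard speculative sampling.

    A draft token t is accepted with probability min(1, p(t)/q(t)); averaging
    over the draft distribution q gives
    sum_t q(t) * min(1, p(t)/q(t)) = sum_t min(p(t), q(t)).
    """
    return sum(min(p, q) for p, q in zip(p_target, q_draft))

# Toy 3-token vocabulary (hypothetical numbers for illustration only).
draft     = [0.90, 0.06, 0.04]  # draft model's distribution
target_t0 = [0.98, 0.01, 0.01]  # target at temperature ~ 0 (peaked)
target_t1 = [0.60, 0.25, 0.15]  # target at temperature = 1 (flatter)

rate_t0 = expected_acceptance_rate(target_t0, draft)  # 0.92
rate_t1 = expected_acceptance_rate(target_t1, draft)  # 0.70
print(f"acceptance at T~0: {rate_t0:.2f}, at T=1: {rate_t1:.2f}")
```

Fewer accepted draft tokens per verification step translates directly into a lower end-to-end speedup, which is why temperature=1 figures come in slightly below the greedy ones.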
> Question 2: Is EAGLE-2 compatible with batch generation, specifically including tree attention and temperature=1? If so, please share a code example like the one for EAGLE-1, and benchmarks like those in Section 4.4, Table 7 of the EAGLE-1 paper.
EAGLE-2 currently does not support batch generation.
> Question 3: Were there any tests of EAGLE/EAGLE-2 in setups with compilation or faster serving frameworks such as TensorRT? How would the speedup figures change versus those reported in the papers, especially in the batched setup?
Integration with other frameworks is a significant amount of work; it is part of our future plans.