Hello, and thank you for pushing the boundary on speculative generation!
Question 1: In the EAGLE-1 paper, Table 7 reports a 1.97x throughput figure for Vicuna-7B. How exactly was this measured (GPU, batch size, test code)? How would it differ when generating with temperature=1?
Question 2: Is EAGLE-2 compatible with batch generation, specifically including tree attention and temperature=1? If so, please share a code example like the one for EAGLE-1, and benchmarks like those in Section 4.4, Table 7 of the EAGLE-1 paper.
Question 3: Were there any tests of EAGLE/EAGLE-2 in setups with compilation or faster serving frameworks such as TensorRT? How would the speedup figures change versus those reported in the papers, especially in the batched setup?
> Question 1: In the EAGLE-1 paper, Table 7 reports a 1.97x throughput figure for Vicuna-7B. How exactly was this measured (GPU, batch size, test code)? How would it differ when generating with temperature=1?
The specific settings can be found in Section 4.4 of our paper, and the code is on the v1 branch. As with other speculative sampling methods, performance at temperature=1 will be slightly worse than at temperature=0.
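Why temperature matters can be illustrated with the standard speculative-sampling acceptance rule (as in vanilla speculative decoding; this is a toy sketch with made-up probabilities, not EAGLE's implementation): a draft token t is accepted with probability min(1, p(t)/q(t)), so the expected acceptance rate is the overlap between the target distribution p and the draft distribution q. A flatter (temperature=1) target overlaps less with a draft tuned to the model's peaked behavior:

```python
def expected_acceptance_rate(p_target, q_draft):
    """Expected per-token acceptance rate of standard speculative sampling.

    A draft token t is accepted with probability min(1, p(t)/q(t)); averaging
    over the draft distribution q gives
    sum_t q(t) * min(1, p(t)/q(t)) = sum_t min(p(t), q(t)).
    """
    return sum(min(p, q) for p, q in zip(p_target, q_draft))

# Toy 3-token vocabulary (hypothetical numbers for illustration only).
draft     = [0.90, 0.06, 0.04]  # draft model's distribution
target_t0 = [0.98, 0.01, 0.01]  # target at temperature ~ 0 (peaked)
target_t1 = [0.60, 0.25, 0.15]  # target at temperature = 1 (flatter)

rate_t0 = expected_acceptance_rate(target_t0, draft)  # 0.92
rate_t1 = expected_acceptance_rate(target_t1, draft)  # 0.70
print(f"acceptance at T~0: {rate_t0:.2f}, at T=1: {rate_t1:.2f}")
```

Fewer accepted draft tokens per verification step translates directly into a lower end-to-end speedup, which is why temperature=1 figures come in slightly below the greedy ones.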
> Question 2: Is EAGLE-2 compatible with batch generation, specifically including tree attention and temperature=1? If so, please share a code example like the one for EAGLE-1, and benchmarks like those in Section 4.4, Table 7 of the EAGLE-1 paper.
EAGLE-2 currently does not support batch generation.
> Question 3: Were there any tests of EAGLE/EAGLE-2 in setups with compilation or faster serving frameworks such as TensorRT? How would the speedup figures change versus those reported in the papers, especially in the batched setup?
Integration with other frameworks is a significant amount of work; it is part of our future plans.