
Batched speculation benchmarks? (incl. with compilation) #126

Open
poedator opened this issue Aug 28, 2024 · 2 comments

Comments

@poedator

Hello, and thank you for pushing the boundary on speculative generation!

Question 1: In the EAGLE-1 paper, Table 7 reports a 1.97x throughput figure for Vicuna-7B. How exactly was this measured (GPU, batch size, testing code)? How would the result differ when generating with temperature == 1?

Question 2: Is EAGLE-2 compatible with batch generation, specifically including tree attention and temperature == 1?
If so, please share a code example like the one for EAGLE-1, along with benchmarks like those in the EAGLE-1 paper (Section 4.4, Table 7).

Question 3: Were there any tests of EAGLE/EAGLE-2 in setups with compilation, faster serving frameworks, or TensorRT? How would the speedup figures change versus the ones reported in the papers, especially in the batched setup?

@Liyuhui-12
Collaborator

> Question 1: In the EAGLE-1 paper, Table 7 reports a 1.97x throughput figure for Vicuna-7B. How exactly was this measured (GPU, batch size, testing code)? How would the result differ when generating with temperature == 1?

The specific settings can be found in Section 4.4 of our paper, and the code is on the v1 branch. As with other speculative sampling methods, performance at temperature=1 is slightly worse than at temperature=0.
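For readers unfamiliar with how such a speedup ratio is typically computed: it is wall-clock tokens per second of the speculative decoder divided by that of the vanilla (autoregressive) baseline on the same prompts. The sketch below is an illustrative harness only, not the EAGLE benchmark code (which lives on the v1 branch); the generator functions are hypothetical stand-ins for real model calls.

```python
import time

def measure_speedup(generate_baseline, generate_speculative, prompts):
    """Illustrative harness (not EAGLE's code): compute the throughput
    ratio of a speculative decoder vs. a baseline decoder, i.e. the
    kind of 1.97x figure reported in the paper's Table 7."""
    def tokens_per_sec(generate):
        start = time.perf_counter()
        # Sum generated token counts across all prompts.
        total_tokens = sum(len(generate(p)) for p in prompts)
        return total_tokens / (time.perf_counter() - start)

    return tokens_per_sec(generate_speculative) / tokens_per_sec(generate_baseline)

# Toy stand-ins: both return the same "tokens"; the speculative one
# simply spends less wall-clock time per call to mimic a faster decoder.
def slow_gen(prompt):
    time.sleep(0.01)
    return prompt.split()

def fast_gen(prompt):
    time.sleep(0.005)
    return prompt.split()
```

In a real measurement, `slow_gen`/`fast_gen` would be replaced by actual calls to the baseline and EAGLE generation loops, run on the same GPU with the same batch size.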

> Question 2: Is EAGLE-2 compatible with batch generation, specifically including tree attention and temperature == 1? If so, please share a code example like the one for EAGLE-1, along with benchmarks like those in the EAGLE-1 paper (Section 4.4, Table 7).

EAGLE-2 currently does not support batch generation.

> Question 3: Were there any tests of EAGLE/EAGLE-2 in setups with compilation, faster serving frameworks, or TensorRT? How would the speedup figures change versus the ones reported in the papers, especially in the batched setup?

Integration with other frameworks is a significant amount of work, and it is part of our future plans.

@Siegfried-qgf

Hello, I want to ask whether the EAGLE-2 method can generate in batches. Is the method itself unsuitable for batch generation?
