[SpecInfer] Update RequestManager #1096
It seems we still have some issues with longer prompt sets in the spec_beam_attention kernel. I am getting a CUDA error with this prompt set:
The issue can be solved by applying a change similar to the one we discussed above.
… into update_rm_backup
… into update_rm_backup
Description of changes:
Related Issues:
Linked Issues:
Issues closed by this PR: