do not set sliding_window if SUPPORTS_WINDOWING is false #2554
Conversation
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
For a model like mistralai/Mistral-7B-v0.1, whose "sliding_window" config value is not null, an exception is raised ("During handling of the above exception, another exception occurred") even when the sliding-window size is not exceeded.
There is already logic in `__init__.py` (see https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/__init__.py#L509-L512) that handles this, so there is no need to set the sliding window if SUPPORTS_WINDOWING is false.
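As a rough illustration of the guard being referenced, here is a minimal Python sketch; `resolve_sliding_window` is a hypothetical helper made up for this example, and the real code in `__init__.py` is structured differently:

```python
# Minimal sketch (not the upstream code) of the guard referenced above.
# SUPPORTS_WINDOWING would come from the selected attention backend.
def resolve_sliding_window(config: dict, supports_windowing: bool):
    """Return the sliding window to use, or None to disable windowing."""
    sliding_window = config.get("sliding_window")  # e.g. 4096 for Mistral-7B-v0.1
    if sliding_window is not None and not supports_windowing:
        # The backend (e.g. the IPEX paged-attention kernel) cannot do
        # windowed attention, so fall back to full attention rather than
        # passing a window size it cannot honor.
        return None
    return sliding_window
```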
I cannot reproduce the issue. The error is raised only if windowed attention is necessary AND max-total-tokens > window size. I tried on a target without windowed attention and got the correct behavior.
Hi @Narsil, you could use tag 2.3.0 to reproduce it. I could not reproduce the issue on the latest tag either, because the mllama enablement breaks the paged-attention path on the Intel platform. On my side, with the launcher command I use to reproduce it, the error occurs when max-total-tokens < window size. If max-total-tokens > window size, the error should indeed be raised, since windowing is not yet supported in the IPEX paged-attention kernel.
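To make the condition being debated explicit, here is a hedged sketch of the check (the function name `check_windowing` is invented for illustration; the real validation lives in the launcher/server and differs in detail):

```python
# Hedged sketch of the validation discussed in this thread: an error is
# warranted only when windowing is actually required (max_total_tokens
# exceeds the window) AND the backend cannot provide it.
def check_windowing(sliding_window, max_total_tokens, supports_windowing):
    if sliding_window is None or max_total_tokens <= sliding_window:
        return  # full attention covers every position; no windowing needed
    if not supports_windowing:
        raise RuntimeError(
            f"max_total_tokens ({max_total_tokens}) exceeds the model's "
            f"sliding_window ({sliding_window}), but this attention backend "
            "does not support windowed attention"
        )
```

The bug reported above is the inverse case: the error fires even when max-total-tokens is below the window size, i.e. when the first early return should apply.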
The `import flash_attn_2_cuda` issue is fixed by #2610.
Force-pushed from 5187637 to 735bcf6.
I could reproduce it on the latest main now that the mllama PR is merged. Could you revisit the updated PR? Thanks @Narsil.
For the windowing, I reproduced and, I think, fixed the bug slightly more simply: #2637. Does that work? The logic was already there; the bug was that ...
ENV TORCH_LLM_ALLREDUCE=1
ENV CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0
What is this doing?
Should this be included in a different PR?