
Disable kvcache api for now, waiting CK to support correct layout for append kv
rocking5566 committed Dec 13, 2024
1 parent 84c153f commit 3c655c5
Showing 2 changed files with 267 additions and 265 deletions.
2 changes: 2 additions & 0 deletions csrc/flash_attn_ck/mha_fwd_kvcache.cpp
@@ -287,6 +287,8 @@ mha_fwd_kvcache(at::Tensor &q, // batch_siz
                 bool is_rotary_interleaved, // if true, rotary combines indices 0 & 1, else indices 0 & rotary_dim / 2
                 int num_splits)
 {
+    TORCH_CHECK(false, "vllm layout does not support mha_fwd_kvcache for now");
+
     auto q_dtype = q.dtype();
     TORCH_CHECK(q_dtype == torch::kFloat16 || q_dtype == torch::kBFloat16,
                 "FlashAttention only support fp16 and bf16 data type");

0 comments on commit 3c655c5
