FP8 KV + Disagg unit test (#3218)
Summary:
Pull Request resolved: #3218

X-link: facebookresearch/FBGEMM#315

Adds the FP8 KV cache to the disagg test for mp2.
Changes include switching the test to the 7B Llama model. The small model has a D_H of 64, which does not work with the dequantization kernel (the issue will be investigated in a separate diff).

TODO:
Add FP8 KV cache + paged KV to the test.

Reviewed By: jianyuh

Differential Revision: D62772678

fbshipit-source-id: 775f572e2c345354844e24d80e2481284ac6f1a3
ayaIbrah authored and facebook-github-bot committed Oct 3, 2024
1 parent 788cd2a commit f4710c1
Showing 1 changed file with 1 addition and 0 deletions: fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu
@@ -1437,6 +1437,7 @@ __global__ void dequantize_fp8_cache_kernel(
   auto MAX_T = cache_K.size(1);
   auto D_H = cache_K_dq.size(3);
   auto D_H_q = cache_K.size(3);
+  // TODO: support D_H < 128 for small model used in testing.
   CUDA_KERNEL_ASSERT(D_H == 128);
 
   auto b = blockIdx.x;
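For context on the assertion above, here is a minimal, hypothetical sketch of why a fixed head dimension can get baked into a dequantization kernel: if one warp handles one head with 4 elements per lane, 32 lanes * 4 = 128 is hard-wired into the indexing, and a 64-dim head would need a different lane/element mapping. The kernel name, qparam layout, and affine decode below are illustrative assumptions, not the actual FBGEMM implementation.

#include <cuda_runtime.h>
#include <cstdint>

// Illustrative only: one warp dequantizes one head, 4 elements per lane.
// This fixed packing assumes D_H == 32 * kElemsPerLane == 128, which is
// the kind of constraint CUDA_KERNEL_ASSERT(D_H == 128) enforces.
__global__ void dequantize_head_sketch(
    const uint8_t* q,   // [num_heads, D_H] quantized bytes (placeholder fp8)
    const float* scale, // [num_heads] per-head scale (assumed qparam layout)
    const float* shift, // [num_heads] per-head shift (assumed qparam layout)
    float* out,         // [num_heads, D_H] dequantized output
    int D_H) {
  constexpr int kElemsPerLane = 4;
  int head = blockIdx.x;
  int lane = threadIdx.x; // launched with 32 threads per block (one warp)

  int base = head * D_H + lane * kElemsPerLane;
#pragma unroll
  for (int i = 0; i < kElemsPerLane; ++i) {
    // Placeholder decode: affine dequant of the raw byte. A real fp8
    // (e4m3/e5m2) path would use hardware conversion instead.
    out[base + i] =
        static_cast<float>(q[base + i]) * scale[head] + shift[head];
  }
}

// Launch sketch: one warp per head; assumes D_H == 128 exactly.
// dequantize_head_sketch<<<num_heads, 32>>>(q, scale, shift, out, 128);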
