[SpecInfer] Update RequestManager #1096

zwang86 · 2023-09-05T04:26:03Z

Description of changes:

Related Issues:

Linked Issues:

Issue #

Issues closed by this PR:

Closes #

include/flexflow/batch_config.h

zwang86 · 2023-09-11T03:07:36Z

It seems we still have issues with longer prompt sets in the spec_beam_attention kernel. I am getting a CUDA error with this prompt set:

["Write a detailed product description for a food chopper tool that lets you chop fruits and vegetables.",
    "Write a short blog post (500 words) about the best dog toys for new dog owners.",
    "ChatGPT is rewriting Genesis.",
    "Please write the evolution of humans by natural selection in the form of a recipe."]

 ** On entry to GEMM_EX  parameter number 18 had an illegal value
 ** On entry to GEMM_EX  parameter number 18 had an illegal value
Cuda failure: 7
/home/zeyuwang/FlexFlow/src/ops/spec_inc_multihead_self_attention.cu:316
Aborting...
spec_infer: /home/zeyuwang/FlexFlow/src/ops/spec_inc_multihead_self_attention.cu:316: void FlexFlow::Kernels::SpecIncMultiHeadAttention::compute_attention_kernel(const FlexFlow::SpecIncMultiHeadSelfAttentionMeta*, const FlexFlow::BeamSearchBatchConfig*, int, DT*, const DT*, const DT*, cudaStream_t) [with DT = __half; cudaStream_t = CUstream_st*]: Assertion `false' failed.
Aborted (core dumped)

zwang86 · 2023-09-11T03:23:06Z

The issue can be solved by applying a change similar to the one we discussed above.
Because attention computation is skipped for the pending request, this assertion no longer holds; with the assertion commented out, the outputs look reasonable. @xinhaoc Do you think we need to make any modifications to the attention kernel?
Note: although the code is located in spec_inc_multihead_self_attention, the function is actually called from verify_inc_multihead_self_attention.
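One way to picture the change being discussed, as a minimal host-side C++ sketch: the per-request loop treats a pending request with zero new tokens as a no-op instead of asserting that every request contributes tokens. This assumes the pending request appears in the batch with zero tokens for the step (consistent with the later commit "fix: allow parse 0 tokens for pending request."). `RequestSlot`, `num_tokens_in_batch`, and `count_attention_launches` are hypothetical names for illustration only, not FlexFlow's actual BeamSearchBatchConfig layout or kernel code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-request metadata for one decoding step. The names are
// illustrative only and are not FlexFlow's actual BatchConfig fields.
struct RequestSlot {
  bool active;              // slot currently holds a request
  int num_tokens_in_batch;  // new tokens this request contributes this step
};

// Counts the requests for which attention (and its GEMMs) would be launched.
// A pending request that contributes 0 tokens is skipped instead of
// tripping an assertion such as `assert(num_tokens_in_batch > 0)`.
int count_attention_launches(const std::vector<RequestSlot>& slots) {
  int launches = 0;
  for (const RequestSlot& r : slots) {
    if (!r.active) {
      continue;  // empty slot
    }
    // Token counts can be zero for a pending request, but never negative.
    assert(r.num_tokens_in_batch >= 0);
    if (r.num_tokens_in_batch == 0) {
      continue;  // pending request: nothing to attend over this step
    }
    // In a real kernel, the GEMM shapes (m, n, k) and leading dimensions
    // would be derived from num_tokens_in_batch here, so they are
    // guaranteed to be at least 1.
    ++launches;
  }
  return launches;
}
```

For example, a batch with one active request contributing 5 tokens, one pending request with 0 tokens, and an empty slot yields a single launch. Skipping at the loop level keeps zero-sized shapes from ever reaching the GEMM setup, rather than relying on the library to accept them.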


Zeyu Wang and others added 11 commits August 18, 2023 00:20

Reorder pipeline.

6451a3b

Merge branch 'inference' into update_rm

6613f90

refactor and small fixes.

1de3e21

Merge branch 'inference' into update_rm

46344ed

Merge branch 'inference' into update_rm

a15d814

Update

c37235f

Merge branch 'inference' into update_rm

3c93dbf

Refactor backup.

d18926f

pipeline update.

99bb696

Merge branch 'inference' into update_rm_backup

83ae640

Format.

e6f2474

zwang86 marked this pull request as ready for review September 5, 2023 04:46

jiazhihao added the inference label (Features and fixes related to the inference project.) Sep 5, 2023

xinhaoc and others added 3 commits September 7, 2023 14:39

fix

c758c9f

.

0b6b146

Merge branch 'inference' into update_rm_backup

709ce3c

jiazhihao mentioned this pull request Sep 8, 2023

One question about the measurement of the latency #1099

Closed

xinhaoc added 2 commits September 10, 2023 21:27

fix

683c283

fix

d44c1a1

jiazhihao reviewed Sep 11, 2023


include/flexflow/batch_config.h (conversation resolved)

zwang86 and others added 8 commits September 10, 2023 23:23

fix.

35a33e5

Fix reloading new request with long prompts.

0d7524a

Fix edge cases.

7c8227d

Fix edge case

230e0bc

fix

9ed2684

try a fix to CI

87ef9cb

.

8898493

fix

e328e2d

zwang86 and others added 11 commits September 12, 2023 11:54

Merge branch 'inference' into update_rm_backup

960e938

Fix: clean up code and fix decoding_steps.

3a25189

Merge branch 'update_rm_backup' of https://github.com/flexflow/FlexFlow… into update_rm_backup

c66a205

try 1 try

c7f1b9e

fix: allow parse 0 tokens for pending request.

55eb913

format.

b88c4de

remove comment tests

abcf94f

Merge branch 'inference' into update_rm_backup

2327316

remove print.

66ee367

Merge branch 'update_rm_backup' of https://github.com/flexflow/FlexFlow… into update_rm_backup

8e4fe9a

Merge branch 'inference' into update_rm_backup

2769dcb

lockshaw mentioned this pull request Sep 21, 2023

Make MAX_BATCH_SIZE, MAX_NUM_TOKENS, MAX_SEQ_LENGTH user-provided input arguments #1018

Merged

xinhaoc and others added 7 commits September 24, 2023 12:04

Merge branch 'inference' into update_rm_backup

bf382b4

fix decoding steps

801c56c

.

1d18fce

quick fix.

aed8850

Merge branch 'inference' into update_rm_backup

6638cd3

remove debugging prints.

a39fb5b

fix store_beam_metadata.

84a6fba

zwang86 requested a review from jiazhihao September 25, 2023 05:02

hip

59acaeb

jiazhihao approved these changes Sep 25, 2023


jiazhihao enabled auto-merge (squash) September 25, 2023 15:53

jiazhihao disabled auto-merge September 25, 2023 15:53

jiazhihao merged commit 0a56d01 into inference Sep 25, 2023
38 of 39 checks passed

zwang86 deleted the update_rm_backup branch September 28, 2023 23:29
