
Inference: fix batch_size issue. #863

Merged
merged 4 commits into flexflow:inference on Jul 21, 2023

Conversation

@xinhaoc (Collaborator) commented Jul 18, 2023

Description of changes:

Related Issues:

Linked Issues:

Issues closed by this PR:

Before merging:

  • Did you update the flexflow-third-party repo, if modifying any of the CMake files, the build configs, or the submodules?

@xinhaoc (Collaborator, Author) commented Jul 18, 2023

@lambda7xx I think our system is correct when using batch_size > 2. A few things I want to share about that:

  1. We don't need to change the dimension when creating the input tensor; that was my fault.
  2. Before running the system with batch_size > 2, we should modify prompt/test.json so that it contains enough prompts for the batch size (see the sketch below this list). An example looks like:
["Give three tips for staying healthy.", "Carnegie Mellon University is located in Pittsburgh", "My favorite basketball player is Kobe Bryant"]
  3. Let's fix any follow-up issues you encounter during the evaluation in this branch.
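A minimal sketch (not part of this PR) of a pre-flight check that prompt/test.json holds at least batch_size prompts before launching; the batch_size value below is just an illustrative assumption:

import json

batch_size = 3  # illustrative value; set to the batch size you plan to run with
with open("prompt/test.json") as f:
    prompts = json.load(f)  # expects a JSON array of prompt strings

assert isinstance(prompts, list), "prompt/test.json should be a JSON array of strings"
assert len(prompts) >= batch_size, (
    f"prompt/test.json has {len(prompts)} prompts, but batch_size is {batch_size}"
)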

@xinhaoc requested a review from lambda7xx on July 18, 2023 04:46
@jiazhihao added the inference label (Features and fixes related to the inference project) Jul 18, 2023
@jiazhihao (Collaborator)

I am thinking about a more general fix where we make MAX_NUM_REQUESTS, MAX_NUM_TOKENS, and MAX_SEQ_LENGTH input arguments instead of static variables. The following is the Python interface Gabriele and I discussed:

from flexflow.serve import LLM, SamplingConfig

llama = LLM.model("decapoda-research/llama-30b-hf", data_type = "half")
ssm1 = LLM.model("Jackframe/llama-160m", data_type = "half")
ssm2 = LLM.model("Jackframe/opt-160m", data_type = "half")

sampling_config = SamplingConfig(temperature = 0.9, topp = 0.8, topk = 1)

LLM.compile(llama, max_parallel_requests = xxx, max_parallel_tokens = yyy, max_seq_length = yyy, tensor_parallel_degree = 4, pipeline_parallel_degree = 2, ssms = {ssm1, ssm2})

result = llama.generate("What's the best xxx in yyy?", sampling = sampling_config)
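As a purely hypothetical sketch (this is not FlexFlow's actual API), one way to carry these limits as runtime arguments rather than compile-time constants is a small config object validated before compilation; all names below are illustrative assumptions:

from dataclasses import dataclass

@dataclass
class ServeLimits:
    # Runtime-configurable limits, standing in for the compile-time MAX_* constants
    max_parallel_requests: int
    max_parallel_tokens: int
    max_seq_length: int

    def validate(self) -> None:
        # Basic sanity checks before the model is compiled
        if self.max_parallel_tokens < self.max_parallel_requests:
            raise ValueError("max_parallel_tokens must be >= max_parallel_requests")
        if self.max_seq_length <= 0:
            raise ValueError("max_seq_length must be positive")

# Illustrative usage: the limits would be passed to compile() instead of being
# baked in as static variables.
limits = ServeLimits(max_parallel_requests=8, max_parallel_tokens=256, max_seq_length=1024)
limits.validate()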

@xinhaoc (Collaborator, Author) commented Jul 18, 2023

Yes, that's a good idea.

@goliaro merged commit 2ba481b into flexflow:inference on Jul 21, 2023
25 checks passed