add force_greedy_sample #704

jikunshang · 2025-01-20T08:53:57Z

No description provided.

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>

…roject#630)

```
VLLM_TP_SPLIT_SIZE_BY_SEQ=2
VLLM_TP_SPLIT_SIZE_BY_BATCH=2
```

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

https://jira.habana-labs.com/browse/SW-212933

```
VLLM_PIPELINED_PA=true \
VLLM_SOFTMAX_CONST_NORM=true
```

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

Enable splitting qkv and gate_up into separate ColumnParallelLinear layers, instead of overwriting the QKVParallelLinear and MergedColumnParallelLinear layers.

---------

Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Barak Goldberg <bgoldberg@habana.ai>
Co-authored-by: Nir David <ndavid@habana.ai>

Signed-off-by: Kunshang <kunshangli@habana.ai>

xuechendi and others added 7 commits December 16, 2024 19:30

add const_norm option

2576619

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

Enable TP splitting on seq_len

d6bdc90

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>

fix for FP8 (vllm-project#639)

e2ea481

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

update vllm-hpu-extention to const_pipeline_pa (vllm-project#655)

291a1c4

https://jira.habana-labs.com/browse/SW-212933

```
VLLM_PIPELINED_PA=true \
VLLM_SOFTMAX_CONST_NORM=true
```

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

add force_greedy_sample

aba2725

Signed-off-by: Kunshang <kunshangli@habana.ai>

jikunshang requested review from kzawora-intel, madamczykhabana, michalkuligowski and mgawarkiewicz as code owners January 20, 2025 08:53

sample type use greedy when top_k=1

ea78e8f
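The PR has no description, so here is a minimal sketch of why treating `top_k=1` as greedy is safe: with only one candidate kept, sampling always returns the highest-logit token. This is not the code from this PR. The names `top_k_sample`, `greedy_sample`, and `pick_token` are hypothetical and written in plain Python only to illustrate the equivalence.

```python
import math
import random


def top_k_sample(logits, top_k, rng):
    """Sample a token index from the top_k highest logits (softmax-weighted)."""
    # Keep the indices of the top_k largest logits (stable sort: first index wins ties).
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax weights over the kept logits, shifted by the max for numerical stability.
    peak = max(logits[i] for i in kept)
    weights = [math.exp(logits[i] - peak) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def greedy_sample(logits):
    """Return the index of the largest logit (first one on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pick_token(logits, top_k, rng):
    """Hypothetical dispatcher: take the cheaper greedy path when top_k == 1."""
    if top_k == 1:
        return greedy_sample(logits)
    return top_k_sample(logits, top_k, rng)
```

Under these assumptions the greedy path needs no sort, no softmax, and no random draw, which is presumably the motivation for switching the sample type.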

xuechendi force-pushed the mlperf_features branch from 393c817 to 3ad6cf3 Compare January 22, 2025 23:27

xuechendi requested review from vivekgoe and afierka-intel as code owners January 22, 2025 23:27

warmup use temperature =0

7b42731
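For context on the warmup change: in vLLM's sampling parameters a temperature of 0 means greedy sampling. The sketch below is an illustration in plain Python, not code from this PR, and `softmax_with_temperature` is a hypothetical helper. It shows that as the temperature approaches 0 the distribution collapses onto the highest-logit token, which is why exactly 0 is handled as a greedy special case rather than as a division by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; temperature must be strictly positive."""
    if temperature <= 0:
        # Exactly 0 would divide by zero; samplers treat it as "pick the argmax".
        raise ValueError("temperature must be > 0; treat 0 as greedy sampling")
    scaled = [x / temperature for x in logits]
    # Shift by the max so the exponentials cannot overflow.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits such as `[0.1, 2.5, -1.0, 2.4]`, a temperature of 1.0 leaves the two largest entries with comparable probability, while a temperature of 0.01 puts almost all of the mass on index 1.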
