Add qwen-moe batch1 to nightly perf (#11369)
* add moe
* reduce 437 models
* rename
* fix syntax
* add moe check result
* add 430 + 437
* all modes
* 4-37-4 exclud
* revert & comment

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
1 parent 769728c, commit c0e86c5
Showing 2 changed files with 34 additions and 0 deletions.
@@ -0,0 +1,16 @@
repo_id:
  - 'Qwen/Qwen1.5-MoE-A2.7B-Chat'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '32-32'
  - '1024-128'
  - '2048-256'
test_api:
  - "transformer_int4_fp16_gpu" # on Intel GPU
cpu_embedding: False # whether to put the embedding on CPU (currently only available for GPU Windows-related test_api)
task: 'continuation' # task can be 'continuation', 'QA' and 'summarize'
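
To make the shape of this config concrete, the following is a minimal Python sketch of how its fields could expand into individual benchmark cases. It is not the benchmark runner from this repository. The helper names `parse_in_out_pair` and `expand_runs` are hypothetical, the dict stands in for the parsed YAML, and two readings are assumptions rather than facts taken from the commit: that each `in_out_pairs` entry means `<input tokens>-<output tokens>`, and that `warm_up` generations run in addition to the `num_trials` timed ones.

```python
from itertools import product

# Values copied from the config in this diff. The dict stands in for the
# parsed YAML file so the sketch needs no third-party YAML library.
config = {
    "repo_id": ["Qwen/Qwen1.5-MoE-A2.7B-Chat"],
    "warm_up": 1,
    "num_trials": 3,
    "batch_size": 1,
    "in_out_pairs": ["32-32", "1024-128", "2048-256"],
    "test_api": ["transformer_int4_fp16_gpu"],
}


def parse_in_out_pair(pair):
    """Hypothetical helper: split 'IN-OUT' into (input_len, output_len).

    Assumes the first number is the prompt length in tokens and the
    second is the number of tokens to generate.
    """
    in_len, out_len = pair.split("-")
    return int(in_len), int(out_len)


def expand_runs(cfg):
    """Hypothetical helper: one case per (model, API, in/out pair)."""
    runs = []
    for repo, api, pair in product(
        cfg["repo_id"], cfg["test_api"], cfg["in_out_pairs"]
    ):
        in_len, out_len = parse_in_out_pair(pair)
        runs.append(
            {
                "repo_id": repo,
                "test_api": api,
                "input_len": in_len,
                "output_len": out_len,
                "batch_size": cfg["batch_size"],
                # Assumption: warm-up generations are extra, untimed runs.
                "generations": cfg["warm_up"] + cfg["num_trials"],
            }
        )
    return runs


if __name__ == "__main__":
    for run in expand_runs(config):
        print(run)
```

Under those assumptions, this config describes three cases for a single model and API, each with one warm-up generation followed by three timed trials.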