Refactor llm perf backend handling #258

Closed · wants to merge 79 commits
Commits (79)
92a5cea
add intel pytorch ort and openvino to leaderboard
Aug 21, 2024
0168063
add intel pytorch ort and openvino to leaderboard
baptistecolle Aug 21, 2024
0bc416f
Add support for intel in leaderboard
baptistecolle Aug 21, 2024
85f62e6
Update update_llm_perf_intel_pytorch.yml
baptistecolle Aug 21, 2024
7151e01
Update update_llm_perf_intel_pytorch.yml
baptistecolle Aug 21, 2024
4afc529
Merge branch 'add-intel-hardware-to-leaderboard' into intel-leaderboard
baptistecolle Aug 22, 2024
c92f818
add new llm_perf_tests
baptistecolle Aug 22, 2024
c31e6cf
fix workflow
baptistecolle Aug 22, 2024
d406440
fix failing tests
baptistecolle Aug 22, 2024
20b96b2
fix failing tests
baptistecolle Aug 22, 2024
c7e0ec0
fix failing tests
baptistecolle Aug 22, 2024
6d7bf69
fix failing tests
baptistecolle Aug 22, 2024
7048df5
refactoring
baptistecolle Sep 2, 2024
db88b2a
intel with multiple backends
baptistecolle Sep 2, 2024
1246d28
parallelize intel llm-perf
baptistecolle Sep 2, 2024
2d6830e
parallelize intel llm-perf
baptistecolle Sep 2, 2024
801c5bf
parallelize intel llm-perf
baptistecolle Sep 2, 2024
2e9526c
parallelize intel llm-perf
baptistecolle Sep 2, 2024
6d87d31
parallelize intel llm-perf
baptistecolle Sep 2, 2024
62266a6
parallelize intel llm-perf
baptistecolle Sep 2, 2024
0a39667
parallelize intel llm-perf
baptistecolle Sep 3, 2024
caf7b67
parallelize intel llm-perf
baptistecolle Sep 3, 2024
5890457
parallelize intel llm-perf
baptistecolle Sep 3, 2024
50bd1a2
update leaderboard collection to support more hardware
baptistecolle Sep 4, 2024
f93cc7c
update leaderboard collection to support more hardware
baptistecolle Sep 4, 2024
2fad593
update leaderboard collection to support more hardware
baptistecolle Sep 4, 2024
6f2885c
update leaderboard collection to support more hardware
baptistecolle Sep 4, 2024
a59e554
update leaderboard collection to support more hardware
baptistecolle Sep 4, 2024
a193748
update leaderboard collection to support more hardware
baptistecolle Sep 4, 2024
31f1ff6
update leaderboard collection to support more hardware
baptistecolle Sep 4, 2024
9d88f5a
Merge branch 'main' into intel-leaderboard
IlyasMoutawwakil Sep 5, 2024
0f041fb
add new workflow
baptistecolle Sep 5, 2024
b2330b0
add new workflow
baptistecolle Sep 5, 2024
2f54e2d
add new workflow
baptistecolle Sep 5, 2024
8603b68
add new workflow
baptistecolle Sep 5, 2024
ec829cb
add new workflow
baptistecolle Sep 5, 2024
d152c81
add new workflow
baptistecolle Sep 5, 2024
9730b0b
add new workflow
baptistecolle Sep 5, 2024
6e9d33c
add new workflow
baptistecolle Sep 5, 2024
540af0a
add new workflow
baptistecolle Sep 5, 2024
452e4b0
add new workflow
baptistecolle Sep 5, 2024
b25d6e1
add new workflow
baptistecolle Sep 6, 2024
a76e56d
add new workflow
baptistecolle Sep 6, 2024
6677def
add new workflow
baptistecolle Sep 6, 2024
6593487
add new workflow
baptistecolle Sep 6, 2024
9802c95
add new workflow
baptistecolle Sep 6, 2024
b6b947f
add new workflow
baptistecolle Sep 6, 2024
7a891c1
add new workflow
baptistecolle Sep 6, 2024
a6f289b
remove intel reference
baptistecolle Sep 10, 2024
e97ee56
remove intel reference
baptistecolle Sep 10, 2024
f5f0eeb
remove intel reference
baptistecolle Sep 10, 2024
55e2c69
refactoring done
baptistecolle Sep 10, 2024
ae7b939
refactoring done
baptistecolle Sep 10, 2024
5c80cad
refactoring done
baptistecolle Sep 10, 2024
e75a361
refactoring done
baptistecolle Sep 10, 2024
07d1d32
refactoring done
baptistecolle Sep 10, 2024
34f958f
refactoring done
baptistecolle Sep 10, 2024
35dc1cf
refactoring done
baptistecolle Sep 10, 2024
9348515
refactoring done
baptistecolle Sep 10, 2024
8b28005
refactoring done
baptistecolle Sep 10, 2024
7cb3ea0
remove push on workflow used for debugging
baptistecolle Sep 10, 2024
e20ac80
Merge branch 'main' into refactor-llm-perf-backend-handling
baptistecolle Sep 12, 2024
c4c8887
refactor pytorch cpu
baptistecolle Sep 12, 2024
32626f9
refactor pytorch cpu
baptistecolle Sep 12, 2024
99a00df
refactor pytorch cpu
baptistecolle Sep 12, 2024
b27f806
fix failing workflow
baptistecolle Sep 12, 2024
10c47ea
fix broken canonical list
baptistecolle Sep 17, 2024
60aa33e
fix broken canonical list
baptistecolle Sep 17, 2024
842645e
Merge branch 'fix-broken-canonical-list' into refactor-llm-perf-backe…
baptistecolle Sep 17, 2024
d0804a7
Merge branch 'main' into refactor-llm-perf-backend-handling
baptistecolle Sep 20, 2024
f3bc069
merge main
baptistecolle Sep 20, 2024
602a9d0
merge main into branch
baptistecolle Sep 23, 2024
b2d5f12
merge main into branch
baptistecolle Sep 23, 2024
08f70e2
merge main into branch
baptistecolle Sep 23, 2024
ab1710a
merge main into branch
baptistecolle Sep 23, 2024
2512827
add new label system
baptistecolle Sep 23, 2024
defc78a
add new label system
baptistecolle Sep 23, 2024
89b6a97
add new changes from review
baptistecolle Sep 23, 2024
3130c87
add new changes from review
baptistecolle Sep 23, 2024
20 changes: 18 additions & 2 deletions .github/workflows/update_llm_perf_cpu_pytorch.yaml
@@ -4,16 +4,32 @@ on:
workflow_dispatch:
schedule:
- cron: "0 0 * * *"
push:
branches:
- main
pull_request:
branches:
- main
types:
- opened
- reopened
- synchronize
- labeled
- unlabeled

concurrency:
cancel-in-progress: true
group: ${{ github.workflow }}-${{ github.ref }}
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

env:
IMAGE: ghcr.io/huggingface/optimum-benchmark:latest-cpu

jobs:
run_benchmarks:
if: ${{
(github.event_name == 'push') ||
(github.event_name == 'workflow_dispatch') ||
contains( github.event.pull_request.labels.*.name, 'leaderboard')}}
A reviewer (Member) commented on lines +29 to +32:
you can probably add more specifications here to be able to run specific benchmarks, like cuda/cpu
didn't try it, but you might also be able to add conditions on matrix arguments, like || contains( github.event.pull_request.labels.*.name, matrix.subset)}} to run specific subsets or specific machines
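
Not part of the diff: a minimal sketch of that suggestion, under two assumptions that are not confirmed by this PR: the job matrix defines a subset variable, and pull requests carry labels named after each subset. Since the matrix context is generally not available in a job-level if, the sketch gates an individual step instead, where it is available:

steps:
  - name: Run subset benchmark
    # Hypothetical step-level guard: always run on push/schedule/dispatch, and on
    # pull requests only when a label matches this matrix entry's subset.
    if: ${{ github.event_name != 'pull_request' || contains(github.event.pull_request.labels.*.name, matrix.subset) }}
    run: python llm_perf/benchmark_runners/update_llm_perf_cpu_pytorch.py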

strategy:
fail-fast: false
matrix:
@@ -49,4 +65,4 @@ jobs:
pip install packaging && pip install einops scipy optimum codecarbon
pip install -U transformers huggingface_hub[hf_transfer]
pip install -e .
python llm_perf/update_llm_perf_cpu_pytorch.py
python llm_perf/benchmark_runners/update_llm_perf_cpu_pytorch.py
21 changes: 19 additions & 2 deletions .github/workflows/update_llm_perf_cuda_pytorch.yaml
@@ -4,16 +4,33 @@ on:
workflow_dispatch:
schedule:
- cron: "0 0 * * *"
push:
branches:
- main
pull_request:
branches:
- main
types:
- opened
- reopened
- synchronize
- labeled
- unlabeled

concurrency:
cancel-in-progress: true
group: ${{ github.workflow }}-${{ github.ref }}
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

env:
IMAGE: ghcr.io/huggingface/optimum-benchmark:latest-cuda

jobs:
run_benchmarks:
if: ${{
(github.event_name == 'push') ||
(github.event_name == 'workflow_dispatch') ||
contains( github.event.pull_request.labels.*.name, 'leaderboard')}}

strategy:
fail-fast: false
matrix:
@@ -54,4 +71,4 @@ jobs:
pip install packaging && pip install flash-attn einops scipy auto-gptq optimum bitsandbytes autoawq codecarbon
pip install -U transformers huggingface_hub[hf_transfer]
pip install -e .
python llm_perf/update_llm_perf_cuda_pytorch.py
python llm_perf/benchmark_runners/update_llm_perf_cuda_pytorch.py
19 changes: 18 additions & 1 deletion .github/workflows/update_llm_perf_leaderboard.yaml
@@ -4,13 +4,30 @@ on:
workflow_dispatch:
schedule:
- cron: "0 */6 * * *"
push:
branches:
- main
pull_request:
branches:
- main
types:
- opened
- reopened
- synchronize
- labeled
- unlabeled

concurrency:
cancel-in-progress: true
group: ${{ github.workflow }}-${{ github.ref }}
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

jobs:
update_llm_perf_leaderboard:
if: ${{
(github.event_name == 'push') ||
(github.event_name == 'workflow_dispatch') ||
contains( github.event.pull_request.labels.*.name, 'leaderboard')}}

runs-on: ubuntu-latest
steps:
- name: Checkout
92 changes: 92 additions & 0 deletions llm_perf/benchmark_runners/update_llm_perf_cpu_pytorch.py
@@ -0,0 +1,92 @@
from itertools import product
from typing import Any, Dict, List

from llm_perf.common.benchmark_runner import LLMPerfBenchmarkManager
from llm_perf.common.utils import CANONICAL_PRETRAINED_OPEN_LLM_LIST, GENERATE_KWARGS, INPUT_SHAPES
from optimum_benchmark import PyTorchConfig
from optimum_benchmark.benchmark.config import BenchmarkConfig
from optimum_benchmark.launchers.process.config import ProcessConfig
from optimum_benchmark.scenarios.inference.config import InferenceConfig


class CPUPyTorchBenchmarkRunner(LLMPerfBenchmarkManager):
def __init__(self):
super().__init__(backend="pytorch", device="cpu")

self.attention_configs = self._get_attention_configs()
assert self.subset is not None, "SUBSET environment variable must be set for benchmarking"
self.weights_configs = self._get_weights_configs(self.subset)

def get_list_of_benchmarks_to_run(self) -> List[Dict[str, Any]]:
return [
{"model": model, "attn_implementation": attn_impl, "weights_config": weights_cfg}
for model, attn_impl, weights_cfg in product(
CANONICAL_PRETRAINED_OPEN_LLM_LIST, self.attention_configs, self.weights_configs.keys()
)
]

def get_benchmark_name(self, model: str, **kwargs) -> str:
weights_config = kwargs["weights_config"]
attn_implementation = kwargs["attn_implementation"]
return f"{model}-{weights_config}-{attn_implementation}"

def get_benchmark_config(self, model: str, **kwargs) -> BenchmarkConfig:
weights_config = kwargs["weights_config"]
attn_implementation = kwargs["attn_implementation"]

assert (
weights_config in self.weights_configs
), f"your config does not contain {weights_config}, adjust your _get_weights_configs to fix this issue"

torch_dtype = self.weights_configs[weights_config]["torch_dtype"]
quant_scheme = self.weights_configs[weights_config]["quant_scheme"]
quant_config = self.weights_configs[weights_config]["quant_config"]

launcher_config = ProcessConfig()
scenario_config = InferenceConfig(
memory=True,
energy=True,
latency=True,
duration=10,
iterations=10,
warmup_runs=10,
input_shapes=INPUT_SHAPES,
generate_kwargs=GENERATE_KWARGS,
)
backend_config = PyTorchConfig(
model=model,
device="cpu",
no_weights=True,
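# no_weights=True instantiates the model with random weights from its config instead of downloading pretrained checkpoints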
library="transformers",
task="text-generation",
torch_dtype=torch_dtype,
quantization_scheme=quant_scheme,
quantization_config=quant_config,
attn_implementation=attn_implementation,
model_kwargs={"trust_remote_code": True},
)

return BenchmarkConfig(
name=f"{weights_config}-{attn_implementation}",
scenario=scenario_config,
launcher=launcher_config,
backend=backend_config,
)

def _get_weights_configs(self, subset) -> Dict[str, Dict[str, Any]]:
if subset == "unquantized":
return {
"float32": {"torch_dtype": "float32", "quant_scheme": None, "quant_config": {}},
"float16": {"torch_dtype": "float16", "quant_scheme": None, "quant_config": {}},
"bfloat16": {"torch_dtype": "bfloat16", "quant_scheme": None, "quant_config": {}},
}
else:
raise ValueError(f"Unknown subset: {subset}")

def _get_attention_configs(self) -> List[str]:
return ["eager", "sdpa"]


if __name__ == "__main__":
runner = CPUPyTorchBenchmarkRunner()
runner.run_benchmarks()
147 changes: 147 additions & 0 deletions llm_perf/benchmark_runners/update_llm_perf_cuda_pytorch.py
@@ -0,0 +1,147 @@
from itertools import product
from typing import Any, Dict, List

from llm_perf.common.benchmark_runner import LLMPerfBenchmarkManager
from llm_perf.common.utils import CANONICAL_PRETRAINED_OPEN_LLM_LIST, GENERATE_KWARGS, INPUT_SHAPES
from optimum_benchmark import PyTorchConfig
from optimum_benchmark.benchmark.config import BenchmarkConfig
from optimum_benchmark.launchers.process.config import ProcessConfig
from optimum_benchmark.scenarios.inference.config import InferenceConfig


class CUDAPyTorchBenchmarkRunner(LLMPerfBenchmarkManager):
def __init__(self):
super().__init__(backend="pytorch", device="cuda")

self.attention_configs = self._get_attention_configs()
assert self.subset is not None, "SUBSET environment variable must be set for benchmarking"
self.weights_configs = self._get_weights_configs(self.subset)

def get_list_of_benchmarks_to_run(self) -> List[Dict[str, Any]]:
return [
{"model": model, "attn_implementation": attn_impl, "weights_config": weights_cfg}
for model, attn_impl, weights_cfg in product(
CANONICAL_PRETRAINED_OPEN_LLM_LIST, self.attention_configs, self.weights_configs.keys()
)
]

def get_benchmark_name(self, model: str, **kwargs) -> str:
weights_config = kwargs["weights_config"]
attn_implementation = kwargs["attn_implementation"]
return f"{model}-{weights_config}-{attn_implementation}"

def is_benchmark_supported(self, **kwargs) -> bool:
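# flash_attention_2 kernels only support half precision (float16/bfloat16), so the float32 combination is skipped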
if kwargs["attn_implementation"] == "flash_attention_2" and kwargs["weights_config"] == "float32":
return False
return True

def get_benchmark_config(self, model: str, **kwargs) -> BenchmarkConfig:
weights_config = kwargs["weights_config"]
attn_implementation = kwargs["attn_implementation"]

assert (
weights_config in self.weights_configs
), f"your config does contains the {weights_config}, adjust your _get_weights_configs to fix this issue"

torch_dtype = self.weights_configs[weights_config]["torch_dtype"]
quant_scheme = self.weights_configs[weights_config]["quant_scheme"]
quant_config = self.weights_configs[weights_config]["quant_config"]

launcher_config = ProcessConfig(device_isolation=True, device_isolation_action="kill")
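# device_isolation monitors the assigned GPU for foreign processes during the run; device_isolation_action sets how to react ("kill" here)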
scenario_config = InferenceConfig(
memory=True,
energy=True,
latency=True,
duration=10,
iterations=10,
warmup_runs=10,
input_shapes=INPUT_SHAPES,
generate_kwargs=GENERATE_KWARGS,
)
backend_config = PyTorchConfig(
model=model,
device="cuda",
device_ids="0",
no_weights=True,
library="transformers",
task="text-generation",
torch_dtype=torch_dtype,
quantization_scheme=quant_scheme,
quantization_config=quant_config,
attn_implementation=attn_implementation,
model_kwargs={"trust_remote_code": True},
)

return BenchmarkConfig(
name=f"{weights_config}-{attn_implementation}",
scenario=scenario_config,
launcher=launcher_config,
backend=backend_config,
)

def _get_weights_configs(self, subset) -> Dict[str, Dict[str, Any]]:
if subset == "unquantized":
return {
"float32": {"torch_dtype": "float32", "quant_scheme": None, "quant_config": {}},
"float16": {"torch_dtype": "float16", "quant_scheme": None, "quant_config": {}},
"bfloat16": {"torch_dtype": "bfloat16", "quant_scheme": None, "quant_config": {}},
}
elif subset == "bnb":
return {
"4bit-bnb": {"torch_dtype": "float16", "quant_scheme": "bnb", "quant_config": {"load_in_4bit": True}},
"8bit-bnb": {"torch_dtype": "float16", "quant_scheme": "bnb", "quant_config": {"load_in_8bit": True}},
}
elif subset == "gptq":
return {
"4bit-gptq-exllama-v1": {
"torch_dtype": "float16",
"quant_scheme": "gptq",
"quant_config": {"bits": 4, "use_exllama ": True, "version": 1, "model_seqlen": 256},
},
"4bit-gptq-exllama-v2": {
"torch_dtype": "float16",
"quant_scheme": "gptq",
"quant_config": {"bits": 4, "use_exllama ": True, "version": 2, "model_seqlen": 256},
},
}
elif subset == "awq":
return {
"4bit-awq-gemm": {
"torch_dtype": "float16",
"quant_scheme": "awq",
"quant_config": {"bits": 4, "version": "gemm"},
},
"4bit-awq-gemv": {
"torch_dtype": "float16",
"quant_scheme": "awq",
"quant_config": {"bits": 4, "version": "gemv"},
},
"4bit-awq-exllama-v1": {
"torch_dtype": "float16",
"quant_scheme": "awq",
"quant_config": {
"bits": 4,
"version": "exllama",
"exllama_config": {"version": 1, "max_input_len": 64, "max_batch_size": 1},
},
},
"4bit-awq-exllama-v2": {
"torch_dtype": "float16",
"quant_scheme": "awq",
"quant_config": {
"bits": 4,
"version": "exllama",
"exllama_config": {"version": 2, "max_input_len": 64, "max_batch_size": 1},
},
},
}
else:
raise ValueError(f"Unknown subset: {subset}")

def _get_attention_configs(self) -> List[str]:
return ["eager", "sdpa", "flash_attention_2"]


if __name__ == "__main__":
runner = CUDAPyTorchBenchmarkRunner()
runner.run_benchmarks()
Empty file added llm_perf/common/__init__.py