Commit: rollback changes

nsosio committed Apr 29, 2024
1 parent 0c43adc commit 7c63247
Showing 2 changed files with 42 additions and 42 deletions.
32 changes: 16 additions & 16 deletions README.md
Take a first glance at Llama-2-7B model performance across different precisions and inference engines. Metric used: `tokens/sec`


| Engine | float32 | float16 | int8 | int4 |
| ------------------------------------------ | ------------- | ------------- | ------------- | -------------- |
| [candle](/bench_candle/) | - | 36.78 ± 2.17 | - | - |
| [llama.cpp](/bench_llamacpp/) | - | - | 79.15 ± 1.20 | 100.90 ± 1.46 |
| [ctranslate](/bench_ctranslate/) | 35.23 ± 4.01 | 55.72 ± 16.66 | 35.73 ± 10.87 | - |
| [onnx](/bench_onnxruntime/) | - | 54.16 ± 3.15 | - | - |
| [transformers (pytorch)](/bench_pytorch/) | 43.79 ± 0.61 | 46.39 ± 0.28 | 6.98 ± 0.05 | 21.72 ± 0.11 |
| [vllm](/bench_vllm/) | 90.78 ± 1.60 | 90.54 ± 2.22 | - | 114.69 ± 11.20 |
| [exllamav2](/bench_exllamav2/) | - | - | 121.63 ± 0.74 | 130.16 ± 0.35 |
| [ctransformers](/bench_ctransformers/) | - | - | 76.75 ± 10.36 | 84.26 ± 5.79 |
| [AutoGPTQ](/bench_autogptq/) | 42.01 ± 1.03 | 30.24 ± 0.41 | - | - |
| [AutoAWQ](/bench_autoawq/) | - | - | - | 109.20 ± 3.28 |
| [DeepSpeed](/bench_deepspeed/)             | -             | 81.44 ± 8.13  | -             | -              |
| [PyTorch Lightning](/bench_lightning/) | 24.85 ± 0.07 | 44.56 ± 2.89 | 10.50 ± 0.12 | 24.83 ± 0.05 |
| [Optimum Nvidia](/bench_optimum_nvidia/) | 110.36 ± 0.52 | 109.09 ± 4.26 | - | - |
| [Nvidia TensorRT-LLM](/bench_tensorrtllm/) | 55.19 ± 1.03 | 85.03 ± 0.62 | 167.66 ± 2.05 | 235.18 ± 3.20 |

*(Data updated: `5th April 2024`)*
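The `±` values above suggest a mean and spread over repeated runs (the benchmarks use `--repetitions 10`). The exact aggregation isn't shown in this commit; below is a minimal sketch, assuming each repetition records a generated-token count and a wall-clock duration, and that the spread is a sample standard deviation. The function name and input data are hypothetical.

```python
import statistics

def tokens_per_sec_stats(token_counts, durations_sec):
    """Compute per-run throughput, then its mean and sample std dev.

    A sketch of how the `tokens/sec` cells could be derived from
    repeated runs; the benchmark's actual aggregation may differ.
    """
    rates = [t / d for t, d in zip(token_counts, durations_sec)]
    return statistics.mean(rates), statistics.stdev(rates)

# Hypothetical data for 3 repetitions of a 512-token generation:
mean, std = tokens_per_sec_stats([512, 512, 512], [5.0, 5.2, 4.8])
```

A cell like `90.78 ± 1.60` would then read as mean 90.78 tokens/sec with a standard deviation of 1.60 across repetitions.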

52 changes: 26 additions & 26 deletions docs/llama2.md

**Performance Metrics:** (unit: Tokens / second)

| Engine | float32 | float16 | int8 | int4 |
| ------------------------------------------ | ------------- | ------------- | ------------- | -------------- |
| [candle](/bench_candle/) | - | 36.78 ± 2.17 | - | - |
| [llama.cpp](/bench_llamacpp/) | - | - | 79.15 ± 1.20 | 100.90 ± 1.46 |
| [ctranslate](/bench_ctranslate/) | 35.23 ± 4.01 | 55.72 ± 16.66 | 35.73 ± 10.87 | - |
| [onnx](/bench_onnxruntime/) | - | 54.16 ± 3.15 | - | - |
| [transformers (pytorch)](/bench_pytorch/) | 43.79 ± 0.61 | 46.39 ± 0.28 | 6.98 ± 0.05 | 21.72 ± 0.11 |
| [vllm](/bench_vllm/) | 90.78 ± 1.60 | 90.54 ± 2.22 | - | 114.69 ± 11.20 |
| [exllamav2](/bench_exllamav2/) | - | - | 121.63 ± 0.74 | 130.16 ± 0.35 |
| [ctransformers](/bench_ctransformers/) | - | - | 76.75 ± 10.36 | 84.26 ± 5.79 |
| [AutoGPTQ](/bench_autogptq/) | 42.01 ± 1.03 | 30.24 ± 0.41 | - | - |
| [AutoAWQ](/bench_autoawq/) | - | - | - | 109.20 ± 3.28 |
| [DeepSpeed](/bench_deepspeed/)             | -             | 81.44 ± 8.13  | -             | -              |
| [PyTorch Lightning](/bench_lightning/) | 24.85 ± 0.07 | 44.56 ± 2.89 | 10.50 ± 0.12 | 24.83 ± 0.05 |
| [Optimum Nvidia](/bench_optimum_nvidia/) | 110.36 ± 0.52 | 109.09 ± 4.26 | - | - |
| [Nvidia TensorRT-LLM](/bench_tensorrtllm/) | 55.19 ± 1.03 | 85.03 ± 0.62 | 167.66 ± 2.05 | 235.18 ± 3.20 |

*(Data updated: `5th April 2024`)*

### CPU
**Command:** `./benchmark.sh --repetitions 10 --max_tokens 512 --device cpu --prompt 'Write an essay about the transformer model architecture'`

**Performance Metrics:** (unit: Tokens / second)
| Engine | float32 | float16 | int8 | int4 |
| -------------------------------------- | ------- | ----------- | ------------ | ------------ |
| [candle](/bench_candle/) | - | 3.43 ± 0.02 | - | - |
| [llama.cpp](/bench_llamacpp/) | - | - | 13.24 ± 0.62 | 21.43 ± 0.47 |
| [ctranslate](/bench_ctranslate/) | - | - | 1.87 ± 0.14 | - |
| [ctransformers](/bench_ctransformers/) | - | - | 13.50 ± 0.48 | 20.57 ± 2.50 |

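One way to read the CPU table above is as an int4-over-int8 speedup: both engines that report both precisions gain from the smaller quantization. A quick check of the ratios, using the mean values from the table:

```python
# Mean tokens/sec taken from the CPU table above.
cpu_int8 = {"llama.cpp": 13.24, "ctransformers": 13.50}
cpu_int4 = {"llama.cpp": 21.43, "ctransformers": 20.57}

# int4 throughput relative to int8 for each engine.
speedup = {name: cpu_int4[name] / cpu_int8[name] for name in cpu_int8}
```

This gives roughly a 1.6x speedup for llama.cpp and 1.5x for ctransformers, though the reported standard deviations (especially ctransformers' ± 2.50 at int4) make these ratios approximate.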

### GPU (Metal)

**Command:** `./benchmark.sh --repetitions 10 --max_tokens 512 --device metal --prompt 'Write an essay about the transformer model architecture'`

**Performance Metrics:** (unit: Tokens / second)
| Engine | float32 | float16 | int8 | int4 |
| -------------------------------------- | ------- | ------- | ------------ | ------------ |
| [llama.cpp](/bench_llamacpp/) | - | - | 30.11 ± 0.45 | 44.27 ± 0.12 |
| [ctransformers](/bench_ctransformers/) | - | - | 20.75 ± 0.36 | 34.04 ± 2.11 |

*(Data updated: `5th April 2024`)*
