merged master
nsosio committed Nov 23, 2023
2 parents 7197a71 + d1520a8 commit e604256
Showing 57 changed files with 1,890 additions and 6,083 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/precommit.yaml
@@ -3,6 +3,8 @@ name: pre-commit
on:
pull_request:
branches: [main]
push:
branches: [main]

jobs:
pre-commit:
30 changes: 30 additions & 0 deletions .github/workflows/update_readme.yaml
@@ -0,0 +1,30 @@
name: Update README

on:
push:
branches: ["main"]
paths:
- README.md.template

jobs:
update-readme:
runs-on: ubuntu-latest
steps:
- name: Checkout Code Repository
uses: actions/checkout@v3

- name: Update README
run: sed "s|<LAST_UPDATE>|$(date -u +"%dth %B %Y")|g" README.md.template > README.md

- name: Commit changes
run: |
git config --global user.email "actions@github.com"
git config --global user.name "GitHub Actions"
git add README.md
git commit -m "Update <LAST_UPDATE> placeholder in README.md" || true
- name: Push changes
uses: ad-m/github-push-action@master
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
branch: ${{ github.ref }}
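Note that `date -u +"%dth %B %Y"` in the step above treats `th` as a literal, so the workflow stamps dates like `1th` or `23th`. A minimal sketch of an ordinal-aware helper (hypothetical, not part of this commit; the `date -u +%-d` padding flag assumes GNU coreutils):

```shell
# Hypothetical helper: pick the English ordinal suffix for a day of month,
# since date's "%dth" format appends "th" to every day (e.g. "23th").
ordinal_day() {
  day=$1
  case $day in
    1|21|31) printf '%sst' "$day" ;;
    2|22)    printf '%snd' "$day" ;;
    3|23)    printf '%srd' "$day" ;;
    *)       printf '%sth' "$day" ;;
  esac
}

# e.g. sed "s|<LAST_UPDATE>|$(ordinal_day "$(date -u +%-d)") $(date -u +'%B %Y')|g" README.md.template > README.md
ordinal_day 23   # prints 23rd
```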
4 changes: 4 additions & 0 deletions .gitignore
@@ -162,3 +162,7 @@ cython_debug/
# don't check-in sub folder
models/*
!models/.gitkeep

# Repositories
bench_tinygrad/tinygrad
bench_burn/llama2-burn
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -31,6 +31,11 @@ repos:
args: ["--config=setup.cfg"]
additional_dependencies: [flake8-isort]

- repo: https://github.com/shellcheck-py/shellcheck-py
rev: v0.9.0.6
hooks:
- id: shellcheck

ci:
autoupdate_schedule: weekly
skip: []
76 changes: 37 additions & 39 deletions README.md
@@ -1,47 +1,46 @@
# benchmarks
MLOps engines, frameworks, and languages benchmarked over mainstream AI models.

## Tool
## Structure

The benchmarking tool comprises three main scripts:
- `benchmark.sh` for running the end-to-end benchmarking
- `download.sh`, used internally by the benchmark script to download the required model files from a JSON configuration
- `setup.sh` for installing dependencies and converting models to the needed formats
The repository is organized to facilitate benchmark management and execution through a consistent structure:

### benchmark
- Each benchmark, identified as `bench_name`, has a dedicated folder, `bench_{bench_name}`.
- Within these benchmark folders, a common script named `bench.sh` handles setup, environment configuration, and execution.

This script runs benchmarks for a transformer model using both Rust and Python implementations. It provides options to customize the benchmarks, such as the prompt, repetitions, maximum tokens, device, and NVIDIA flag.
### Benchmark Script

```bash
./benchmark.sh [OPTIONS]
```
where `OPTIONS`:
- `-p, --prompt`: Prompt for benchmarks (default: 'Explain what is a transformer')
- `-r, --repetitions`: Number of repetitions for benchmarks (default: 2)
- `-m, --max_tokens`: Maximum number of tokens for benchmarks (default: 100)
- `-d, --device`: Device for benchmarks (possible values: 'gpu' or 'cpu', default: 'cpu')
- `--nvidia`: Use NVIDIA for benchmarks (default: false)
The `bench.sh` script supports key parameters:

### download
- `prompt`: Benchmark-specific prompt.
- `max_tokens`: Maximum tokens for the benchmark.
- `repetitions`: Number of benchmark repetitions.
- `log_file`: File for storing benchmark logs.
- `device`: Device for benchmark execution (cpu, cuda, metal).
- `models_dir`: Directory containing necessary model files.

Downloads files from a list of URLs specified in a JSON file. The JSON file should contain an array of objects, each with a `url`, `file`, and `folder` property. The script checks whether each file already exists before downloading it.
### Unified Execution

```bash
./download.sh --models <json_file> --cache <cache_file> --force-download
```
where `OPTIONS`:
- `--models`: JSON file specifying the models to download (default: models.json)
- `--cache`: Cache file to keep track of downloaded files (default: cache.log)
- `--force-download`: Force download of all files, removing existing files and cache
An overarching `bench.sh` script streamlines benchmark execution:

- Downloads essential files for benchmarking.
- Iterates through all benchmark folders in the repository.

Benchmarks can thus be run all at once or individually. To run a specific benchmark, navigate to the corresponding benchmark folder (e.g. `bench_{bench_name}`) and execute the `bench.sh` script with the required parameters.

### setup
1. Creates a Python virtual environment `venv` and installs project requirements.
2. Converts and stores models in different formats.


## Usage

```bash
./setup.sh
# Run a specific benchmark
./bench_{bench_name}/bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>

# Run all benchmarks collectively
./bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>
```


## ML Engines: Feature Table

| Features | pytorch | burn | llama.cpp | candle | tinygrad | onnxruntime | CTranslate2 |
@@ -74,16 +73,15 @@ CUDA Version: 11.7

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --nvidia --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | 13.28 Β± 0.79 | - | - | - |
| candle | - | 26.30 Β± 0.29 | - | - |
| llama.cpp | - | - | 67.64 Β± 22.57| 106.21 Β± 2.21|
| ctranslate | - | 58.54 Β± 13.24| 34.22 Β± 6.29 | - |
| tinygrad | - | 20.13 Β± 1.35 | - | - |
| onnx | - | 50.50 Β± 3.58 | - | - |
| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|---------------|---------------|---------------|
| burn | 13.12 Β± 0.85 | - | - | - |
| candle | - | 36.78 Β± 2.17 | - | - |
| llama.cpp | - | - | 84.48 Β± 3.76 | 106.76 Β± 1.29 |
| ctranslate | - | 51.38 Β± 16.01 | 36.12 Β± 11.93 | - |
| tinygrad | - | 20.32 Β± 0.06 | - | - |

*(data updated: 17th November 2023)
*(data updated: 23rd November 2023)


### M2 MAX 32GB Inference Bench:
@@ -116,4 +114,4 @@ Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --prompt 'Explain what is a transformer'`
| ctranslate | - | - | - | - |
| tinygrad | - | 29.78 Β± 1.18 | - | - |

*(data updated: 15th November 2023)
*(data updated: 23rd November 2023)
117 changes: 117 additions & 0 deletions README.md.template
@@ -0,0 +1,117 @@
# benchmarks
MLOps engines, frameworks, and languages benchmarked over mainstream AI models.

## Structure

The repository is organized to facilitate benchmark management and execution through a consistent structure:

- Each benchmark, identified as `bench_name`, has a dedicated folder, `bench_{bench_name}`.
- Within these benchmark folders, a common script named `bench.sh` handles setup, environment configuration, and execution.

### Benchmark Script

The `bench.sh` script supports key parameters:

- `prompt`: Benchmark-specific prompt.
- `max_tokens`: Maximum tokens for the benchmark.
- `repetitions`: Number of benchmark repetitions.
- `log_file`: File for storing benchmark logs.
- `device`: Device for benchmark execution (cpu, cuda, metal).
- `models_dir`: Directory containing necessary model files.

### Unified Execution

An overarching `bench.sh` script streamlines benchmark execution:

- Downloads essential files for benchmarking.
- Iterates through all benchmark folders in the repository.

Benchmarks can thus be run all at once or individually. To run a specific benchmark, navigate to the corresponding benchmark folder (e.g. `bench_{bench_name}`) and execute the `bench.sh` script with the required parameters.
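The unified flow described above could look roughly like this (a sketch under the assumption that each folder exposes an executable `bench.sh`; not the repository's actual script):

```shell
# Sketch of a top-level runner: iterate every bench_* folder and
# forward the shared CLI flags to each folder's bench.sh.
run_all() {
  root=$1
  shift
  for dir in "$root"/bench_*/; do
    # skip non-matching globs or folders without an executable bench.sh
    [ -x "${dir}bench.sh" ] && "${dir}bench.sh" "$@"
  done
}

# Usage: run_all . --prompt "Explain what is a transformer" --max_tokens 100
```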



## Usage

```bash
# Run a specific benchmark
./bench_{bench_name}/bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>

# Run all benchmarks collectively
./bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>
```


## ML Engines: Feature Table

| Features | pytorch | burn | llama.cpp | candle | tinygrad | onnxruntime | CTranslate2 |
| --------------------------- | ------- | ---- | --------- | ------ | -------- | ----------- | ----------- |
| Inference support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| 16-bit quantization support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| 8-bit quantization support | βœ… | ❌ | βœ… | βœ… | βœ… | βœ… | βœ… |
| 4-bit quantization support | βœ… | ❌ | βœ… | βœ… | ❌ | ❌ | ❌ |
| 2/3bit quantization support | βœ… | ❌ | βœ… | βœ… | ❌ | ❌ | ❌ |
| CUDA support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| ROCM support | βœ… | βœ… | βœ… | βœ… | βœ… | ❌ | ❌ |
| Intel OneAPI/SYCL support | βœ…** | βœ… | βœ… | βœ… | βœ… | ❌ | ❌ |
| Mac M1/M2 support | βœ… | βœ… | βœ… | ⭐ | βœ… | βœ… | ⭐ |
| BLAS support(CPU) | βœ… | βœ… | βœ… | βœ… | ❌ | βœ… | βœ… |
| Model Parallel support | βœ… | ❌ | ❌ | βœ… | ❌ | ❌ | βœ… |
| Tensor Parallel support | βœ… | ❌ | ❌ | βœ… | ❌ | ❌ | βœ… |
| Onnx Format support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | ❌ |
| Training support | βœ… | 🌟 | ❌ | 🌟 | ❌ | ❌ | ❌ |

⭐ = No Metal Support
🌟 = Partial Support for Training (Finetuning already works, but training from scratch may not work)

## Benchmarking ML Engines

### A100 80GB Inference Bench:

Model: LLAMA-2-7B

CUDA Version: 11.7

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --nvidia --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|---------------|---------------|---------------|
| burn | 13.12 Β± 0.85 | - | - | - |
| candle | - | 36.78 Β± 2.17 | - | - |
| llama.cpp | - | - | 84.48 Β± 3.76 | 106.76 Β± 1.29 |
| ctranslate | - | 51.38 Β± 16.01 | 36.12 Β± 11.93 | - |
| tinygrad | - | 20.32 Β± 0.06 | - | - |

*(data updated: <LAST_UPDATE>)
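The `mean ± std` cells above are tokens/second over the benchmark repetitions; they could be reproduced from raw per-repetition samples with a helper along these lines (hypothetical, assuming sample standard deviation with an n-1 denominator):

```shell
# Hypothetical aggregator: tokens/sec samples in, "mean ± std" out.
summarize() {
  printf '%s\n' "$@" | awk '
    { s += $1; q += $1 * $1; n++ }
    END {
      m = s / n
      sd = 0
      if (n > 1) {
        d = (q - n * m * m) / (n - 1)   # sample variance, guarded below
        sd = (d > 0) ? sqrt(d) : 0
      }
      printf "%.2f ± %.2f\n", m, sd
    }'
}

summarize 83.1 86.9 84.0   # prints 84.67 ± 1.99
```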


### M2 MAX 32GB Inference Bench:

#### CPU

Model: LLAMA-2-7B

CUDA Version: NA

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device cpu --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | 0.30 Β± 0.09 | - | - | - |
| candle | - | 3.43 Β± 0.02 | - | - |
| llama.cpp | - | - | 14.41 Β± 1.59 | 20.96 Β± 1.94 |
| ctranslate | - | - | 2.11 Β± 0.73 | - |
| tinygrad | - | 4.21 Β± 0.38 | - | - |

#### GPU (Metal)

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | - | - | - | - |
| candle | - | - | - | - |
| llama.cpp | - | - | 31.24 Β± 7.82 | 46.75 Β± 9.55 |
| ctranslate | - | - | - | - |
| tinygrad | - | 29.78 Β± 1.18 | - | - |

*(data updated: <LAST_UPDATE>)