merged master
nsosio committed Nov 23, 2023
2 parents 7197a71 + d1520a8 commit e604256
Showing 57 changed files with 1,890 additions and 6,083 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/precommit.yaml
@@ -3,6 +3,8 @@ name: pre-commit
on:
pull_request:
branches: [main]
push:
branches: [main]

jobs:
pre-commit:
30 changes: 30 additions & 0 deletions .github/workflows/update_readme.yaml
@@ -0,0 +1,30 @@
name: Update README

on:
push:
branches: ["main"]
paths:
- README.md.template

jobs:
update-readme:
runs-on: ubuntu-latest
steps:
- name: Checkout Code Repository
uses: actions/checkout@v3

- name: Update README
run: sed "s|<LAST_UPDATE>|$(date -u +"%dth %B %Y")|g" README.md.template > README.md

- name: Commit changes
run: |
git config --global user.email "actions@github.com"
git config --global user.name "GitHub Actions"
git add README.md
git commit -m "Update <LAST_UPDATE> placeholder in README.md" || true
- name: Push changes
uses: ad-m/github-push-action@master
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
branch: ${{ github.ref }}
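Note that `date -u +"%dth %B %Y"` in the step above treats `th` as a literal, so the workflow stamps dates like `1th` or `23th`. A minimal sketch of an ordinal-aware helper (hypothetical, not part of this commit; the `date -u +%-d` padding flag assumes GNU coreutils):

```shell
# Hypothetical helper: pick the English ordinal suffix for a day of month,
# since date's "%dth" format appends "th" to every day (e.g. "23th").
ordinal_day() {
  day=$1
  case $day in
    1|21|31) printf '%sst' "$day" ;;
    2|22)    printf '%snd' "$day" ;;
    3|23)    printf '%srd' "$day" ;;
    *)       printf '%sth' "$day" ;;
  esac
}

# e.g. sed "s|<LAST_UPDATE>|$(ordinal_day "$(date -u +%-d)") $(date -u +'%B %Y')|g" README.md.template > README.md
ordinal_day 23   # prints 23rd
```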
4 changes: 4 additions & 0 deletions .gitignore
@@ -162,3 +162,7 @@ cython_debug/
# don't check-in sub folder
models/*
!models/.gitkeep

# Repositories
bench_tinygrad/tinygrad
bench_burn/llama2-burn
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -31,6 +31,11 @@ repos:
args: ["--config=setup.cfg"]
additional_dependencies: [flake8-isort]

- repo: https://github.com/shellcheck-py/shellcheck-py
rev: v0.9.0.6
hooks:
- id: shellcheck

ci:
autoupdate_schedule: weekly
skip: []
76 changes: 37 additions & 39 deletions README.md
@@ -1,47 +1,46 @@
# benchmarks
MLOps engines, frameworks, and languages benchmarked over mainstream AI models.

## Tool
## Structure

The benchmarking tool comprises three main scripts:
- `benchmark.sh` for running the end-to-end benchmarking
- `download.sh`, used internally by the benchmark script to download the required model files from a JSON configuration
- `setup.sh` for installing dependencies and converting models to the needed formats
The repository is organized to facilitate benchmark management and execution through a consistent structure:

### benchmark
- Each benchmark, identified as `bench_name`, has a dedicated folder, `bench_{bench_name}`.
- Within these benchmark folders, a common script named `bench.sh` handles setup, environment configuration, and execution.

This script runs benchmarks for a transformer model using both Rust and Python implementations. It provides options to customize the benchmarks, such as the prompt, repetitions, maximum tokens, device, and NVIDIA flag.
### Benchmark Script

```bash
./benchmark.sh [OPTIONS]
```
where `OPTIONS`:
- `-p, --prompt`: Prompt for benchmarks (default: 'Explain what is a transformer')
- `-r, --repetitions`: Number of repetitions for benchmarks (default: 2)
- `-m, --max_tokens`: Maximum number of tokens for benchmarks (default: 100)
- `-d, --device`: Device for benchmarks (possible values: 'gpu' or 'cpu', default: 'cpu')
- `--nvidia`: Use NVIDIA for benchmarks (default: false)
The `bench.sh` script supports key parameters:

### download
- `prompt`: Benchmark-specific prompt.
- `max_tokens`: Maximum tokens for the benchmark.
- `repetitions`: Number of benchmark repetitions.
- `log_file`: File for storing benchmark logs.
- `device`: Device for benchmark execution (cpu, cuda, metal).
- `models_dir`: Directory containing necessary model files.

Downloads files from a list of URLs specified in a JSON file. The JSON file should contain an array of objects, each with a `url`, `file`, and `folder` property. The script checks whether each file already exists before downloading it.
### Unified Execution

```bash
./download.sh --models <json_file> --cache <cache_file> --force-download
```
where `OPTIONS`:
- `--models`: JSON file specifying the models to download (default: models.json)
- `--cache`: Cache file to keep track of downloaded files (default: cache.log)
- `--force-download`: Force download of all files, removing existing files and cache
An overarching `bench.sh` script streamlines benchmark execution:

- Downloads essential files for benchmarking.
- Iterates through all benchmark folders in the repository.

Benchmarks can thus be run all at once or individually. To run a specific benchmark, navigate to the corresponding benchmark folder (e.g. `bench_{bench_name}`) and execute the `bench.sh` script with the required parameters.

### setup
1. Creates a Python virtual environment `venv` and installs project requirements.
2. Converts and stores models in different formats.


## Usage

```bash
./setup.sh
# Run a specific benchmark
./bench_{bench_name}/bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>

# Run all benchmarks collectively
./bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>
```


## ML Engines: Feature Table

| Features | pytorch | burn | llama.cpp | candle | tinygrad | onnxruntime | CTranslate2 |
@@ -74,16 +73,15 @@ CUDA Version: 11.7

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --nvidia --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | 13.28 Β± 0.79 | - | - | - |
| candle | - | 26.30 Β± 0.29 | - | - |
| llama.cpp | - | - | 67.64 Β± 22.57| 106.21 Β± 2.21|
| ctranslate | - | 58.54 Β± 13.24| 34.22 Β± 6.29 | - |
| tinygrad | - | 20.13 Β± 1.35 | - | - |
| onnx | - | 50.50 Β± 3.58 | - | - |
| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|---------------|---------------|---------------|
| burn | 13.12 Β± 0.85 | - | - | - |
| candle | - | 36.78 Β± 2.17 | - | - |
| llama.cpp | - | - | 84.48 Β± 3.76 | 106.76 Β± 1.29 |
| ctranslate | - | 51.38 Β± 16.01 | 36.12 Β± 11.93 | - |
| tinygrad | - | 20.32 Β± 0.06 | - | - |

*(data updated: 17th November 2023)
*(data updated: 23rd November 2023)


### M2 MAX 32GB Inference Bench:
@@ -116,4 +114,4 @@ Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --prompt 'Explain what is a transformer'`
| ctranslate | - | - | - | - |
| tinygrad | - | 29.78 Β± 1.18 | - | - |

*(data updated: 15th November 2023)
*(data updated: 23rd November 2023)
117 changes: 117 additions & 0 deletions README.md.template
@@ -0,0 +1,117 @@
# benchmarks
MLOps engines, frameworks, and languages benchmarked over mainstream AI models.

## Structure

The repository is organized to facilitate benchmark management and execution through a consistent structure:

- Each benchmark, identified as `bench_name`, has a dedicated folder, `bench_{bench_name}`.
- Within these benchmark folders, a common script named `bench.sh` handles setup, environment configuration, and execution.

### Benchmark Script

The `bench.sh` script supports key parameters:

- `prompt`: Benchmark-specific prompt.
- `max_tokens`: Maximum tokens for the benchmark.
- `repetitions`: Number of benchmark repetitions.
- `log_file`: File for storing benchmark logs.
- `device`: Device for benchmark execution (cpu, cuda, metal).
- `models_dir`: Directory containing necessary model files.

### Unified Execution

An overarching `bench.sh` script streamlines benchmark execution:

- Downloads essential files for benchmarking.
- Iterates through all benchmark folders in the repository.

Benchmarks can thus be run all at once or individually. To run a specific benchmark, navigate to the corresponding benchmark folder (e.g. `bench_{bench_name}`) and execute the `bench.sh` script with the required parameters.
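The unified flow described above could look roughly like this (a sketch under the assumption that each folder exposes an executable `bench.sh`; not the repository's actual script):

```shell
# Sketch of a top-level runner: iterate every bench_* folder and
# forward the shared CLI flags to each folder's bench.sh.
run_all() {
  root=$1
  shift
  for dir in "$root"/bench_*/; do
    # skip non-matching globs or folders without an executable bench.sh
    [ -x "${dir}bench.sh" ] && "${dir}bench.sh" "$@"
  done
}

# Usage: run_all . --prompt "Explain what is a transformer" --max_tokens 100
```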



## Usage

```bash
# Run a specific benchmark
./bench_{bench_name}/bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>

# Run all benchmarks collectively
./bench.sh --prompt <value> --max_tokens <value> --num_repetitions <value> --log_file <file_path> --device <cpu/cuda/metal> --models_dir <path_to_models>
```


## ML Engines: Feature Table

| Features | pytorch | burn | llama.cpp | candle | tinygrad | onnxruntime | CTranslate2 |
| --------------------------- | ------- | ---- | --------- | ------ | -------- | ----------- | ----------- |
| Inference support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| 16-bit quantization support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| 8-bit quantization support | βœ… | ❌ | βœ… | βœ… | βœ… | βœ… | βœ… |
| 4-bit quantization support | βœ… | ❌ | βœ… | βœ… | ❌ | ❌ | ❌ |
| 2/3bit quantization support | βœ… | ❌ | βœ… | βœ… | ❌ | ❌ | ❌ |
| CUDA support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| ROCM support | βœ… | βœ… | βœ… | βœ… | βœ… | ❌ | ❌ |
| Intel OneAPI/SYCL support | βœ…** | βœ… | βœ… | βœ… | βœ… | ❌ | ❌ |
| Mac M1/M2 support | βœ… | βœ… | βœ… | ⭐ | βœ… | βœ… | ⭐ |
| BLAS support(CPU) | βœ… | βœ… | βœ… | βœ… | ❌ | βœ… | βœ… |
| Model Parallel support | βœ… | ❌ | ❌ | βœ… | ❌ | ❌ | βœ… |
| Tensor Parallel support | βœ… | ❌ | ❌ | βœ… | ❌ | ❌ | βœ… |
| Onnx Format support | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | ❌ |
| Training support | βœ… | 🌟 | ❌ | 🌟 | ❌ | ❌ | ❌ |

⭐ = No Metal Support
🌟 = Partial Support for Training (Finetuning already works, but training from scratch may not work)

## Benchmarking ML Engines

### A100 80GB Inference Bench:

Model: LLAMA-2-7B

CUDA Version: 11.7

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --nvidia --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|---------------|---------------|---------------|
| burn | 13.12 Β± 0.85 | - | - | - |
| candle | - | 36.78 Β± 2.17 | - | - |
| llama.cpp | - | - | 84.48 Β± 3.76 | 106.76 Β± 1.29 |
| ctranslate | - | 51.38 Β± 16.01 | 36.12 Β± 11.93 | - |
| tinygrad | - | 20.32 Β± 0.06 | - | - |

*(data updated: <LAST_UPDATE>)
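The `mean ± std` cells above are tokens/second over the benchmark repetitions; they could be reproduced from raw per-repetition samples with a helper along these lines (hypothetical, assuming sample standard deviation with an n-1 denominator):

```shell
# Hypothetical aggregator: tokens/sec samples in, "mean ± std" out.
summarize() {
  printf '%s\n' "$@" | awk '
    { s += $1; q += $1 * $1; n++ }
    END {
      m = s / n
      sd = 0
      if (n > 1) {
        d = (q - n * m * m) / (n - 1)   # sample variance, guarded below
        sd = (d > 0) ? sqrt(d) : 0
      }
      printf "%.2f ± %.2f\n", m, sd
    }'
}

summarize 83.1 86.9 84.0   # prints 84.67 ± 1.99
```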


### M2 MAX 32GB Inference Bench:

#### CPU

Model: LLAMA-2-7B

CUDA Version: NA

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device cpu --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | 0.30 Β± 0.09 | - | - | - |
| candle | - | 3.43 Β± 0.02 | - | - |
| llama.cpp | - | - | 14.41 Β± 1.59 | 20.96 Β± 1.94 |
| ctranslate | - | - | 2.11 Β± 0.73 | - |
| tinygrad | - | 4.21 Β± 0.38 | - | - |

#### GPU (Metal)

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | - | - | - | - |
| candle | - | - | - | - |
| llama.cpp | - | - | 31.24 Β± 7.82 | 46.75 Β± 9.55 |
| ctranslate | - | - | - | - |
| tinygrad | - | 29.78 Β± 1.18 | - | - |

*(data updated: <LAST_UPDATE>)