* added benchmarks on the AMD CPU
asofter committed Apr 12, 2024
1 parent 9b005b1 commit ca1ee2f
Showing 15 changed files with 68 additions and 43 deletions.
1 change: 1 addition & 0 deletions docs/changelog.md
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `NoRefusalLight` scanner that uses a common set of phrases to detect refusal as per research papers.
- `Anonymize` and `Sensitive` scanners have a support of [lakshyakh93/deberta_finetuned_pii](https://huggingface.co/lakshyakh93/deberta_finetuned_pii) model.
- `BanCode` scanner to detect and block code snippets in the prompt.
- Benchmarks on the AMD CPU.

### Fixed
- `InvisibleText` scanner to allow control characters like `\n`, `\t`, etc.
16 changes: 8 additions & 8 deletions docs/input_scanners/anonymize.md
@@ -126,11 +126,11 @@ python benchmarks/run.py input Anonymize

Results:

| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|----------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|---------|
| AWS m5.xlarge | 6.11 | 255.64 | 294.57 | 325.71 | 177.13 | 1789.64 |
| AWS m5.xlarge with ONNX | 0.73 | 155.64 | 169.13 | 179.93 | 128.64 | 2464.29 |
| AWS g5.xlarge GPU | 38.50 | 321.59 | 419.60 | 498.01 | 125.18 | 2532.35 |
| AWS g5.xlarge GPU with ONNX | 1.04 | 70.49 | 86.47 | 99.26 | 38.11 | 8317.53 |
| Azure Standard_D4as_v4 | 48.72 | 487.29 | 597.19 | 685.10 | 265.64 | 1193.33 |
| Azure Standard_D4as_v4 with ONNX | 1.47 | 268.17 | 286.89 | 301.87 | 228.86 | 1385.13 |
| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|--------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|---------|
| AWS m5.xlarge | 6.11 | 255.64 | 294.57 | 325.71 | 177.13 | 1789.64 |
| AWS m5.xlarge with ONNX | 0.73 | 155.64 | 169.13 | 179.93 | 128.64 | 2464.29 |
| AWS g5.xlarge GPU | 38.50 | 321.59 | 419.60 | 498.01 | 125.18 | 2532.35 |
| AWS g5.xlarge GPU with ONNX | 1.04 | 70.49 | 86.47 | 99.26 | 38.11 | 8317.53 |
| AWS r6a.xlarge (AMD) | 0.45 | 266.44 | 276.45 | 284.47 | 244.17 | 1298.29 |
| AWS r6a.xlarge (AMD) with ONNX | 0.35 | 238.15 | 247.22 | 254.47 | 218.91 | 1448.06 |
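
For readers comparing rows, the following is a minimal sketch of how columns like these could be derived from raw per-run latencies. It is not taken from `benchmarks/run.py`: the helper name `summarize_latencies`, the `items_per_run` parameter, the nearest-rank percentile method, the population variance, and the throughput formula are all assumptions made for illustration, and the real script may use different units or definitions.

```python
import math
import statistics


def summarize_latencies(latencies_ms, items_per_run=1):
    """Summarize raw per-run latencies (in milliseconds) into table-style columns.

    Hypothetical helper for illustration; not the project's benchmark code.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    average_ms = statistics.fmean(ordered)
    return {
        # Population variance of the samples (assumed definition).
        "Latency Variance": statistics.pvariance(ordered),
        "Latency 90 Percentile": percentile(90),
        "Latency 95 Percentile": percentile(95),
        "Latency 99 Percentile": percentile(99),
        "Average Latency (ms)": average_ms,
        # Assumed throughput definition: items processed per second at the
        # average latency. The documented QPS may be defined differently.
        "QPS": items_per_run / (average_ms / 1000),
    }


summary = summarize_latencies([10.0, 20.0, 30.0, 40.0])
for column, value in summary.items():
    print(f"{column}: {value:.2f}")
```

With only four samples the three percentiles collapse onto the largest value; a real run would need many more repetitions for the tail percentiles to be meaningful.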
5 changes: 4 additions & 1 deletion docs/input_scanners/ban_code.md
@@ -44,4 +44,7 @@ python benchmarks/run.py input BanCode

Results:

WIP
| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|--------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|----------|
| AWS r6a.xlarge (AMD) | 0.00 | 23.37 | 23.97 | 24.45 | 21.71 | 11424.20 |
| AWS r6a.xlarge (AMD) with ONNX | 0.02 | 22.34 | 24.71 | 26.60 | 17.54 | 14142.09 |
9 changes: 5 additions & 4 deletions docs/input_scanners/ban_competitors.md
@@ -70,7 +70,8 @@ python benchmarks/run.py input BanCompetitors

Results:

| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|-------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|--------|
| AWS m5.xlarge | 2.85 | 616.51 | 642.39 | 663.09 | 561.55 | 149.59 |
| AWS g5.xlarge GPU | 26.72 | 274.92 | 356.44 | 421.66 | 111.01 | 756.69 |
| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|----------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|--------|
| AWS m5.xlarge | 2.85 | 616.51 | 642.39 | 663.09 | 561.55 | 149.59 |
| AWS g5.xlarge GPU | 26.72 | 274.92 | 356.44 | 421.66 | 111.01 | 756.69 |
| AWS r6a.xlarge (AMD) | 0.44 | 646.05 | 650.56 | 654.17 | 620.68 | 135.34 |
16 changes: 8 additions & 8 deletions docs/input_scanners/ban_topics.md
@@ -48,11 +48,11 @@ python benchmarks/run.py input BanTopics

Results:

| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|----------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|---------|
| AWS m5.xlarge | 2.99 | 471.60 | 498.70 | 520.39 | 416.47 | 240.11 |
| AWS m5.xlarge with ONNX | 0.11 | 135.12 | 139.92 | 143.77 | 123.71 | 808.31 |
| AWS g5.xlarge GPU | 30.46 | 309.26 | 396.40 | 466.11 | 134.50 | 743.47 |
| AWS g5.xlarge GPU with ONNX | 0.13 | 33.88 | 39.43 | 43.87 | 22.38 | 4467.55 |
| Azure Standard_D4as_v4 | 4.00 | 518.30 | 547.49 | 570.85 | 450.78 | 221.84 |
| Azure Standard_D4as_v4 with ONNX | 0.02 | 135.58 | 136.72 | 137.63 | 131.06 | 763.04 |
| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|--------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|---------|
| AWS m5.xlarge | 2.99 | 471.60 | 498.70 | 520.39 | 416.47 | 240.11 |
| AWS m5.xlarge with ONNX | 0.11 | 135.12 | 139.92 | 143.77 | 123.71 | 808.31 |
| AWS g5.xlarge GPU | 30.46 | 309.26 | 396.40 | 466.11 | 134.50 | 743.47 |
| AWS g5.xlarge GPU with ONNX | 0.13 | 33.88 | 39.43 | 43.87 | 22.38 | 4467.55 |
| AWS r6a.xlarge (AMD) | 0.02 | 431.84 | 433.06 | 434.04 | 426.87 | 234.26 |
| AWS r6a.xlarge (AMD) with ONNX | 0.08 | 114.60 | 118.97 | 122.47 | 105.69 | 946.14 |
14 changes: 8 additions & 6 deletions docs/input_scanners/code.md
@@ -80,9 +80,11 @@ python benchmarks/run.py input Code

Results:

| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|-----------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|----------|
| AWS m5.xlarge | 2.64 | 138.80 | 164.44 | 184.95 | 87.28 | 2841.37 |
| AWS m5.xlarge with ONNX | 0.00 | 59.06 | 59.40 | 59.68 | 58.07 | 4270.94 |
| AWS g5.xlarge GPU | 32.49 | 280.46 | 370.49 | 442.51 | 100.05 | 2478.86 |
| AWS g5.xlarge GPU with ONNX | 0.01 | 8.83 | 10.38 | 11.62 | 5.68 | 43654.48 |
| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|--------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|----------|
| AWS m5.xlarge | 2.64 | 138.80 | 164.44 | 184.95 | 87.28 | 2841.37 |
| AWS m5.xlarge with ONNX | 0.00 | 59.06 | 59.40 | 59.68 | 58.07 | 4270.94 |
| AWS g5.xlarge GPU | 32.49 | 280.46 | 370.49 | 442.51 | 100.05 | 2478.86 |
| AWS g5.xlarge GPU with ONNX | 0.01 | 8.83 | 10.38 | 11.62 | 5.68 | 43654.48 |
| AWS r6a.xlarge (AMD) | 0.00 | 64.58 | 65.47 | 66.18 | 62.60 | 3961.36 |
| AWS r6a.xlarge (AMD) with ONNX | 0.07 | 43.84 | 48.04 | 51.41 | 35.25 | 7034.54 |
5 changes: 4 additions & 1 deletion docs/input_scanners/gibberish.md
@@ -47,4 +47,7 @@ python benchmarks/run.py input Gibberish

Results:

WIP
| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|--------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|---------|
| AWS r6a.xlarge (AMD) | 0.01 | 94.73 | 95.76 | 96.58 | 91.74 | 7161.76 |
| AWS r6a.xlarge (AMD) with ONNX | 0.07 | 87.77 | 91.84 | 95.10 | 79.40 | 8274.11 |
2 changes: 2 additions & 0 deletions docs/input_scanners/language.md
@@ -68,3 +68,5 @@ Results:
| AWS g5.xlarge GPU with ONNX | 0.01 | 11.24 | 12.94 | 14.30 | 7.79 | 174817.81 |
| Azure Standard_D4as_v4 | 4.45 | 406.71 | 439.73 | 466.15 | 339.31 | 4014.05 |
| Azure Standard_D4as_v4 with ONNX | 0.01 | 288.10 | 289.15 | 289.99 | 285.00 | 4778.90 |
| AWS r6a.xlarge (AMD) | 0.01 | 326.16 | 327.72 | 328.97 | 322.43 | 4224.18 |
| AWS r6a.xlarge (AMD) with ONNX | 0.08 | 297.20 | 301.75 | 305.39 | 287.89 | 4731.04 |
2 changes: 2 additions & 0 deletions docs/input_scanners/prompt_injection.md
@@ -83,5 +83,7 @@ Results:
| AWS m5.xlarge with ONNX | 0.00 | 106.65 | 106.85 | 107.01 | 104.21 | 3684.92 |
| AWS g5.xlarge GPU | 17.00 | 211.63 | 276.70 | 328.76 | 81.01 | 4739.91 |
| AWS g5.xlarge GPU with ONNX | 0.01 | 11.44 | 13.28 | 14.75 | 7.65 | 50216.67 |
| AWS r6a.xlarge (AMD) | 0.02 | 209.49 | 211.40 | 212.92 | 205.05 | 1872.73 |
| AWS r6a.xlarge (AMD) with ONNX | 0.08 | 112.10 | 116.38 | 119.81 | 103.21 | 3720.40 |
| Azure Standard_D4as_v4 | 184.23 | 852.63 | 1066.26 | 1237.16 | 421.46 | 911.11 |
| Azure Standard_D4as_v4 with ONNX | 0.01 | 179.81 | 180.22 | 180.55 | 177.30 | 2165.87 |
2 changes: 2 additions & 0 deletions docs/input_scanners/toxicity.md
@@ -61,3 +61,5 @@ Results:
| AWS g5.xlarge GPU with ONNX | 0.01 | 7.90 | 9.43 | 10.65 | 4.80 | 20221.31 |
| Azure Standard_D4as_v4 | 4.45 | 164.63 | 197.82 | 224.38 | 97.62 | 993.66 |
| Azure Standard_D4as_v4 with ONNX | 0.01 | 44.35 | 44.39 | 44.42 | 40.27 | 2408.71 |
| AWS r6a.xlarge (AMD) | 0.13 | 633.35 | 637.95 | 641.63 | 620.79 | 156.25 |
| AWS r6a.xlarge (AMD) with ONNX | 0.06 | 525.96 | 529.62 | 532.55 | 517.73 | 187.36 |
2 changes: 2 additions & 0 deletions docs/output_scanners/bias.md
@@ -60,3 +60,5 @@ Results:
| AWS g5.xlarge GPU with ONNX | 0.01 | 6.69 | 8.22 | 9.45 | 3.59 | 35633.81 |
| Azure Standard_D4as_v4 | 3.91 | 126.54 | 157.68 | 182.60 | 63.81 | 2006.08 |
| Azure Standard_D4as_v4 with ONNX | 0.03 | 29.55 | 31.41 | 32.89 | 23.36 | 5479.92 |
| AWS r6a.xlarge (AMD) | 0.00 | 33.08 | 33.71 | 34.21 | 31.56 | 4055.29 |
| AWS r6a.xlarge (AMD) with ONNX | 0.07 | 37.63 | 41.64 | 44.85 | 29.52 | 4336.52 |
16 changes: 8 additions & 8 deletions docs/output_scanners/factual_consistency.md
@@ -53,11 +53,11 @@ python benchmarks/run.py output FactualConsistency

Results:

| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|----------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|----------|
| AWS m5.xlarge | 3.01 | 234.94 | 262.31 | 284.20 | 180.00 | 777.78 |
| AWS m5.xlarge with ONNX | 0.09 | 98.62 | 103.28 | 107.01 | 89.00 | 1573.02 |
| AWS g5.xlarge GPU | 34.23 | 295.96 | 388.34 | 462.24 | 110.70 | 1264.69 |
| AWS g5.xlarge GPU with ONNX | 0.01 | 11.18 | 13.02 | 14.49 | 7.42 | 18879.18 |
| Azure Standard_D4as_v4 | 4.14 | 271.39 | 302.78 | 327.89 | 205.62 | 680.87 |
| Azure Standard_D4as_v4 with ONNX | 0.01 | 62.73 | 63.71 | 64.51 | 59.82 | 2340.44 |
| Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
|--------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|----------|
| AWS m5.xlarge | 3.01 | 234.94 | 262.31 | 284.20 | 180.00 | 777.78 |
| AWS m5.xlarge with ONNX | 0.09 | 98.62 | 103.28 | 107.01 | 89.00 | 1573.02 |
| AWS g5.xlarge GPU | 34.23 | 295.96 | 388.34 | 462.24 | 110.70 | 1264.69 |
| AWS g5.xlarge GPU with ONNX | 0.01 | 11.18 | 13.02 | 14.49 | 7.42 | 18879.18 |
| AWS r6a.xlarge (AMD) | 0.01 | 158.44 | 159.58 | 160.48 | 155.72 | 899.07 |
| AWS r6a.xlarge (AMD) with ONNX | 0.07 | 91.28 | 95.30 | 98.52 | 83.17 | 1683.27 |
2 changes: 2 additions & 0 deletions docs/output_scanners/malicious_urls.md
@@ -56,3 +56,5 @@ Results:
| AWS g5.xlarge GPU with ONNX | 0.11 | 21.36 | 26.50 | 30.61 | 11.04 | 4620.81 |
| Azure Standard_D4as_v4 | 3.80 | 205.43 | 236.05 | 260.55 | 143.34 | 355.80 |
| Azure Standard_D4as_v4 with ONNX | 0.01 | 54.65 | 54.88 | 55.08 | 51.96 | 981.54 |
| AWS r6a.xlarge (AMD) | 0.00 | 87.10 | 87.70 | 88.19 | 84.73 | 601.90 |
| AWS r6a.xlarge (AMD) with ONNX | 0.07 | 43.17 | 47.26 | 50.54 | 34.89 | 1461.82 |
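
To put the new AMD rows side by side, the sketch below computes the ONNX speedup on AWS r6a.xlarge as the ratio of average latencies. The numbers are copied from the "Average Latency (ms)" column of the rows added in this commit; the dictionary keys are the documentation file names, and the name `speedups` and the ranking itself are illustrative assumptions rather than part of the project's tooling.

```python
# Average latency in ms on AWS r6a.xlarge (AMD): (without ONNX, with ONNX).
# Values are copied from the tables added in this commit.
r6a_average_latency_ms = {
    "anonymize": (244.17, 218.91),
    "ban_code": (21.71, 17.54),
    "ban_topics": (426.87, 105.69),
    "code": (62.60, 35.25),
    "gibberish": (91.74, 79.40),
    "language": (322.43, 287.89),
    "prompt_injection": (205.05, 103.21),
    "toxicity": (620.79, 517.73),
    "bias": (31.56, 29.52),
    "factual_consistency": (155.72, 83.17),
    "malicious_urls": (84.73, 34.89),
}

# Speedup greater than 1.0 means the ONNX run had the lower average latency.
speedups = {
    scanner: baseline / with_onnx
    for scanner, (baseline, with_onnx) in r6a_average_latency_ms.items()
}

# Rank scanners from largest to smallest speedup.
ranked = sorted(speedups.items(), key=lambda item: item[1], reverse=True)
for scanner, speedup in ranked:
    print(f"{scanner:<20} {speedup:.2f}x")
```

`ban_competitors` is left out because the commit adds only a non-ONNX r6a.xlarge row for it. Average latency is a single summary figure, so the tail percentiles in the tables are still worth checking before drawing conclusions for a specific scanner.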