add: sacrebleu benchmark
shenxiangzhuang committed Apr 23, 2024
1 parent c6167a5 commit d03b522
Showing 6 changed files with 27 additions and 3 deletions.
9 changes: 9 additions & 0 deletions README.md
@@ -48,4 +48,13 @@ print(results)
## Benchmark

### Simple
We use the demo data shown in the quick start for this simple benchmark.
See [benchmark/simple](./benchmark/simple) for the benchmark source code.

[//]: # (https://app.warp.dev/block/Mt8BOS3rllMuryMkcI4Gr5)
![img.png](asset/benchmark/simple.png)


Note that bleuscore gets the same result as huggingface evaluate, while sacrebleu gets a different result.
(The reason may be related to implementation details in sacrebleu.)
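
For reference, a minimal sketch of the huggingface evaluate side of the comparison, using the same demo data (the actual contents of `simple/hf_evaluate.py` may differ slightly):

```python
import evaluate

# Demo data from the quick start.
predictions = ["hello there general kenobi", "foo bar foobar"]
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"],
]

# huggingface evaluate groups references per sentence.
bleu = evaluate.load("bleu")
results = bleu.compute(predictions=predictions, references=references)
print(results)
```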

Binary file modified asset/benchmark/simple.png
5 changes: 4 additions & 1 deletion benchmark/bench.sh
@@ -1 +1,4 @@
hyperfine --warmup 5 --runs 100 "python simple/rs_bleuscore.py" "python simple/hf_evaluate.py"
hyperfine --warmup 5 --runs 100 \
"python simple/rs_bleuscore.py" \
"python simple/sacre_bleu.py" \
"python simple/hf_evaluate.py"
11 changes: 11 additions & 0 deletions benchmark/simple/sacre_bleu.py
@@ -0,0 +1,11 @@
from sacrebleu.metrics import BLEU


predictions = ["hello there general kenobi", "foo bar foobar"]
# NOTE: these references are grouped per sentence, matching huggingface
# evaluate's layout; sacrebleu's corpus_score expects one list per
# reference stream instead, which may explain the score difference
# noted in the README.
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"]
]

bleu = BLEU(smooth_method="none", max_ngram_order=4, tokenize="13a")
results = bleu.corpus_score(predictions, references)
print(results)
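
Note that sacrebleu documents `corpus_score` references as one list per reference *stream* rather than per sentence, transposed relative to the layout above. A sketch of that layout for the same data, assuming sacrebleu 2.x accepts `None` padding for a missing reference:

```python
from sacrebleu.metrics import BLEU

predictions = ["hello there general kenobi", "foo bar foobar"]
# One list per reference stream; None marks a missing reference.
references = [
    ["hello there general kenobi", "foo bar foobar"],
    ["hello there !", None],
]

bleu = BLEU(smooth_method="none", max_ngram_order=4, tokenize="13a")
print(bleu.corpus_score(predictions, references))
```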
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -37,7 +37,7 @@ Homepage = 'https://github.com/shenxiangzhuang/bleuscore'
Source = 'https://github.com/shenxiangzhuang/bleuscore'

[project.optional-dependencies]
test = ["pytest", "pytest-sugar", "hypothesis", "evaluate"]
test = ["pytest", "pytest-sugar", "hypothesis", "evaluate", "sacrebleu"]
lint = ["black", "ruff~=0.3.7"]
#docs = []
#dev = []
3 changes: 2 additions & 1 deletion src/bleu.rs
@@ -14,7 +14,8 @@ pub struct BleuScore {
pub reference_length: usize,
}

/// compute the BLEU score with `Tokenizer13a` as default tokenizer
/// compute the BLEU score with `Tokenizer13a` as default tokenizer.
/// The implementation is based on [huggingface/evaluate](https://github.com/huggingface/evaluate/blob/main/metrics/bleu/bleu.py).
pub fn compute_score(
references: Vec<Vec<String>>,
predictions: Vec<String>,
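
For completeness, a hypothetical sketch of how `simple/rs_bleuscore.py` might drive this function from Python; the `compute` entry point and its keyword arguments are assumptions, not confirmed by this diff, so check the published bleuscore API:

```python
# Hypothetical: `compute` and its keywords are assumed, not taken from this diff.
from bleuscore import compute

predictions = ["hello there general kenobi", "foo bar foobar"]
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"],
]

results = compute(predictions=predictions, references=references,
                  max_order=4, smooth=True)
print(results)
```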
