Skip to content

Commit

Permalink
Enable in oss (#124031)
Browse files Browse the repository at this point in the history
Summary:
Biggest movement is 4% HF inference, 9% TIMM inference. Note, this is max-autotune mode so we are more tolerant of compilation increases. We could improve compilation time by limiting:
```
# Take how many of the top triton kernels to benchmark epilogue
max_epilogue_benchmarked_choices = 3
```

There is a hf_Whisper failure which you can repro on main without this stack with `TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS=TRITON TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --backend inductor --amp --accuracy --training --only hf_Whisper`. When you turn off epilogue fusion, it fixes the accuracy. I bisected the failure to an epilogue, however when you compare the results of that epilogue with the corresponding separate kernels the results of the output are equivalent.

Inference:

<img width="1686" alt="image" src="https://github.com/pytorch/pytorch/assets/11477974/0b240080-cd33-4c08-89d3-583103b1fb0c">

Training:

<img width="1329" alt="Screenshot 2024-04-16 at 6 16 30 PM" src="https://github.com/pytorch/pytorch/assets/11477974/db0afcc9-7288-4c27-84ce-4fc1a5690788">

X-link: pytorch/pytorch#124031
Approved by: https://github.com/Chillee, https://github.com/shunting314
ghstack dependencies: #124030, #122642, #123229, #122825

Reviewed By: jeanschmidt

Differential Revision: D56379580

Pulled By: eellison

fbshipit-source-id: 8e11d1636a2f48bb8c8d0380dad3a2ac76294422
  • Loading branch information
eellison authored and facebook-github-bot committed Apr 22, 2024
1 parent d9d9337 commit eae910e
Show file tree
Hide file tree
Showing 2 changed files with 36 additions and 0 deletions.
8 changes: 8 additions & 0 deletions userbenchmark/dynamo/dynamobench/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -2578,6 +2578,14 @@ def record_status(accuracy_status, dynamo_start_stats):
# E.g., the output order might not match, None might be part of output, etc.

try:
if self.args.training and self.args.amp:
if process_fn := self.get_output_amp_train_process_func.get(
name, None
):
correct_result = process_fn(correct_result)
new_result = process_fn(new_result)
fp64_outputs = process_fn(fp64_outputs)

if not same(
correct_result,
new_result,
Expand Down
28 changes: 28 additions & 0 deletions userbenchmark/dynamo/dynamobench/torchbench.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,30 @@ def maybe_list_to_set(obj):
return maybe_list_to_set(data)


def process_hf_reformer_output(out):
assert isinstance(out, list)
# second output is unstable
return [elem for i, elem in enumerate(out) if i != 1]


def process_hf_whisper_output(out):
out_ret = []
for i, elem in enumerate(out):
if i == 0:
assert isinstance(elem, dict)
out_ret.append({k: v for k, v in elem.items() if k != "logits"})
elif i != 1:
out_ret.append(elem)

return out_ret


process_train_model_output = {
"hf_Reformer": process_hf_reformer_output,
"hf_Whisper": process_hf_whisper_output,
}


class TorchBenchmarkRunner(BenchmarkRunner):
def __init__(self):
super().__init__()
Expand Down Expand Up @@ -142,6 +166,10 @@ def very_slow_models(self):
def non_deterministic_models(self):
return self._config["non_deterministic"]

@property
def get_output_amp_train_process_func(self):
return process_train_model_output

@property
def skip_not_suitable_for_training_models(self):
return self._skip["test"]["training"]
Expand Down

0 comments on commit eae910e

Please sign in to comment.