Summary:

Error:

```
$ python run.py hf_T5_large -d cuda -t eval --accuracy
fp64 golden ref were not generated for hf_T5_large. Setting accuracy check to cosine
CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Traceback (most recent call last):
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 476, in check_accuracy
    correct_result = run_n_iterations(
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 375, in run_n_iterations
    _model_iter_fn(mod, inputs, contexts, optimizer, collect_outputs=False)
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 373, in _model_iter_fn
    forward_pass(mod, inputs, contexts, collect_outputs)
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 354, in forward_pass
    return mod(*inputs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/benchmark/torchbenchmark/util/framework/huggingface/model_factory.py", line 46, in forward
    return self.model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1683, in forward
    encoder_outputs = self.encoder(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1090, in forward
    layer_outputs = layer_module(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 693, in forward
    self_attention_outputs = self.layer[0](
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
    attention_output = self.SelfAttention(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 519, in forward
    query_states = shape(self.q(hidden_states))  # (batch_size, n_heads, seq_length, dim_per_head)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
```

Still not sure what the root cause is; updating the batch size to "1" fixes it, though.

Pull Request resolved: #1743

Reviewed By: davidberard98

Differential Revision: D46975729

Pulled By: xuzhao9

fbshipit-source-id: 80a367b2bd00e76ddaecfc62a7078baa14b4526a
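The log above notes that when fp64 golden references are unavailable, the accuracy check falls back to a cosine-similarity comparison of outputs. A minimal sketch of that idea on flat Python lists (the function names and the 0.99 threshold here are illustrative assumptions, not torchbench's actual implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flat vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def outputs_match(ref, res, threshold=0.99):
    """Treat two model outputs as matching if their cosine similarity
    exceeds the threshold -- a looser check than elementwise fp64 comparison."""
    return cosine_similarity(ref, res) > threshold
```

For example, `outputs_match([1.0, 2.0, 3.0], [1.01, 2.0, 2.99])` passes even though the vectors differ elementwise, which is why this fallback tolerates fp16 rounding noise.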