Fix broken unit test (#1743)
Summary:
Error:

```
$ python run.py hf_T5_large -d cuda -t eval --accuracy
fp64 golden ref were not generated for hf_T5_large. Setting accuracy check to cosine
CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Traceback (most recent call last):
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 476, in check_accuracy
    correct_result = run_n_iterations(
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 375, in run_n_iterations
    _model_iter_fn(mod, inputs, contexts, optimizer, collect_outputs=False)
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 373, in _model_iter_fn
    forward_pass(mod, inputs, contexts, collect_outputs)
  File "/workspace/benchmark/torchbenchmark/util/env_check.py", line 354, in forward_pass
    return mod(*inputs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/benchmark/torchbenchmark/util/framework/huggingface/model_factory.py", line 46, in forward
    return self.model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1683, in forward
    encoder_outputs = self.encoder(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1090, in forward
    layer_outputs = layer_module(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 693, in forward
    self_attention_outputs = self.layer[0](
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
    attention_output = self.SelfAttention(
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 519, in forward
    query_states = shape(self.q(hidden_states))  # (batch_size, n_heads, seq_length, dim_per_head)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runner/miniconda3/envs/torchbench/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
```

Still not sure what the root cause is, but lowering the batch size to "1" fixes it.

Pull Request resolved: #1743

Reviewed By: davidberard98

Differential Revision: D46975729

Pulled By: xuzhao9

fbshipit-source-id: 80a367b2bd00e76ddaecfc62a7078baa14b4526a
xuzhao9 authored and facebook-github-bot committed Jun 24, 2023
1 parent 5b2a70e commit 4e9d86e
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion in torchbenchmark/models/hf_T5_large/metadata.yaml

```diff
@@ -7,4 +7,4 @@ not_implemented:
   # hf_T5 model doesn't support JIT
   - jit: true
   # disable train test because of CI infra capacity issue
-  - test: train
+  - test: train
```
2 changes: 1 addition & 1 deletion in torchbenchmark/util/model.py

```diff
@@ -180,7 +180,7 @@ def determine_batch_size(self, batch_size=None):
         elif self.test == "eval" and (not self.batch_size == self.DEFAULT_EVAL_BSIZE):
             raise NotImplementedError("Model doesn't support customizing batch size.")
         elif self.dargs.accuracy:
-            self.batch_size = 4
+            self.batch_size = 4 if self.batch_size > 4 else self.batch_size

     def load_metadata(self):
         relative_path = self.__class__.__module__.split(".")
```
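The new line caps the accuracy-check batch size at 4 instead of overwriting it unconditionally, so models whose default batch size is already small (like hf_T5_large after this fix) keep their value. A minimal standalone sketch of that logic (the function name here is hypothetical; the real code lives inside `determine_batch_size` in `torchbenchmark/util/model.py`):

```python
def cap_accuracy_batch_size(batch_size: int) -> int:
    """Cap the batch size at 4 for accuracy checks, keeping smaller values as-is."""
    # Equivalent to the diff's: 4 if batch_size > 4 else batch_size
    return min(batch_size, 4)

# A batch size above the cap is clamped; one at or below it passes through.
print(cap_accuracy_batch_size(16))  # 4
print(cap_accuracy_batch_size(1))   # 1
```

Writing it as `min(batch_size, 4)` would behave identically to the conditional expression in the diff.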
