FIX SAM for bfloat16 #1764

msaroufim · 2023-07-13T01:58:24Z

Ok this was kinda annoying

Basically, the SAM codebase hardcodes torch.float32 in a few places, so even if you convert the model to torch.bfloat16, a few parts of the model stay in float32 and you hit type mismatch errors. This fixes the problem. @cpuhrsch @desertfire: I don't know enough about floats to say why there isn't some type promotion rule for bfloat16 that would cover this.
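To illustrate the failure mode described above, here is a minimal sketch. It does not reproduce the SAM code: `make_input_hardcoded` and `make_input_like` are hypothetical helpers, and a single `torch.nn.Linear` stands in for the model. As far as I know, elementwise ops do promote bfloat16 with float32, but matmul-style ops require both operands to share a dtype, which is why a hardcoded float32 tensor breaks a bfloat16 model.

```python
import torch

def make_input_hardcoded() -> torch.Tensor:
    # Hypothetical helper: the dtype is hardcoded, so it ignores the model's dtype.
    return torch.ones(1, 4, dtype=torch.float32)

def make_input_like(ref: torch.Tensor) -> torch.Tensor:
    # Hypothetical fix: derive dtype and device from a reference tensor instead.
    return torch.ones(1, 4, dtype=ref.dtype, device=ref.device)

# Stand-in for a model that was converted to bfloat16.
layer = torch.nn.Linear(4, 2).to(torch.bfloat16)

# Elementwise ops promote mixed dtypes (bfloat16 + float32 gives float32).
promoted = torch.ones(2, dtype=torch.bfloat16) + torch.ones(2, dtype=torch.float32)

# Matmul does not promote: a float32 input against bfloat16 weights raises.
try:
    layer(make_input_hardcoded())
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Following the layer's own dtype avoids the mismatch.
out = layer(make_input_like(layer.weight))
```

Deriving the dtype from an existing parameter or input, rather than naming `torch.float32` directly, keeps the helper correct for whatever dtype the model is converted to.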

I wonder whether we should add tests for multiple dtypes in torchbench to make checking for this kind of issue more robust, especially now that bfloat16 seems to be the default for dynamo. @xuzhao9
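A multi-dtype check along these lines could look roughly like the sketch below. It is not torchbench's test harness: `TinyModel`, `run_in_dtype`, and `DTYPES` are hypothetical names, and the tiny module is only a stand-in for a real benchmark model. The idea is simply to run the same eval path once per dtype and confirm the output dtype follows the model.

```python
import torch

class TinyModel(torch.nn.Module):
    # Hypothetical stand-in for a benchmark model.
    def __init__(self) -> None:
        super().__init__()
        self.proj = torch.nn.Linear(8, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

def run_in_dtype(dtype: torch.dtype) -> torch.Tensor:
    # Convert both the model and its input, then run one eval step.
    model = TinyModel().to(dtype).eval()
    x = torch.ones(2, 8, dtype=dtype)
    with torch.no_grad():
        return model(x)

# Assumed dtype list for illustration; a real suite might add more.
DTYPES = (torch.float32, torch.bfloat16)

# Any hardcoded dtype inside the model would surface here as an error
# or as an output dtype that differs from the requested one.
results = {dtype: run_in_dtype(dtype).dtype for dtype in DTYPES}
```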

Logs

FAILED (errors=1)
(sam) ubuntu@ip-172-31-9-217:~/benchmark$ python test.py -k "test_sam_eval_cuda"
E
======================================================================
ERROR: test_sam_eval_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last):
  File "/home/ubuntu/benchmark/components/_impl/workers/subprocess_rpc.py", line 482, in _run_block
    exec(  # noqa: P204
  File "<subprocess-worker>", line 2, in <module>
  File "/home/ubuntu/benchmark/torchbenchmark/util/model.py", line 280, in invoke
    out = self.eval()
  File "/home/ubuntu/benchmark/torchbenchmark/models/sam/__init__.py", line 65, in eval
    masks, scores, logits = predictor.predict(
  File "/home/ubuntu/benchmark/torchbenchmark/models/sam/predictor.py", line 164, in predict
    low_res_masks_np = low_res_masks[0].detach().cpu().numpy()
TypeError: Got unsupported ScalarType BFloat16

    working_dir: /tmp/tmpg5de41du
    stdout:
        [2023-07-13] 01:57:38.499061: TIMER_SUBPROCESS_BEGIN_EXEC
        [2023-07-13] 01:57:39.002078: TIMER_SUBPROCESS_FAILED
        [2023-07-13] 01:57:39.002141: TIMER_SUBPROCESS_FINISHED
        [2023-07-13] 01:57:39.002153: TIMER_SUBPROCESS_BEGIN_READ

    stderr:


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/benchmark/test.py", line 104, in eval_fn
    task.invoke()
  File "/home/ubuntu/benchmark/torchbenchmark/__init__.py", line 402, in invoke
    self.worker.run("""
  File "/home/ubuntu/benchmark/components/_impl/workers/subprocess_worker.py", line 155, in run
    self._run(snippet)
  File "/home/ubuntu/benchmark/components/_impl/workers/subprocess_worker.py", line 320, in _run
    subprocess_rpc.SerializedException.raise_from(
  File "/home/ubuntu/benchmark/components/_impl/workers/subprocess_rpc.py", line 458, in raise_from
    raise e from ChildTraceException(traceback_str)
TypeError: Got unsupported ScalarType BFloat16

----------------------------------------------------------------------
Ran 1 test in 7.814s

FAILED (errors=1)
(sam) ubuntu@ip-172-31-9-217:~/benchmark$ python test.py -k "test_sam_eval_cuda"
.
----------------------------------------------------------------------
Ran 1 test in 8.315s

OK
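The `TypeError` in the first run comes from calling `.numpy()` on a bfloat16 tensor, which NumPy has no matching dtype for. One common way to handle this is to upcast before converting, sketched below. This is an assumption about the kind of change involved, not a copy of the diff in this PR, and `to_numpy` is a hypothetical helper name.

```python
import torch

def to_numpy(t: torch.Tensor):
    # Hypothetical helper: move to CPU and upcast bfloat16, which has
    # no NumPy equivalent, to float32 before converting.
    t = t.detach().cpu()
    if t.dtype == torch.bfloat16:
        t = t.float()
    return t.numpy()

# Stand-in for the low-resolution masks returned by the predictor.
low_res_masks = torch.ones(1, 2, 2, dtype=torch.bfloat16)

# Direct conversion fails in the same way as the traceback above.
try:
    low_res_masks[0].detach().cpu().numpy()
    direct_error = None
except TypeError as exc:
    direct_error = str(exc)

# Upcasting first succeeds.
low_res_masks_np = to_numpy(low_res_masks[0])
```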

facebook-github-bot · 2023-07-13T16:38:44Z

@msaroufim has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-07-14T10:25:47Z

@msaroufim merged this pull request in 745644f.

FIX SAM for bfloat16

fcaf110

facebook-github-bot added the cla signed label Jul 13, 2023

push

e26af38

cpuhrsch approved these changes Jul 13, 2023

facebook-github-bot closed this in 745644f Jul 14, 2023

facebook-github-bot added the Merged label Jul 14, 2023
