
[userbenchmark] Broader error catching in Torch-TRT userbenchmark #1974

Closed

Conversation

gs-olive
Contributor

@gs-olive commented Oct 9, 2023

  • Failures on a recent sample run indicate that errors raised by the subprocess are not always Exceptions, but are still sometimes recoverable
  • Add except clause to catch all such errors, short of keyboard interrupts, so the compilation can complete despite such errors
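The change described above can be sketched as follows. This is a minimal illustration only, not the actual benchmark code: `run_all` and `compile_fn` are hypothetical names, and the error-recording style is an assumption.

```python
def run_all(model_names, compile_fn):
    """Compile each model, tolerating errors that are not Exception
    subclasses (e.g. SystemExit raised by a subprocess) while still
    honoring keyboard interrupts."""
    results = {}
    for name in model_names:
        try:
            results[name] = compile_fn(name)
        except KeyboardInterrupt:
            raise  # let the user cancel the whole run
        except BaseException as e:  # broader than Exception: catches SystemExit, etc.
            results[name] = f"error: {e}"  # record the failure and keep going
    return results
```

The key point is `except BaseException`: a bare `except Exception` would miss errors like `SystemExit`, which derive from `BaseException` directly, so the run would abort instead of completing the remaining models.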

Contributor

@xuzhao9 left a comment


LGTM

@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@xuzhao9 merged this pull request in eb8f952.

@gs-olive
Contributor Author

@xuzhao9 - thanks! If possible, could you start another benchmark run with this new change?

@gs-olive
Contributor Author

Hi @xuzhao9 - I am wondering how we might be able to set up a weekly CI run of this userbenchmark. I noticed some of the other benchmarks have a .yaml config file. Would something like ci.yaml, with the following, work?

```yaml
platform: "gcp_a100"
schedule: "weekly"
```

@xuzhao9
Contributor

xuzhao9 commented Oct 18, 2023

@gs-olive Currently, it is already running nightly: https://github.com/pytorch/benchmark/blob/main/userbenchmark/torch_trt/ci.yaml#L2

Changing it to weekly will work out of the box - but we need to fix the existing CI errors first: https://github.com/pytorch/benchmark/actions/runs/6518636322/job/17704274032

@gs-olive
Contributor Author

I see - thanks for sharing this. It is likely because I am saving the error message as a string in the JSON dictionary when compilation or model building fails. Should I instead not save an entry at all in such situations?

@xuzhao9
Contributor

xuzhao9 commented Oct 18, 2023

> I see - thanks for sharing this, it is likely because I am saving the error message as a string in the JSON dictionary when compilation or model building fails. Should I not save an entry at all in such situations instead?

Yes, the metric value field in the JSON dictionary accepts floats only. We suggest using a separate file (or multiple files) to save the complete error messages for readability. In the JSON dictionary, if a metric fails, we suggest using a special float value (e.g., -1.0) to indicate the error, or skipping that metric ID entirely.
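The suggestion above can be sketched like this. The `record_metric` helper, the `-1.0` sentinel constant, and the side-file handling are illustrative assumptions, not the benchmark's actual API.

```python
import json

ERROR_SENTINEL = -1.0  # assumed convention: special float marking a failed metric


def record_metric(metrics, name, value, error=None, log_path=None):
    """Store only float metric values; route error text to a side file."""
    if error is not None:
        if log_path is not None:
            # keep the full error message readable in a separate file
            with open(log_path, "a") as f:
                f.write(f"{name}: {error}\n")
        metrics[name] = ERROR_SENTINEL  # or simply skip the metric entirely
    else:
        metrics[name] = float(value)
    return metrics


metrics = {}
record_metric(metrics, "latency_ms", 12.5)
record_metric(metrics, "compile_time_s", None, error="TRT engine build failed")
print(json.dumps(metrics))  # every value is a float, so downstream parsing is safe
```

Because every value in the dictionary is a float, the resulting JSON always satisfies the metric schema, and the full error text is preserved elsewhere for debugging.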

@gs-olive
Contributor Author

Thanks for the information! I have added the necessary changes to fix the string values in the dictionary here: #1998
