yolov3: reduce batch size due to cudagraphs OOM #2008

xmfan · 2023-10-23T20:21:55Z

yolov3 with cudagraphs (which is known to use more memory) is failing the perf test due to OOM (https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Mon,%2016%20Oct%202023%2020:19:47%20GMT&stopTime=Mon,%2023%20Oct%202023%2020:19:47%20GMT&granularity=hour&mode=training&dtype=amp&lBranch=main&lCommit=0b424ee0b7bfe09e0a438a63e8336e95eea85901&rBranch=main&rCommit=29048be41ca3aa8974795d93b9ea9fd6dee415fc).

I'm reducing the batch size from 16 to 8 in the model itself, so that all yolov3 HUD benchmarks keep using the same batch size.

xuzhao9

We have to be careful here because the original code repo uses 16 as the default batch size (https://github.com/ultralytics/yolov3/blob/master/train.py#L449), and we don't know how bs=8 will affect the E2E training accuracy.

I am leaning toward keeping the default batch size consistent with upstream, and only customizing it in the dynamo runner or other userbenchmarks that test CUDAGraph.
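A minimal sketch of how a runner-side override like this could look, so the model keeps its upstream default and only the CUDA graphs configuration sees the smaller batch. All names here (`UPSTREAM_DEFAULT_BATCH_SIZE`, `CUDAGRAPHS_BATCH_SIZE_OVERRIDE`, `pick_batch_size`) are hypothetical and are not taken from the dynamo runner or torchbench:

```python
from typing import Optional

# Hypothetical table of upstream defaults; 16 is the yolov3 default
# mentioned in this thread.
UPSTREAM_DEFAULT_BATCH_SIZE = {"yolov3": 16}

# Hypothetical per-model overrides, applied only when CUDA graphs are enabled.
CUDAGRAPHS_BATCH_SIZE_OVERRIDE = {"yolov3": 8}


def pick_batch_size(
    model_name: str,
    cudagraphs: bool,
    requested: Optional[int] = None,
) -> int:
    """Choose the batch size a benchmark runner would pass to the model."""
    # An explicit user request always wins.
    if requested is not None:
        return requested
    # Shrink the batch only for the memory-hungry CUDA graphs configuration.
    if cudagraphs and model_name in CUDAGRAPHS_BATCH_SIZE_OVERRIDE:
        return CUDAGRAPHS_BATCH_SIZE_OVERRIDE[model_name]
    # Otherwise stay consistent with the upstream default.
    return UPSTREAM_DEFAULT_BATCH_SIZE[model_name]
```

With this shape, runs without CUDA graphs still train at the upstream batch size, so only the CUDA graphs benchmark deviates from it.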

xmfan · 2023-10-24T20:18:35Z

Moved the implementation to pytorch/benchmarks/dynamo/torchbench.py: pytorch/pytorch#111959

xmfan requested a review from xuzhao9 October 23, 2023 20:21

xmfan had a problem deploying to docker-s3-upload October 23, 2023 20:22 — with GitHub Actions Error

facebook-github-bot added the cla signed label Oct 23, 2023

yolov3: reduce batch size due to OOM

c5fb66e

xmfan force-pushed the xmfan/yolov3_reduce_bs branch from fe396c3 to c5fb66e Compare October 23, 2023 20:24

xmfan temporarily deployed to docker-s3-upload October 23, 2023 20:24 — with GitHub Actions Inactive

xmfan marked this pull request as ready for review October 23, 2023 20:30

xuzhao9 reviewed Oct 24, 2023


xmfan closed this Oct 24, 2023
