Re-enable SGD (#117434)
Summary:
Re-enables the SGD optimizer now that compile times are more reasonable. [Benchmark run](https://github.com/pytorch/pytorch/actions/runs/7511073761)

X-link: pytorch/pytorch#117434
Approved by: https://github.com/anijain2305, https://github.com/janeyx99

Reviewed By: PaliC

Differential Revision: D52863329

Pulled By: mlazos

fbshipit-source-id: 6f73550531679c780cafe5eaf4afcb02d15d1237
mlazos authored and facebook-github-bot committed Jan 18, 2024
1 parent 19685da commit 6b40341
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions userbenchmark/dynamo/dynamobench/common.py
@@ -1982,6 +1982,12 @@ def init_optimizer(self, name, device, params):
if device == "cuda" and self.args.training and name not in CI_SKIP_OPTIMIZER:
if (name in CI_USE_SGD and self.args.ci) or name in BENCHMARK_USE_SGD:
self.optimizer = torch.optim.SGD(params, lr=0.01, foreach=True)
# Disable multi_tensor_sgd for benchmarking; there isn't a large performance benefit (~1%) to compiling
# this optimizer because it is a single foreach add, and compiling it increases compile time.
# After autotuning and fake tensor caching land, we can enable it, because the compile time impact will be lower.
# Fake Tensor caching: https://github.com/pytorch/pytorch/pull/113873
# Autotuning: https://github.com/pytorch/pytorch/issues/117447
self.optimizer.step = torch._dynamo.disable(self.optimizer.step)
else:
self.optimizer = torch.optim.Adam(
params, lr=0.01, capturable=True, foreach=True
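For context on the pattern used in the change above, the sketch below shows how `torch._dynamo.disable` can exclude an optimizer step from a compiled training step. The model, data, and loss are illustrative placeholders, not part of the benchmark harness; this is a minimal example of the technique, assuming a recent PyTorch build with `torch.compile` available.

```python
import torch

# Toy model and optimizer purely for illustration (not from the benchmark harness).
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, foreach=True)

# Mirror the change above: keep the SGD step out of Dynamo compilation.
# The foreach SGD step is essentially a single fused add, so compiling it
# adds compile time without a meaningful runtime win.
optimizer.step = torch._dynamo.disable(optimizer.step)

@torch.compile
def train_step(inputs, targets):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()  # runs eagerly because of torch._dynamo.disable
    return loss

loss = train_step(torch.randn(4, 8), torch.randn(4, 1))
```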
