
Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends #2058

Closed

Conversation

voznesenskym
Contributor

Summary:
X-link: pytorch/pytorch#114526

X-link: pytorch/pytorch#113926

The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, the fake tensors dynamo handed to backends represented state *at the end* of the trace, so subsequent retraces such as aot_autograd would start from fake tensors in the wrong (end-result) state rather than the expected fresh state. The solution is to start a fresh fake mode and re-fakify the tensors. The nuance comes from ensuring that symbols are created uniformly for the symbolic sizes and strides of the tensors.
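A minimal sketch of the fresh-fake-mode idea, assuming the public FakeTensorMode and ShapeEnv APIs; this illustrates the approach, not the actual dynamo plumbing:

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

# Share one ShapeEnv so symbols for sizes/strides stay consistent across
# fakifications, but give the backend a *fresh* fake mode.
shape_env = ShapeEnv()
fresh_fake_mode = FakeTensorMode(shape_env=shape_env)

real_input = torch.randn(4, 8)
# Re-fakify from the real tensor so the backend starts from the pre-trace
# state, not whatever state the fake tensor was left in after tracing.
fresh_fake = fresh_fake_mode.from_tensor(real_input)
```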

This PR is the result of *a lot* of back and forth with ezyang and eellison. The first pass at this was not very different from what we have in this PR; the broad strokes were the same:

  1. We cache source->symbol in shape_env
  2. We pass policy objects around, stored at dynamo fakification time and reused for later fakification (see the sketch after this list)
  3. We create a new fake mode for backends
    (from https://github.com/pytorch/pytorch/pull/113605/files)
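A hypothetical sketch of the policy-object idea in (2); every name here (DimPolicy, FakifyPolicy, source_to_policy, record_policy) is illustrative rather than the real API, which this PR ships as the stateful/stateless symbolic contexts in the title:

```python
from dataclasses import dataclass

@dataclass
class DimPolicy:
    # Hypothetical: should this dimension get a symbol (dynamic) or stay
    # a constant (static)?
    dynamic: bool

@dataclass
class FakifyPolicy:
    # Hypothetical per-tensor policy, recorded at dynamo fakification time
    # and replayed by later fakifications (e.g. in aot_autograd).
    dims: list

# Keyed by input source so a retrace can look up how the tensor was
# originally fakified.
source_to_policy: dict[str, FakifyPolicy] = {}

def record_policy(source: str, dynamic_dims: list[bool]) -> None:
    source_to_policy[source] = FakifyPolicy([DimPolicy(d) for d in dynamic_dims])
```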

This is ugly and has some layering violations. We detoured our decision making through a few other alternatives. An immutable/mutable fake tensor mode was the most interesting alternative, pytorch/pytorch#113653, but it was struck down over concerns about added complexity in fake mode and about it not covering all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested: whether that is an anti-pattern (it is) and whether we want to hack around it with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:

  1. The cache for source->symbol must live outside of shape_env, for both lifecycle and layering reasons (see the sketch after this list)
  2. A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this)
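A hedged sketch of concession (1): the source->symbol cache lives with the caller (e.g. on a symbolic context) rather than inside shape_env, and is consulted before allocating a new symbol, so repeated fakifications of the same input agree on symbols. The names below are illustrative:

```python
import sympy

# Lives outside shape_env: its lifecycle is tied to the trace session,
# not to the ShapeEnv, which also avoids the layering violation.
symbol_cache: dict[str, sympy.Symbol] = {}

def symbol_for(source: str, create_symbol) -> sympy.Symbol:
    # Reuse the symbol previously created for this source, if any;
    # otherwise create it once and remember it.
    if source not in symbol_cache:
        symbol_cache[source] = create_symbol()
    return symbol_cache[source]

# Both the dynamo-time and the aot_autograd-time fakification of the same
# input dimension now resolve to the same symbol.
s0 = symbol_for("L['x'].size()[0]", lambda: sympy.Symbol("s0"))
```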

cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

imported-using-ghimport

Reviewed By: huydhn, Chillee

Differential Revision: D51566250

Pulled By: voznesenskym

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D51566250

voznesenskym added a commit to voznesenskym/benchmark that referenced this pull request Nov 25, 2023
…amo backends (pytorch#2058)
voznesenskym added a commit to pytorch/pytorch that referenced this pull request Nov 25, 2023
…amo backends (#114526)
facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull request Nov 25, 2023
…amo backends (#113926)
