
Better control of simplification and folding patterns via Pass Options #2588

Open
GleasonK opened this issue Oct 14, 2024 · 1 comment

@GleasonK
Member

GleasonK commented Oct 14, 2024

Request description

Per the discussion in #2586 (comment), we should consider which flags/options the simplification and folding pass should expose.

Something like `kFoldOpEltLimit` should probably be controlled from the callsite, since some users will not want to fold at the expense of compile time, while others will want the smallest graph possible. There are likely many other options like this, e.g. `OnlyFoldSplats`.
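To illustrate what callsite control could look like, here is a minimal Python sketch of the decision logic only. `FoldingOptions`, `fold_op_elt_limit`, `only_fold_splats`, and `should_fold` are hypothetical names (loosely mirroring `kFoldOpEltLimit` and `OnlyFoldSplats`), not the current StableHLO API; in the real pass these would more likely be declared as MLIR pass options. It also assumes splat results are always cheap enough to fold, since they can be stored as a single value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldingOptions:
    """Hypothetical callsite-controlled knobs for a simplification/folding pass.

    Names are illustrative only; they are not the current StableHLO API.
    """
    # Max elements a non-splat folded constant may have (plays the role of kFoldOpEltLimit).
    fold_op_elt_limit: int = 1
    # If True, only fold ops whose constant result is a splat.
    only_fold_splats: bool = False


def should_fold(num_elements: int, result_is_splat: bool, opts: FoldingOptions) -> bool:
    """Decide whether to replace an op with a constant of `num_elements` elements."""
    # Assumption: a splat is stored as one value, so the element limit is skipped.
    if result_is_splat:
        return True
    if opts.only_fold_splats:
        return False
    return num_elements <= opts.fold_op_elt_limit


# Two hypothetical callers with opposite priorities.
fast_compile = FoldingOptions(fold_op_elt_limit=1, only_fold_splats=True)
smallest_graph = FoldingOptions(fold_op_elt_limit=2**62)
```

With these settings, `should_fold(1024, False, fast_compile)` is `False` while `should_fold(1024, False, smallest_graph)` is `True`; a 1024-element splat folds under either policy.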

@GleasonK
Member Author

from @mvpant

> I think a broader design on "optimization / folding control" is needed

Sounds good. During my initial exploration of MLIR, ONNX, StableHLO, TOSA, etc., I expected to find fine-grained control over which optimizations are performed, similar to classic compilers. However, I later discovered that most of these tools have passes that apply either all optimizations or none at all.

> ... took 1min+ for folding / DCE ...

I guess that is normal when compiling a model that will run in production? While settings to limit optimizations would be beneficial, my experience compiling ResNet and BERT models is that simply reducing the number of computations can take considerable time. However, that effort is definitely worthwhile for achieving good performance.
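One way to frame the compile-time trade-off is as a budget: a caller who cares about compile time caps how much folding is done, and one who wants the smallest graph leaves it unbounded. The toy Python sketch below assumes a made-up IR (a list of `("const", value)` and `("add", lhs_index, rhs_index)` entries) and a hypothetical `max_rewrites` option; it is not how the StableHLO pass is structured. MLIR's greedy rewrite driver offers comparable iteration and rewrite limits through `GreedyRewriteConfig`.

```python
from typing import List, Optional, Tuple

# Toy IR: each entry is ("const", value) or ("add", lhs_index, rhs_index),
# where operand indices refer to earlier entries in the list.
Op = Tuple


def fold_with_budget(ops: List[Op], max_rewrites: Optional[int] = None):
    """Fold adds of two constants, stopping after `max_rewrites` folds (None = no limit).

    Returns (new_ops, num_rewrites). A single forward sweep is enough here because
    operands always precede their users, so folded results feed later folds.
    """
    ops = list(ops)
    rewrites = 0
    for i, op in enumerate(ops):
        # Stop early once the (hypothetical) compile-time budget is spent.
        if max_rewrites is not None and rewrites >= max_rewrites:
            break
        if op[0] == "add":
            lhs, rhs = ops[op[1]], ops[op[2]]
            if lhs[0] == "const" and rhs[0] == "const":
                ops[i] = ("const", lhs[1] + rhs[1])
                rewrites += 1
    return ops, rewrites


prog = [("const", 1), ("const", 2), ("add", 0, 1), ("add", 2, 1), ("add", 3, 3)]
full, n_full = fold_with_budget(prog)                      # folds everything
partial, n_partial = fold_with_budget(prog, max_rewrites=1)  # leaves later adds
```

Without a budget the whole chain collapses to `("const", 10)` after three rewrites; with `max_rewrites=1` only the first add is folded and the rest of the graph is left as-is.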

One operator I’ve found somewhat problematic to fold is `broadcast_in_dim` (ignoring its transpose semantics). In certain scenarios, not expanding it into a constant can significantly reduce memory traffic on the accelerator, e.g. `stablehlo.add(tensor, splat)` where `splat` is the result of a `broadcast_in_dim` operation.
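A rough back-of-the-envelope model makes the memory-traffic point concrete. This Python sketch assumes one read per input element and one write per output element, ignores caching, and assumes that when the splat is not materialized the backend fuses `broadcast_in_dim` into the add so the scalar is read only once. The function name and the 1024x1024 f32 shape are illustrative choices, not measurements.

```python
def add_traffic_bytes(num_elements: int, elem_bytes: int, splat_materialized: bool) -> int:
    """Approximate bytes moved by an elementwise add(tensor, splat).

    If the splat was folded into a dense constant, it is read like any other
    full-size operand; otherwise (assumed fused broadcast) only one scalar is read.
    """
    tensor_read = num_elements * elem_bytes
    splat_read = num_elements * elem_bytes if splat_materialized else elem_bytes
    result_write = num_elements * elem_bytes
    return tensor_read + splat_read + result_write


n = 1024 * 1024  # e.g. a 1024x1024 f32 tensor
folded = add_traffic_bytes(n, 4, splat_materialized=True)
kept = add_traffic_bytes(n, 4, splat_materialized=False)
print(folded, kept)  # 12582912 8388612
```

Under these assumptions, folding the broadcast raises traffic for this add by about 1.5x (12 MiB vs. roughly 8 MiB), and the folded constant also adds 4 MiB to the compiled artifact.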
