Per discussion in: #2586 (comment), we should consider what sorts of flags / options the simplification and folding pass should have.
Something like kFoldOpEltLimit should probably be controlled from the callsite, since some users will not want to fold at the expense of compile time, and others will want the smallest graph possible. There are likely many other options like this, e.g. OnlyFoldSplats.
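To make the request concrete, here is a minimal sketch of what callsite-controlled options could look like. Everything in it is hypothetical: `FoldOptions`, `shouldFold`, the field names, and the default limit are illustrative assumptions, not the existing StableHLO API. Only the ideas behind kFoldOpEltLimit and OnlyFoldSplats come from this issue.

```cpp
#include <cstdint>

// Hypothetical options struct for illustration only; not the actual pass API.
struct FoldOptions {
  // Largest result tensor (in elements) that the pass is allowed to fold.
  // Plays the role of kFoldOpEltLimit; the default value is an assumption.
  int64_t foldOpElementLimit = 65536;
  // When true, fold only ops whose operands are all splat constants.
  bool onlyFoldSplats = false;
};

// Returns true if a candidate fold is permitted under `options`.
bool shouldFold(const FoldOptions &options, int64_t numResultElements,
                bool allOperandsAreSplats) {
  // Reject non-splat folds when the caller asked for splat-only folding.
  if (options.onlyFoldSplats && !allOperandsAreSplats)
    return false;
  // Otherwise fold only when the result stays within the element limit.
  return numResultElements <= options.foldOpElementLimit;
}
```

A caller that cares about compile time could lower `foldOpElementLimit`, while one that wants the smallest graph could raise it. Both would go through the same pass entry point.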
I think a broader design on "optimization / folding control" is needed
Sounds good. During my initial exploration of the MLIR framework, ONNX, StableHLO, TOSA, etc., I expected to find fine-grained control over the optimizations performed, similar to classic compilers. However, I later discovered that most of these tools have passes that either apply all optimizations or none at all.
... took 1min+ for folding / DCE ...
I guess that is normal when compiling a model that will run in production? Settings to limit optimizations would be beneficial, but in my experience compiling ResNet and BERT models, simply reducing the number of computations can take considerable time. That effort is definitely worthwhile for achieving good performance.
One operator I’ve found to be somewhat problematic to fold is broadcast_in_dim (ignoring its transpose semantics). In certain scenarios, not expanding it into a constant can significantly reduce memory traffic on the accelerator, e.g. stablehlo.add(tensor, splat) where splat is the result of a broadcast_in_dim operation.
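As a rough illustration of the memory-traffic point, the sketch below compares reading a single splat element with reading a fully materialized constant. It assumes the folded constant is stored densely on the device, which depends on the backend. The function names are hypothetical and exist only for this example.

```cpp
#include <cstdint>

// Bytes needed to store a dense constant with `numElements` elements.
int64_t denseConstantBytes(int64_t numElements, int64_t bytesPerElement) {
  return numElements * bytesPerElement;
}

// Extra bytes that must be read if the broadcast is folded into a dense
// constant instead of being kept as a one-element splat.
int64_t extraBytesIfMaterialized(int64_t numElements,
                                 int64_t bytesPerElement) {
  return denseConstantBytes(numElements, bytesPerElement) - bytesPerElement;
}
```

Under that assumption, a 1024x1024 f32 operand costs about 4 MiB to read when materialized, versus 4 bytes when the splat is kept as a scalar and broadcast on the fly.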