# Improvements to Dynamo/TRT Workflow

## TL;DR

Continue upgrading the FX/Dynamo `aten` pipelines to improve overall support for models and operator/module-level customization using this path.
## Goal(s)
The primary goal is to achieve robust support in the FX/Dynamo `aten` path and allow for customizations which can benefit the overall performance and functionality of models compiled via that path. For now, the target is transformer-based models and acceleration of the complex operator structure required to implement them.

We are targeting two main compilation paths: `torch.compile` and `torch._dynamo.export`, both of which use the `aten` converters.
## Proposed APIs / UX + Use-cases
The following capabilities should be supported via the `torch_tensorrt` UI:

### 1. Robust converters
The `aten` path converters should be able to handle a variety of inputs and options, and should not error out in the case of incompatibility. Specifically, if a converter can handle `torch.ops.aten.__some_operator__.default`, it should be able to handle any reasonable argument inputs, with accompanying test cases.
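As a sketch of the intended behavior, a converter registry could validate arguments and report a fallback rather than raising. The names below (`register_converter`, `convert_op`) are illustrative only, not the actual torch_tensorrt API:

```python
# Hypothetical converter registry sketch -- not the real torch_tensorrt API.
CONVERTERS = {}

def register_converter(op_name, validator=None):
    """Register a converter for an op, with an optional argument validator."""
    def decorator(fn):
        CONVERTERS[op_name] = (fn, validator)
        return fn
    return decorator

def convert_op(op_name, *args, **kwargs):
    """Convert an op if a registered converter accepts the arguments;
    otherwise report a fallback instead of raising."""
    entry = CONVERTERS.get(op_name)
    if entry is None:
        return ("fallback", f"no converter for {op_name}")
    fn, validator = entry
    if validator is not None and not validator(*args, **kwargs):
        return ("fallback", f"unsupported arguments for {op_name}")
    return ("converted", fn(*args, **kwargs))

# Example: a toy matmul converter that only supports 2-D operands
# (arguments here are shapes, for illustration).
@register_converter(
    "aten.matmul.default",
    validator=lambda a, b: len(a) == 2 and len(b) == 2,
)
def convert_matmul(a_shape, b_shape):
    return (a_shape[0], b_shape[1])  # resulting shape

print(convert_op("aten.matmul.default", (4, 8), (8, 16)))      # converted
print(convert_op("aten.matmul.default", (2, 4, 8), (8, 16)))   # fallback
```

A batched (3-D) input does not match the validator, so it is routed to fallback rather than raising an error.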
### 2. Specifying `torch_executed_modules` when compiling via the `aten` path

This option excludes the specified modules from compilation.
### 3. Custom specification of operators/callables which can encapsulate entire modules

Operators or modules in Torch should be replaceable with custom optimized versions thereof, which could be implemented in TensorRT or otherwise.
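A minimal illustration of such module-level replacement, using a toy module tree rather than `torch.nn` (all class and function names here are hypothetical):

```python
# Toy module tree; stands in for a torch.nn.Module hierarchy.
class Module:
    def __init__(self, **children):
        self.children = children

class Attention(Module):
    pass

class TRTAttention(Module):
    """Stand-in for a TensorRT-optimized implementation."""

def replace_modules(module, replacement_map):
    """Walk the tree and swap any module whose type appears in the map
    with an instance of its optimized counterpart."""
    for name, child in module.children.items():
        if type(child) in replacement_map:
            module.children[name] = replacement_map[type(child)]()
        else:
            replace_modules(child, replacement_map)
    return module

model = Module(encoder=Module(attn=Attention(), ff=Module()))
replace_modules(model, {Attention: TRTAttention})
print(type(model.children["encoder"].children["attn"]).__name__)  # TRTAttention
```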
### 4. Options for both JIT-compiled models (`torch.compile`) and export-requiring models (`torch._dynamo.export`)

Provide users with a framework that offers options for multiple use cases, including JIT compilation and exporting a compiled model for later use.
### 5. Support for Dynamic Shapes and Avoiding Recompilation for Dynamic Shapes in Dynamo

Add support for dynamic shapes in both the `compile` and `export` paths. In the `compile` path, the model should not recompile when a new batch size is encountered (assuming the user has specified the dynamic dimension at compile time). Similarly, the `export` path should support dynamic dimensions in the same way the TorchScript/FX paths do.
## Limitations
This feature does not guarantee accelerated support for all operators, but unsupported operators should preferably fall back to native PyTorch execution with clear messages, rather than throwing errors.
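The fallback behavior can be sketched as a simple partitioner that routes unsupported ops to native PyTorch instead of erroring (a hypothetical helper, not the real partitioning API):

```python
# Hypothetical supported-op set and partitioner sketch.
SUPPORTED = {"aten.matmul", "aten.relu", "aten.add"}

def partition(ops):
    """Split an op sequence into contiguous segments destined for
    TensorRT or for native PyTorch fallback."""
    segments = []
    for op in ops:
        backend = "tensorrt" if op in SUPPORTED else "torch"
        if segments and segments[-1][0] == backend:
            segments[-1][1].append(op)
        else:
            segments.append((backend, [op]))
    return segments

ops = ["aten.matmul", "aten.relu", "aten.nonzero", "aten.add"]
for backend, seg in partition(ops):
    print(backend, seg)
```

Here the unsupported `aten.nonzero` becomes its own PyTorch-executed segment, while the surrounding ops remain in TensorRT segments.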
## Internal Implementation

### Design
Since this effort combines several features and modifications, it will not require extensive design changes beyond those already underway in the unification of the FX/TS frontends (RFC #1372) and the bootstrapping of the FX converters library (RFC #1557).
### Extensions Required to Core API Implementations
Substantial changes and improvements will be required to `torch_tensorrt.dynamo.compile` and `torch_tensorrt.dynamo.fx_ts_compat.compile`, as well as to `torch_tensorrt.fx.*`.
### `torch.fx.symbolic_trace`

Summary: Will not accept models with data-dependent control flow, since they cannot be symbolically traced. Throws an error to the user when a trace is attempted.
### `torch._dynamo.export`
Summary: Does not support dynamic control flow, and occasionally circumvents it in the manner of `torch.jit.trace`. Will accept some models with data-dependent control flow, but not all.

Note: The returned graph is the result of propagating a Fake Tensor having dimension `inp_if` through the graph, so the symbolic flow is essentially circumvented. This is similar to the behavior of `torch.jit.trace` (ignoring the control flow and picking a branch).
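This trace-style circumvention can be illustrated with a toy tracer that bakes the example input's branch into the returned program (purely illustrative; not actual tracing machinery):

```python
def model(x):
    if sum(x) > 10:          # data-dependent branch
        return [v * 2 for v in x]
    return [v + 1 for v in x]

def trace(fn, example_input):
    """Toy analogue of trace-based capture: the branch decision is made
    once, at trace time, using the example input."""
    branch_doubles = sum(example_input) > 10
    def traced(x):
        # Control flow is gone: the traced program always takes the
        # branch chosen by the example input.
        return [v * 2 for v in x] if branch_doubles else [v + 1 for v in x]
    return traced

traced = trace(model, [1, 2, 3])     # sum=6, so the "+1" branch is baked in
print(traced([10, 10, 10]))          # [11, 11, 11], even though sum > 10
```

The traced program silently disagrees with the original model on inputs that would have taken the other branch.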
### `torch._dynamo.optimize(...)` and `torch.compile`
Summary: Supports dynamic control flow. Recompiles the model when a new block is encountered based on control flow decisions (recompiling at inference time).
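This recompile-on-new-branch behavior can be illustrated with a toy guard cache, where an unseen control-flow outcome triggers compilation of a new block at inference time (names are hypothetical):

```python
# Hypothetical guard cache: one compiled block per control-flow outcome.
COMPILED_BLOCKS = {}

def run(x):
    branch = "big" if x > 10 else "small"   # data-dependent control flow
    if branch not in COMPILED_BLOCKS:       # guard miss -> compile new block
        COMPILED_BLOCKS[branch] = (
            (lambda v: v * 2) if branch == "big" else (lambda v: v + 1)
        )
    return COMPILED_BLOCKS[branch](x)

print(run(3))    # compiles the "small" block -> 4
print(run(20))   # new branch encountered: compiles the "big" block -> 40
print(run(4))    # "small" block reused, no recompilation -> 5
```

Unlike the trace-based path, both branches remain reachable, at the cost of a recompilation whenever a new branch is first encountered.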
## Implementation Phases

### Prototype - S (complete)

- Model support in the `aten` path (🐛 [Bug] Transformers BERT Model does not compile via FX Path #1673, 🐛 [Bug] Transformers T5 Model does not compile via FX Path #1740, 🐛 [Bug] Transformers GPT2 Model does not compile via FX Path #1741)
- Converter support (↔ [Converter] Add support for `aten.matmul` in the FX `aten` path #1709, ↔ [Converter] Add support for `aten.gelu` and `aten.tanh` in the FX `aten` path #1713, ↔ [Converter] Add support for Tensor slicing and selection operators in the FX `aten` path #1714, ↔ [Converter] Add support for Tensor reshape and permute operators in the FX `aten` path #1724)

### MVP - M (complete)

- Stabilization of the `aten` FX path
- Converter support in the `torch.ops.aten` path, potentially 1:1 with available `aten::op` converters in TorchScript/C++

### Stable Dynamo - L (in progress, ~35%)

- Additional `aten` ops (fx2trt converters - change of prototype and addition of activation operation #1745, ↔ [Converter] Add support for assorted operators in the FX `aten` path #1769)
- Improvements to the `torch_tensorrt.dynamo.compile` path (#1941)
- Component sharing between the `export` and `compile` paths (#1941, 📖 [Story] Sharing Components between Torch `export` and `compile` Paths in Torch-TRT #1940)
- Unification of the `export` and `compile` paths in Torch-TRT (#1940)

### Extension Phase 1 - S

- `aten` converters implemented

### Extension Phase 2 - S/M

- Replacement of entire modules (e.g. `nn.MultiheadAttention`) via a custom tracer/block replacement to TRT