# Improvements to Dynamo/TRT Workflow

## TL;DR

Continue upgrading the FX/Dynamo `aten` pipelines to improve overall support for models and operator/module-level customization using this path.
## Goal(s)
The primary goal is to achieve robust support in the FX/Dynamo `aten` path and allow for customizations which can benefit the overall performance and functionality of models compiled via that path. For now, the target is transformer-based models and acceleration of the complex operator structure required to implement them.

We are targeting two main compilation paths: `torch.compile` and `torch._dynamo.export`, both of which use the `aten` converters.
## Proposed APIs / UX + Use-cases
The following capabilities should be supported via the `torch_tensorrt` UI:

### 1. Robust converters
The `aten` path converters should be able to handle a variety of inputs and options, and should not error out in the case of incompatibility. Specifically, if a converter can handle `torch.ops.aten.__some_operator__.default`, it should be able to handle any reasonable argument inputs, with accompanying test cases.
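As a sketch of the intended behavior, a converter registry could validate arguments and report a fallback rather than raising. The names below (`register_converter`, `convert_op`) are illustrative only, not the actual torch_tensorrt API:

```python
# Hypothetical converter registry sketch -- not the real torch_tensorrt API.
CONVERTERS = {}

def register_converter(op_name, validator=None):
    """Register a converter for an op, with an optional argument validator."""
    def decorator(fn):
        CONVERTERS[op_name] = (fn, validator)
        return fn
    return decorator

def convert_op(op_name, *args, **kwargs):
    """Convert an op if a registered converter accepts the arguments;
    otherwise report a fallback instead of raising."""
    entry = CONVERTERS.get(op_name)
    if entry is None:
        return ("fallback", f"no converter for {op_name}")
    fn, validator = entry
    if validator is not None and not validator(*args, **kwargs):
        return ("fallback", f"unsupported arguments for {op_name}")
    return ("converted", fn(*args, **kwargs))

# Example: a toy matmul converter that only supports 2-D operands
# (arguments here are shapes, for illustration).
@register_converter(
    "aten.matmul.default",
    validator=lambda a, b: len(a) == 2 and len(b) == 2,
)
def convert_matmul(a_shape, b_shape):
    return (a_shape[0], b_shape[1])  # resulting shape

print(convert_op("aten.matmul.default", (4, 8), (8, 16)))      # converted
print(convert_op("aten.matmul.default", (2, 4, 8), (8, 16)))   # fallback
```

A batched (3-D) input does not match the validator, so it is routed to fallback rather than raising an error.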
### 2. Specifying `torch_executed_modules` when compiling via the `aten` path

This option excludes the specified modules from compilation.
### 3. Custom specification of operators/callables which can encapsulate entire modules

Operators or modules in Torch should be replaceable with custom optimized versions thereof, which could be implemented in TensorRT or otherwise.
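A minimal illustration of such module-level replacement, using a toy module tree rather than `torch.nn` (all class and function names here are hypothetical):

```python
# Toy module tree; stands in for a torch.nn.Module hierarchy.
class Module:
    def __init__(self, **children):
        self.children = children

class Attention(Module):
    pass

class TRTAttention(Module):
    """Stand-in for a TensorRT-optimized implementation."""

def replace_modules(module, replacement_map):
    """Walk the tree and swap any module whose type appears in the map
    with an instance of its optimized counterpart."""
    for name, child in module.children.items():
        if type(child) in replacement_map:
            module.children[name] = replacement_map[type(child)]()
        else:
            replace_modules(child, replacement_map)
    return module

model = Module(encoder=Module(attn=Attention(), ff=Module()))
replace_modules(model, {Attention: TRTAttention})
print(type(model.children["encoder"].children["attn"]).__name__)  # TRTAttention
```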
### 4. Options for both JIT-compiled models (`torch.compile`) and export-requiring models (`torch._dynamo.export`)

Provide users with a framework that offers options for multiple use cases, including JIT compilation and exporting a compiled model for later use.
### 5. Support for Dynamic Shapes and Avoiding Recompilation for Dynamic Shapes in Dynamo

Add support for dynamic shapes in both the `compile` and `export` paths. In the `compile` path, the model should not recompile when a new batch size is encountered (assuming the user has specified the dynamic dimension at compile time). Similarly, the `export` path should support dynamic dimensions in the same way the TorchScript/FX paths do.
## Limitations
This feature does not guarantee accelerated support for all operators, but unsupported operators should preferably fall back to native PyTorch execution with clear messages, rather than throwing errors.
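The fallback behavior can be sketched as a simple partitioner that routes unsupported ops to native PyTorch instead of erroring (a hypothetical helper, not the real partitioning API):

```python
# Hypothetical supported-op set and partitioner sketch.
SUPPORTED = {"aten.matmul", "aten.relu", "aten.add"}

def partition(ops):
    """Split an op sequence into contiguous segments destined for
    TensorRT or for native PyTorch fallback."""
    segments = []
    for op in ops:
        backend = "tensorrt" if op in SUPPORTED else "torch"
        if segments and segments[-1][0] == backend:
            segments[-1][1].append(op)
        else:
            segments.append((backend, [op]))
    return segments

ops = ["aten.matmul", "aten.relu", "aten.nonzero", "aten.add"]
for backend, seg in partition(ops):
    print(backend, seg)
```

Here the unsupported `aten.nonzero` becomes its own PyTorch-executed segment, while the surrounding ops remain in TensorRT segments.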
## Internal Implementation

### Design
Since this effort combines several features and modifications, it will not require extensive design changes beyond those already underway in the unification of the FX/TS frontends (RFC #1372) and the bootstrapping of the FX converters library (RFC #1557).
### Extensions Required to Core API Implementations
Substantial changes and improvements will be required to `torch_tensorrt.dynamo.compile` and `torch_tensorrt.dynamo.fx_ts_compat.compile`, as well as to `torch_tensorrt.fx.*`.
### `torch.fx.symbolic_trace`

Summary: Will not accept models with data-dependent control flow, since they cannot be symbolically traced. Throws an error to the user when a trace is attempted.
### `torch._dynamo.export`
Summary: Does not support dynamic control flow, and occasionally circumvents it in the manner of `torch.jit.trace`. Will accept some models with data-dependent control flow, but not all.

Note: The returned graph is the result of propagating a Fake Tensor having dimension `inp_if` through the graph, so the symbolic flow is essentially circumvented. This is similar to the behavior of `torch.jit.trace` (ignoring the control flow and picking a branch).
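This trace-style circumvention can be illustrated with a toy tracer that bakes the example input's branch into the returned program (purely illustrative; not actual tracing machinery):

```python
def model(x):
    if sum(x) > 10:          # data-dependent branch
        return [v * 2 for v in x]
    return [v + 1 for v in x]

def trace(fn, example_input):
    """Toy analogue of trace-based capture: the branch decision is made
    once, at trace time, using the example input."""
    branch_doubles = sum(example_input) > 10
    def traced(x):
        # Control flow is gone: the traced program always takes the
        # branch chosen by the example input.
        return [v * 2 for v in x] if branch_doubles else [v + 1 for v in x]
    return traced

traced = trace(model, [1, 2, 3])     # sum=6, so the "+1" branch is baked in
print(traced([10, 10, 10]))          # [11, 11, 11], even though sum > 10
```

The traced program silently disagrees with the original model on inputs that would have taken the other branch.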
### `torch._dynamo.optimize(...)` and `torch.compile`
Summary: Supports dynamic control flow. Recompiles the model when a new block is encountered based on control flow decisions (recompiling at inference time).
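This recompile-on-new-branch behavior can be illustrated with a toy guard cache, where an unseen control-flow outcome triggers compilation of a new block at inference time (names are hypothetical):

```python
# Hypothetical guard cache: one compiled block per control-flow outcome.
COMPILED_BLOCKS = {}

def run(x):
    branch = "big" if x > 10 else "small"   # data-dependent control flow
    if branch not in COMPILED_BLOCKS:       # guard miss -> compile new block
        COMPILED_BLOCKS[branch] = (
            (lambda v: v * 2) if branch == "big" else (lambda v: v + 1)
        )
    return COMPILED_BLOCKS[branch](x)

print(run(3))    # compiles the "small" block -> 4
print(run(20))   # new branch encountered: compiles the "big" block -> 40
print(run(4))    # "small" block reused, no recompilation -> 5
```

Unlike the trace-based path, both branches remain reachable, at the cost of a recompilation whenever a new branch is first encountered.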
## Implementation Phases

### Prototype - S (complete)

- Model support in the `aten` path (🐛 [Bug] Transformers BERT Model does not compile via FX Path #1673, 🐛 [Bug] Transformers T5 Model does not compile via FX Path #1740, 🐛 [Bug] Transformers GPT2 Model does not compile via FX Path #1741)
- Converter support (↔ [Converter] Add support for `aten.matmul` in the FX `aten` path #1709, ↔ [Converter] Add support for `aten.gelu` and `aten.tanh` in the FX `aten` path #1713, ↔ [Converter] Add support for Tensor slicing and selection operators in the FX `aten` path #1714, ↔ [Converter] Add support for Tensor reshape and permute operators in the FX `aten` path #1724)

### MVP - M (complete)

- Stabilization of the `aten` FX path
- Converter support in the `torch.ops.aten` path, potentially 1:1 with available `aten::op` converters in TorchScript/C++

### Stable Dynamo - L (in progress, ~35%)

- Additional `aten` ops (fx2trt converters - change of prototype and addition of activation operation #1745, ↔ [Converter] Add support for assorted operators in the FX `aten` path #1769)
- Improvements to the `torch_tensorrt.dynamo.compile` path (#1941)
- Component sharing between the `export` and `compile` paths (#1941, 📖 [Story] Sharing Components between Torch `export` and `compile` Paths in Torch-TRT #1940)
- Unification of the `export` and `compile` paths in Torch-TRT (#1940)

### Extension Phase 1 - S

- `aten` converters implemented

### Extension Phase 2 - S/M

- Replacement of entire modules (e.g. `nn.MultiheadAttention`) via a custom tracer/block replacement to TRT