Hello,

NVIDIA's MLPerf submissions recommend the TensorRT framework for performant inference deployment. For DLRM (deep-learning-based recommendation system) inference on GPU, I have the following questions:

1. Does TensorRT modify the backend (CUDA/C++ source code) of the embedding bag operator, or does it use the exact same vanilla PyTorch CUDA kernels?
2. What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Please let me know your comments. Thanks.
When TensorRT encounters an unsupported operator, it does not automatically fall back to an implementation from another source such as PyTorch. Instead, one has to resort to workarounds like manually reimplementing the unsupported operation in terms of operations that TensorRT does support.
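For the embedding bag case specifically, one sketch of such a workaround is to express a sum-pooled lookup as a gather followed by a reduce-sum, both of which export cleanly through ONNX to TensorRT. The module below is illustrative only and assumes fixed-size bags (the usual layout for DLRM's sparse features); it is not the kernel TensorRT or PyTorch actually uses.

```python
import torch
import torch.nn as nn

class GatherSumBag(nn.Module):
    """Illustrative stand-in for nn.EmbeddingBag(mode="sum") with fixed-size bags,
    built only from Gather + ReduceSum-style ops that TensorRT's ONNX path supports."""

    def __init__(self, num_embeddings: int, embedding_dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_embeddings, embedding_dim))

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        # indices: (batch, bag_size) integer tensor.
        gathered = self.weight[indices]   # gather rows -> (batch, bag_size, dim)
        return gathered.sum(dim=1)        # pool over the bag -> (batch, dim)
```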
It may be easier to use TensorRT for just the two MLP components of DLRM, as shown here, than for the entire model.
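As a rough sketch of that hybrid approach (the layer sizes, file names, and input names below are made up for illustration, not taken from the MLPerf DLRM configuration), one could export only the bottom MLP to ONNX and build a TensorRT engine from it, while keeping the embedding lookups in PyTorch and stitching the two pieces together at inference time:

```python
import torch
import torch.nn as nn

# Hypothetical bottom MLP for the dense-feature branch of DLRM.
bottom_mlp = nn.Sequential(
    nn.Linear(13, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
).eval()

# Export just the MLP to ONNX; embedding lookups stay in PyTorch.
dummy_dense = torch.randn(1, 13)
torch.onnx.export(
    bottom_mlp, dummy_dense, "bottom_mlp.onnx",
    input_names=["dense_features"], output_names=["dense_embedding"],
    dynamic_axes={"dense_features": {0: "batch"}},
)

# The ONNX graph can then be compiled into a TensorRT engine, e.g.:
#   trtexec --onnx=bottom_mlp.onnx --saveEngine=bottom_mlp.plan --fp16
```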