Embedding_bag operator on GPU #357

Open
rishucoding opened this issue Sep 13, 2023 · 1 comment

Comments

@rishucoding

Hello,

Nvidia's MLPerf submissions suggest using the TensorRT framework for a performant inference deployment. For DLRM (deep-learning-based recommendation systems) inference on GPU, I have the following questions:

  • Does TensorRT modify the backend (CUDA/C++ source code) of the EmbeddingBag operator, or does it use the exact same vanilla PyTorch CUDA kernels?

  • What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Please let me know your thoughts. Thanks!

@samiwilf
Contributor

Hi @rishucoding.

TensorRT uses its own CUDA kernels and mainly uses ONNX to import models. It doesn't use PyTorch.
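
For reference, here is a minimal sketch of that ONNX import path using the TensorRT Python API (TensorRT 8.x-style calls; `model.onnx` is a placeholder file name, not from this thread):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parsing fails here if the ONNX graph contains ops TensorRT can't map.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)
```

The same import can also be done from the command line with `trtexec --onnx=model.onnx`.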

It appears that TensorRT currently lacks an embedding bag operator: it's neither in TensorRT's supported-ops table nor in ONNX's operator list. The lack of embedding bag support in ONNX has also been raised before, both as an issue in this repo and as an issue in ONNX's repo.

When TensorRT encounters an unsupported operator, it doesn't automatically find an implementation of it from another source like PyTorch. Instead, one would need to resort to workarounds like manually reimplementing unsupported operations in terms of operations that TensorRT supports.
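
As a rough sketch of that kind of workaround (illustrative only, not the approach used in this repo): for fixed-length bags, which is how DLRM batches its sparse features, `nn.EmbeddingBag` in sum mode can be rewritten as an embedding gather followed by a sum reduction, both of which map to ONNX ops (Gather, ReduceSum) that TensorRT supports:

```python
import torch
import torch.nn as nn

class EmbeddingBagSum(nn.Module):
    """Stand-in for nn.EmbeddingBag(mode="sum") with fixed-length bags.

    Assumes indices has shape [batch, bag_size] (every bag the same
    length). The indexing lowers to ONNX Gather and the sum to
    ONNX ReduceSum, both of which TensorRT can import.
    """
    def __init__(self, num_embeddings, embedding_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_embeddings, embedding_dim))

    def forward(self, indices):              # indices: [B, L]
        gathered = self.weight[indices]      # [B, L, D] via Gather
        return gathered.sum(dim=1)           # [B, D] via ReduceSum

# Sanity check against the real operator (2-D input = fixed-length bags).
bag = nn.EmbeddingBag(1000, 16, mode="sum")
sub = EmbeddingBagSum(1000, 16)
sub.weight.data.copy_(bag.weight.data)
idx = torch.randint(0, 1000, (4, 3))
assert torch.allclose(bag(idx), sub(idx), atol=1e-6)
```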

It may be easier to use TensorRT for just the two MLP components of DLRM, as shown here, than for the entire model.
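
To illustrate that split, one could export just the dense MLPs to ONNX and keep the embedding lookups in PyTorch. The layer sizes below are placeholders, not DLRM's actual configuration:

```python
import torch
import torch.nn as nn

# Placeholder MLPs standing in for DLRM's bottom and top MLPs; the real
# layer sizes come from the model's configuration, not from this sketch.
bot_mlp = nn.Sequential(nn.Linear(13, 512), nn.ReLU(), nn.Linear(512, 64)).eval()
top_mlp = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 1)).eval()

# Export only the dense parts; the embedding lookups and feature
# interaction stay in PyTorch and run outside TensorRT.
torch.onnx.export(bot_mlp, torch.randn(1, 13), "bot_mlp.onnx", opset_version=17)
torch.onnx.export(top_mlp, torch.randn(1, 256), "top_mlp.onnx", opset_version=17)
# Each .onnx file can then be built into its own TensorRT engine,
# e.g. trtexec --onnx=bot_mlp.onnx --saveEngine=bot_mlp.plan
```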
