Embedding_bag operator on GPU #357

Open
rishucoding opened this issue Sep 13, 2023 · 1 comment

Comments

@rishucoding

Hello,

Nvidia's MLPerf submissions suggest using the TensorRT framework for a performant inference deployment. For DLRM (deep-learning-based recommendation systems) inference on GPU, I have the following questions:

  • Does TensorRT modify the backend (CUDA/C++ source code) of the EmbeddingBag operator, or does it use the exact same vanilla PyTorch CUDA kernels?

  • What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Please let me know your thoughts. Thanks!

@samiwilf
Contributor

Hi @rishucoding.

TensorRT uses its own CUDA kernels and mainly uses ONNX to import models. It doesn't use PyTorch.
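
For reference, here is a minimal sketch of that ONNX import path using the TensorRT Python API (TensorRT 8.x-style calls; `model.onnx` is a placeholder file name, not from this thread):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parsing fails here if the ONNX graph contains ops TensorRT can't map.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)
```

The same import can also be done from the command line with `trtexec --onnx=model.onnx`.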

It appears that TensorRT currently lacks an embedding bag operator: it's neither in TensorRT's supported-ops table nor in ONNX's operator list. The lack of embedding bag support in ONNX has also been raised before, both as an issue in this repo and as an issue in ONNX's repo.

When TensorRT encounters an unsupported operator, it doesn't automatically find an implementation of it from another source like PyTorch. Instead, one would need to resort to workarounds like manually reimplementing unsupported operations in terms of operations that TensorRT supports.
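
As a rough sketch of that kind of workaround (illustrative only, not the approach used in this repo): for fixed-length bags, which is how DLRM batches its sparse features, `nn.EmbeddingBag` in sum mode can be rewritten as an embedding gather followed by a sum reduction, both of which map to ONNX ops (Gather, ReduceSum) that TensorRT supports:

```python
import torch
import torch.nn as nn

class EmbeddingBagSum(nn.Module):
    """Stand-in for nn.EmbeddingBag(mode="sum") with fixed-length bags.

    Assumes indices has shape [batch, bag_size] (every bag the same
    length). The indexing lowers to ONNX Gather and the sum to
    ONNX ReduceSum, both of which TensorRT can import.
    """
    def __init__(self, num_embeddings, embedding_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_embeddings, embedding_dim))

    def forward(self, indices):              # indices: [B, L]
        gathered = self.weight[indices]      # [B, L, D] via Gather
        return gathered.sum(dim=1)           # [B, D] via ReduceSum

# Sanity check against the real operator (2-D input = fixed-length bags).
bag = nn.EmbeddingBag(1000, 16, mode="sum")
sub = EmbeddingBagSum(1000, 16)
sub.weight.data.copy_(bag.weight.data)
idx = torch.randint(0, 1000, (4, 3))
assert torch.allclose(bag(idx), sub(idx), atol=1e-6)
```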

It may be easier to use TensorRT for just the two MLP components of DLRM, as shown here, than for the entire model.
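
To illustrate that split, one could export just the dense MLPs to ONNX and keep the embedding lookups in PyTorch. The layer sizes below are placeholders, not DLRM's actual configuration:

```python
import torch
import torch.nn as nn

# Placeholder MLPs standing in for DLRM's bottom and top MLPs; the real
# layer sizes come from the model's configuration, not from this sketch.
bot_mlp = nn.Sequential(nn.Linear(13, 512), nn.ReLU(), nn.Linear(512, 64)).eval()
top_mlp = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 1)).eval()

# Export only the dense parts; the embedding lookups and feature
# interaction stay in PyTorch and run outside TensorRT.
torch.onnx.export(bot_mlp, torch.randn(1, 13), "bot_mlp.onnx", opset_version=17)
torch.onnx.export(top_mlp, torch.randn(1, 256), "top_mlp.onnx", opset_version=17)
# Each .onnx file can then be built into its own TensorRT engine,
# e.g. trtexec --onnx=bot_mlp.onnx --saveEngine=bot_mlp.plan
```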
