How does this compare to Huggingface's Text Embedding Inference? #108
Hi,
Thank you for your amazing work!
We'd like to add an embedding template for users to deploy on RunPod, and we're deciding between Infinity and HF's Text Embedding Inference. How would you say Infinity compares, especially in performance?

Comments
Hey @alpayariyak, benchmarking is pretty subjective; e.g. a single-sentence, 10-token query is not a workload you should benchmark on.

- CPU: CPU is around 3x faster when using Infinity with the optimum engine. Candle/torch is not that great at CPU inference; ONNX has an edge here.
- CUDA: TEI is around 2-5% faster, at 0.55 requests per second on TEI vs 0.52 on Infinity. You will need to choose the right image for this, and know that e.g. compute capability 89 is what you should go for on an Nvidia L4.
- Startup: The startup time is slightly faster / the same order of magnitude. This is for the GPU image. For roberta-large, it's a similar gap.
- Docker image: The TEI image is smaller; torch+CUDA is a real heavyweight.

Additional features that TEI misses:
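For context, a minimal sketch of how such a requests-per-second comparison can be measured. The routes, ports, model name, and batch size below are assumptions to adapt to your own deployment (Infinity's OpenAI-compatible `/embeddings` route on its default port 7997; TEI's `/embed` route assumed mapped to port 8080):

```python
import time

import requests

# Assumed endpoints - adjust to your own deployment.
INFINITY_URL = "http://localhost:7997/embeddings"
TEI_URL = "http://localhost:8080/embed"


def bench(url: str, payload: dict, n_requests: int = 50) -> float:
    """Fire n_requests sequentially and return requests per second."""
    start = time.perf_counter()
    for _ in range(n_requests):
        requests.post(url, json=payload, timeout=60).raise_for_status()
    return n_requests / (time.perf_counter() - start)


# Batch of 32 sentences - a single 10-token query is not representative.
sentences = ["benchmarking is subjective, use a realistic batch"] * 32

print("infinity rps:", bench(INFINITY_URL, {"model": "BAAI/bge-small-en-v1.5", "input": sentences}))
print("tei rps:     ", bench(TEI_URL, {"inputs": sentences}))
```

Sequential requests like this measure latency-bound throughput; a realistic benchmark would also vary batch size and concurrency.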
@alpayariyak Invested like 4-5h on this and set up an extra doc: Can I please have your feedback on it?
The benchmark link seems dead, could you please repost?
Fixed!
Your project is amazing! 🚀 I ❤️ your LICENSE, which is better than the one of TEI (👎). Have you ever thought about adding an API endpoint that can also serve as a TextSplitter?
@Jimmy-Newtron Can you open another issue for that?
Are there integrations into LangChain?
The main goal would be to avoid loading the same model into memory twice.
Yes, I suppose that a LangChain integration would be required; see the sketch below.
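A minimal sketch of such a setup, assuming the `InfinityEmbeddings` wrapper from `langchain_community`, which talks to a running Infinity server over HTTP so the model is loaded only once in the server process (model name and URL are placeholders):

```python
from langchain_community.embeddings import InfinityEmbeddings

# Placeholder model and URL - point them at your running Infinity server.
embeddings = InfinityEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    infinity_api_url="http://localhost:7997",
)

vectors = embeddings.embed_documents(["first document", "second document"])
query_vector = embeddings.embed_query("what was in the first document?")
```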
To optimize the resources used (GPU, VRAM), it would be nice for the Infinity server to be able to chunk long input sequences into smaller pieces that fit the window size of the chosen embedding model. I have found an implementation of a similar concept in the AI21 Studio Text Segmentation, which is already available in the LangChain integrations. Here are some source files that may be of interest for conceiving a solution:
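A minimal sketch of such window-sized chunking, assuming the model's Hugging Face tokenizer and a hypothetical 512-token window (sentence-boundary handling and overlap are left out for brevity; the AI21 approach mentioned above is more sophisticated):

```python
from transformers import AutoTokenizer


def chunk_to_window(text: str, model_id: str, max_tokens: int = 512) -> list[str]:
    """Split text into pieces that each fit the embedding model's window."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    # Leave room for the special tokens ([CLS]/[SEP]) added at embed time.
    window = max_tokens - tokenizer.num_special_tokens_to_add()
    return [
        tokenizer.decode(ids[i:i + window], skip_special_tokens=True)
        for i in range(0, len(ids), window)
    ]


chunks = chunk_to_window("a very long document ... " * 500, "BAAI/bge-small-en-v1.5")
# Each chunk can then be embedded separately and the results aggregated.
```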
huggingface/text-embeddings-inference#232 Does this mean that there will be a convergence of the two projects?