Add functionality to LLMBlock within the pipeline to override the global OpenAI client variable. This would let us run multiple OpenAI clients across different LLMBlock instances when desired. The primary intent is to direct LLMBlock inference calls to a model deployment tailored to serve those specific requests.
Currently, certain LoRA inference calls in vLLM are incompatible with some performance optimization flags. By separating LoRA inference calls from non-LoRA calls, we can deploy multiple vLLM instances, each optimized for its type of inference call, which should improve overall performance.
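For illustration, here is a minimal sketch of what a per-instance client override could look like. The class shape, the `DEFAULT_CLIENT` name, and the endpoint URLs are assumptions for the example, not the actual LLMBlock implementation:

```python
from openai import OpenAI

# Hypothetical module-level default client shared by all blocks today.
DEFAULT_CLIENT = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

class LLMBlock:
    def __init__(self, model_id: str, client: OpenAI | None = None):
        # If a client override is supplied, use it; otherwise fall back
        # to the global client.
        self.client = client if client is not None else DEFAULT_CLIENT
        self.model_id = model_id

    def generate(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

# Usage: route LoRA calls to one vLLM deployment and base-model calls to another.
lora_client = OpenAI(base_url="http://lora-serving:8000/v1", api_key="EMPTY")
lora_block = LLMBlock(model_id="my-lora-adapter", client=lora_client)
base_block = LLMBlock(model_id="base-model")  # uses DEFAULT_CLIENT
```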