
tgi-demo

A demo of high-throughput LLM serving with TGI (Text Generation Inference).

What this is

Source for building a Docker container containing:

TGI Server

A single TGI server runs per container. It is run via the text-generation-launcher CLI, in a process spawned by the Potassium server.
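
As a rough illustration, the launcher can be spawned from Python with subprocess; the model ID and port below are placeholders, not the values used by this repo:

    import subprocess

    # Spawn text-generation-launcher as a child process (placeholder model and port).
    tgi_process = subprocess.Popen(
        [
            "text-generation-launcher",
            "--model-id", "bigscience/bloom-560m",  # placeholder model
            "--port", "8080",                       # placeholder port
        ]
    )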

Potassium Server

A single Potassium server runs per container, and its job is to proxy calls to TGI.

It is required in order to integrate with Banana, which tracks concurrent jobs.

Multiple HTTP workers may be spawned using the experimental_num_workers=10 argument.
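
A minimal sketch of such a proxy, assuming TGI listens locally on port 8080; the placement of experimental_num_workers and the payload fields are assumptions, so check the server source in this repo for the exact configuration:

    import requests
    from potassium import Potassium, Request, Response

    # experimental_num_workers placement is an assumption; see this repo's server source.
    app = Potassium("tgi_proxy", experimental_num_workers=10)

    @app.init
    def init():
        # No model state is held here; TGI owns the weights.
        return {}

    @app.handler()
    def handler(context: dict, request: Request) -> Response:
        # Forward the prompt to TGI's /generate endpoint and return its output.
        tgi_response = requests.post(
            "http://127.0.0.1:8080/generate",  # assumed TGI port
            json={
                "inputs": request.json["prompt"],
                "parameters": {"max_new_tokens": 64},
            },
        )
        return Response(json=tgi_response.json(), status=200)

    if __name__ == "__main__":
        app.serve()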

To use

From a GPU machine with Docker and NVIDIA Container Toolkit installed, run:

bash build_and_run.sh

This builds the Docker container from the Dockerfile and runs it with the necessary ports exposed.

Call it with a POST request using the example client.py:

python3 client.py
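
For reference, a call like the one client.py makes might look like the following, assuming the Potassium server is exposed on its default port 8000 and accepts a prompt field; the exact payload is defined by client.py itself:

    import requests

    # POST a prompt to the Potassium server (port and payload shape are assumptions).
    res = requests.post(
        "http://localhost:8000/",
        json={"prompt": "What is the capital of France?"},
    )
    print(res.json())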
