
# Output structured data with Outlines and BentoML

Outlines is an open-source Python package for structured text generation, integrating with various models to produce controlled, format-specific outputs. It offers capabilities like fast regex-structured generation, JSON generation following a JSON schema or a Pydantic model, and grammar-structured generation.
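
To give a sense of what Outlines itself does, here is a minimal sketch of schema-constrained generation with a Pydantic model. It is illustrative only: it assumes the Outlines 0.x API (`outlines.models.transformers` and `outlines.generate.json`) and is not the code used by the Service in this project, which integrates Outlines with vLLM instead.

```python
# Minimal sketch, assuming the Outlines 0.x API; the Service below wires
# Outlines into vLLM rather than calling it directly like this.
import outlines
from pydantic import BaseModel


class User(BaseModel):
    name: str
    last_name: str
    id: int


model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, User)

# The generated output is guaranteed to conform to the User schema.
user = generator("Create a user profile with the fields name, last_name and id.")
print(user)
```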

This is a BentoML example project, demonstrating how to output structured data from an LLM using Outlines and BentoML. See here for a full list of BentoML example projects.

## Prerequisites

If you want to test the Service locally, we recommend using an NVIDIA GPU with at least 16 GB of VRAM.

## Install dependencies

```bash
git clone https://github.com/bentoml/BentoVLLM.git
cd BentoVLLM/outlines-integration

# Recommend Python 3.11
pip install -r requirements.txt && pip install -U "pydantic>=2.0"
```

## Run the BentoML Service

We have defined a BentoML Service in `service.py`. Run `bentoml serve` in your project directory to start the Service.

```bash
$ bentoml serve .

2024-03-27T10:14:50+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:VLLM" listening on http://localhost:3000 (Press CTRL+C to quit)
INFO 03-27 10:14:54 llm_engine.py:87] Initializing an LLM engine with config: model='mistralai/Mistral-7B-Instruct-v0.2', tokenizer='mistralai/Mistral-7B-Instruct-v0.2', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
```
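
The log above shows a Service class named `VLLM` exposing a `generate` endpoint that accepts a prompt plus optional `json_schema` and `regex_string` fields. As a rough, hypothetical sketch (not the exact contents of `service.py` in this repository), such a Service could be structured like this:

```python
# Rough, hypothetical sketch; the actual service.py may differ in engine
# options, streaming behavior, and how the Outlines logits processors are built.
import uuid
from typing import AsyncGenerator, Optional

import bentoml

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"


@bentoml.service(resources={"gpu": 1}, traffic={"timeout": 300})
class VLLM:
    def __init__(self) -> None:
        from vllm import AsyncEngineArgs, AsyncLLMEngine

        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model=MODEL_ID, max_model_len=1024)
        )

    @bentoml.api
    async def generate(
        self,
        prompt: str,
        max_tokens: int = 1024,
        json_schema: Optional[str] = None,
        regex_string: Optional[str] = None,
    ) -> AsyncGenerator[str, None]:
        from vllm import SamplingParams

        # When a JSON schema or regex is supplied, Outlines logits processors
        # are attached here so decoding is constrained to that structure.
        logits_processors = []  # built from json_schema / regex_string via Outlines

        params = SamplingParams(
            max_tokens=max_tokens, logits_processors=logits_processors
        )
        async for output in self.engine.generate(
            prompt, params, request_id=uuid.uuid4().hex
        ):
            # vLLM yields the cumulative text generated so far.
            yield output.outputs[0].text
```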

The server is now active at http://localhost:3000. You can interact with it using the Swagger UI or in other ways.

CURL

```bash
curl -X 'POST' \
  'http://localhost:3000/generate' \
  -H 'accept: text/event-stream' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "Create a user profile with the fields name, last_name and id. name should be common English first names. last_name should be common English last names. id should be a random integer",
  "max_tokens": 1024,
  "json_schema": "\n{\n  \"title\": \"User\",\n  \"type\": \"object\",\n  \"properties\": {\n    \"name\": {\"type\": \"string\"},\n    \"last_name\": {\"type\": \"string\"},\n    \"id\": {\"type\": \"integer\"}\n  }\n}\n",
  "regex_string": null
}'
```

Python client

```python
import bentoml

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    result = client.generate(
        json_schema="\n{\n  \"title\": \"User\",\n  \"type\": \"object\",\n  \"properties\": {\n    \"name\": {\"type\": \"string\"},\n    \"last_name\": {\"type\": \"string\"},\n    \"id\": {\"type\": \"integer\"}\n  }\n}\n",
        max_tokens=1024,
        prompt="Create a user profile with the fields name, last_name and id. name should be common English first names. last_name should be common English last names. id should be a random integer",
        regex_string="",
    )
```

Example output:

```json
{
 "name": "Oliver",
 "last_name": "Johnson",
 "id": 123456
}
```
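
Depending on how the endpoint streams its response, `result` may be a full JSON string or an iterator of text chunks. A small, hypothetical follow-up to the Python client above collects the output and validates it with a matching Pydantic model:

```python
# Hypothetical post-processing of `result` from the Python client call above.
import json

from pydantic import BaseModel


class User(BaseModel):
    name: str
    last_name: str
    id: int


text = result if isinstance(result, str) else "".join(result)
user = User.model_validate(json.loads(text))
print(user)  # e.g. name='Oliver' last_name='Johnson' id=123456
```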

## Deploy to BentoCloud

After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. Sign up for a BentoCloud account if you haven't got one.

Make sure you have logged in to BentoCloud, then run the following command to deploy it.

```bash
bentoml deploy .
```

Once the application is up and running on BentoCloud, you can access it via the exposed URL.

Note: For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.