Launch AI DIAL Chat with a Self-Hosted Model

Introduction

In this tutorial, you will learn how to quickly launch AI DIAL Chat with a self-hosted model powered by Ollama.

Watch a demo video to see it in action.

Prerequisites

Docker Engine installed on your machine (Docker Compose version 2.20.0 or later).

Refer to Docker documentation.
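To quickly verify the installed versions, you can run:

docker --version
docker compose version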

Step 1: Get AI DIAL

Clone the repository with the tutorials and change directory to the following folder:

cd dial-docker-compose/ollama
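If you don't have a local copy of the tutorials yet, the full sequence might look like the sketch below. The repository URL is an assumption (the epam/ai-dial repository that hosts the AI DIAL tutorials); adjust the clone URL and path if your copy lives elsewhere.

git clone https://github.com/epam/ai-dial.git
cd ai-dial/dial-docker-compose/ollama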

Step 2: Choose a model to run

Ollama supports a wide range of popular open-source models.

Consider first the modality you are interested in: is it a regular text-to-text chat model, a multi-modal vision model, or an embedding model?

Follow the feature tags (Embeddings, Code, Tools, Vision) at Ollama Search to find the appropriate model.

We recommend choosing one of the following models, which have been tested:

Chat models

Model                            Tools
llama3.1:8b-instruct-q4_0        only in non-streaming mode
mistral:7b-instruct-q4_0
phi3.5:3.8b-mini-instruct-q4_0
gemma2:2b-instruct-q4_0

All the models support streaming.

Vision models

Embedding models

Step 3: Launch AI DIAL Chat

  1. Configure the .env file in the current directory according to the type of model you've chosen:

    • Set OLLAMA_CHAT_MODEL to the name of a text model.
    • Set OLLAMA_VISION_MODEL to the name of a vision model.
    • Set OLLAMA_EMBEDDING_MODEL to the name of an embedding model.

    Note: It's not necessary to configure all the models. If a model isn't set, then it won't be downloaded.
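    For example, a minimal .env for running only a text chat model might look like this (the model name is just an illustrative pick from the list in Step 2):

    OLLAMA_CHAT_MODEL=llama3.1:8b-instruct-q4_0

    Leaving OLLAMA_VISION_MODEL and OLLAMA_EMBEDDING_MODEL unset simply skips downloading those models.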

  2. Then run the following command to pull the specified models and load them into the memory of the Ollama server:

    docker compose up --abort-on-container-exit

    Keep in mind that the typical size of a lightweight Ollama model is a few gigabytes, so the first run may take a few minutes (or more) to download it, depending on your internet bandwidth and the size of the model you choose.

    The models are fully loaded once the ollama-setup service prints The Ollama server is up and running.
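    If you want to watch the download progress, you can follow the setup logs from another terminal (ollama-setup is the service name referenced above):

    docker compose logs -f ollama-setup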

  3. Finally, open http://localhost:3000/ in your browser to launch the AI DIAL Chat application and select the appropriate AI DIAL deployment to converse with:

    • Self-hosted chat model deployment for the OLLAMA_CHAT_MODEL
    • Self-hosted vision model deployment for the OLLAMA_VISION_MODEL

    Note that the vision models we tested do not support response streaming. Moreover, they are typically more computationally expensive than the chat models, so it may take minutes for a vision model to respond.

    The embedding model will become available in AI DIAL under the deployment name embedding-model and can be called via the endpoint: localhost:8080/openai/deployments/embedding-model/embeddings.
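    For instance, a minimal request to that endpoint might look like the sketch below. The Api-Key header and its value dial_api_key are assumptions based on the default quick-start configuration; replace them with whatever key your AI DIAL Core is configured with.

    curl http://localhost:8080/openai/deployments/embedding-model/embeddings \
      -H "Content-Type: application/json" \
      -H "Api-Key: dial_api_key" \
      -d '{"input": "Hello from AI DIAL"}'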