In this tutorial, you will learn how to quickly launch AI DIAL Chat with a self-hosted model powered by Ollama.
Watch a demo video to see it in action.
Docker engine installed on your machine (Docker Compose version 2.20.0 or later).
Refer to the Docker documentation.
Clone the repository with the tutorials and change directory to the following folder:

```sh
cd dial-docker-compose/ollama
```
Ollama supports a wide range of popular open-source models.
Consider first the modality you are interested in: is it a regular text-to-text chat model, a multi-modal vision model, or an embedding model?
Follow the feature tags (Embeddings, Code, Tools, Vision) at Ollama Search to find the appropriate model.
We recommend choosing one of the following models, which have been tested:
| Model | Tools |
|---|---|
| llama3.1:8b-instruct-q4_0 | ✅ (only in non-streaming mode) |
| mistral:7b-instruct-q4_0 | ❌ |
| phi3.5:3.8b-mini-instruct-q4_0 | ❌ |
| gemma2:2b-instruct-q4_0 | ❌ |
All the models support streaming.
- Configure the `.env` file in the current directory according to the type of model you've chosen:

  - Set `OLLAMA_CHAT_MODEL` to the name of a text model.
  - Set `OLLAMA_VISION_MODEL` to the name of a vision model.
  - Set `OLLAMA_EMBEDDING_MODEL` to the name of an embedding model.

  Note: it's not necessary to configure all the models. If a model isn't set, it won't be downloaded.
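  For example, a minimal `.env` that enables only a text chat model might look like the sketch below (the model name is one of the tested options from the table above; substitute whichever models you actually need):

  ```sh
  # .env: only set the models you need; models that are not set will not be downloaded
  OLLAMA_CHAT_MODEL=llama3.1:8b-instruct-q4_0
  # OLLAMA_VISION_MODEL=<vision model of your choice>
  # OLLAMA_EMBEDDING_MODEL=<embedding model of your choice>
  ```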
- Then run the following command to pull the specified models and load them into the Ollama server's memory:

  ```sh
  docker compose up --abort-on-container-exit
  ```

  Keep in mind that a lightweight Ollama model is typically a few gigabytes in size, so the first run may take a few minutes (or more) to download it, depending on your internet bandwidth and the model you choose.
  The models are fully loaded once the `ollama-setup` service prints `The Ollama server is up and running.`
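  If you want to re-check the setup output later, you can follow that service's logs with the standard Docker Compose command (using the service name from this tutorial):

  ```sh
  # Tail the ollama-setup logs and wait for the readiness message
  docker compose logs -f ollama-setup
  ```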
- Finally, open http://localhost:3000/ in your browser to launch the AI DIAL Chat application and select an appropriate AI DIAL deployment to converse with:

  - `Self-hosted chat model` deployment for the `OLLAMA_CHAT_MODEL`
  - `Self-hosted vision model` deployment for the `OLLAMA_VISION_MODEL`
  Note that the vision models we tested do not support response streaming. Moreover, they are typically more computationally expensive than the chat models, so it may take minutes for a vision model to respond.
  The embedding model will become available in AI DIAL under the deployment name `embedding-model` and can be called via the endpoint `localhost:8080/openai/deployments/embedding-model/embeddings`.
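  As a quick smoke test, you can call this endpoint from the command line. The sketch below assumes an API key is configured for DIAL Core; replace the placeholder `dial_api_key` value with the key from your configuration:

  ```sh
  # Request an embedding from the self-hosted model through DIAL Core
  # (the Api-Key header value is a placeholder; use your configured key)
  curl http://localhost:8080/openai/deployments/embedding-model/embeddings \
    -H "Api-Key: dial_api_key" \
    -H "Content-Type: application/json" \
    -d '{"input": "Hello from Ollama"}'
  ```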