Ollama allows more people to deploy open source LLMs or SLMs directly through simple scripts, and it can also expose APIs to support local Copilot application scenarios.
Ollama runs on Windows, macOS, and Linux. You can install Ollama from this link (https://ollama.com/download). After a successful installation, you can call Phi-3 directly from a terminal window using the Ollama script. You can see all the available models in the Ollama library. If you open this repository in a Codespace, it will already have Ollama installed.
ollama run phi3
Note
The model is downloaded the first time you run this command. You can also point directly to an already downloaded Phi-3 model. Here we use WSL as an example to run the command. Once the model has finished downloading, you can interact with it directly in the terminal.
If you want to call the Phi-3 API exposed by Ollama, start the Ollama server with this command in the terminal:
ollama serve
Note
If you are running macOS or Linux, you may encounter the error "Error: listen tcp 127.0.0.1:11434: bind: address already in use" when running this command. You can either ignore it, since it typically means the server is already running, or you can stop and restart Ollama:
macOS
brew services restart ollama
Linux
sudo systemctl stop ollama
Ollama supports two APIs: generate and chat. You can call whichever model API fits your needs by sending requests to the local service running on port 11434.
Chat
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "phi3",
  "messages": [
    {
      "role": "system",
      "content": "You are a Python developer."
    },
    {
      "role": "user",
      "content": "Help me generate a bubble sort algorithm"
    }
  ],
  "stream": false
}'
This is the result in Postman

Generate
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "<|system|>You are my AI assistant.<|end|><|user|>tell me how to learn AI<|end|><|assistant|>",
  "stream": false
}'
This is the result in Postman
Check the list of available models in the Ollama library.
Pull the model from the Ollama library using this command:
ollama pull phi3
Run the model using this command:
ollama run phi3
Note: Visit https://github.com/ollama/ollama/blob/main/docs/api.md to learn more about the Ollama API.
You can use requests or urllib3 to make requests to the local server endpoints used above; a short requests sketch follows below. However, a popular way to use Ollama in Python is via the openai SDK, since Ollama also provides OpenAI-compatible server endpoints.
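For example, here is a minimal sketch that calls the /api/chat endpoint with requests. It assumes the Ollama server is running locally on the default port 11434 and that the phi3 model has already been pulled; the message contents are only illustrative.

import requests

# Non-streaming chat request against the local Ollama server
# (assumes "ollama serve" is running and "phi3" has been pulled).
payload = {
    "model": "phi3",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain bubble sort in one sentence."},
    ],
    "stream": False,
}

response = requests.post("http://127.0.0.1:11434/api/chat", json=payload)
response.raise_for_status()

# With "stream": false the reply is a single JSON object whose
# "message" field contains the assistant's answer.
print(response.json()["message"]["content"])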
Here is an example using the openai SDK with phi3-mini:
import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",
)

response = client.chat.completions.create(
    model="phi3",
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about a hungry cat"},
    ],
)

print("Response:")
print(response.choices[0].message.content)
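If you want tokens to appear as they are generated, you can also request a streaming response through the same OpenAI-compatible endpoint. The following is a minimal sketch under the same assumptions (local Ollama server on port 11434, phi3 already pulled); the prompt is only illustrative.

import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",
)

# Streaming chat completion: the reply arrives as a sequence of chunks
# instead of a single response object.
stream = client.chat.completions.create(
    model="phi3",
    temperature=0.7,
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about a hungry cat"},
    ],
)

# Each chunk carries a small delta of the assistant's reply.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()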
You can also call Phi-3 through Ollama from a GenAIScript script. For example, to summarize a file:

// Example of summarizing a file with Phi-3 via Ollama
script({
    model: "ollama:phi3",
    title: "Summarize with Phi-3",
    system: ["system"],
})

// Summarize the content of the file(s) passed to the script
const file = def("FILE", env.files)
$`Summarize ${file} in a single paragraph.`
Create a new C# Console application and add the following NuGet package:
dotnet add package Microsoft.SemanticKernel --version 1.13.0
Then replace the code in the Program.cs file with the following:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// add a chat completion service using the local ollama server endpoint
#pragma warning disable SKEXP0001, SKEXP0003, SKEXP0010, SKEXP0011, SKEXP0050, SKEXP0052
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: "phi3.5",
    endpoint: new Uri("http://localhost:11434/"),
    apiKey: "non required");

// build the kernel with the registered services
var kernel = builder.Build();

// invoke a simple prompt to the chat service
string prompt = "Write a joke about kittens";
var response = await kernel.InvokePromptAsync(prompt);
Console.WriteLine(response.GetValue<string>());
Run the app with the command:
dotnet run