Merge pull request #981 from andreadimaio/improve_watsonx_model_provider
Enable /chat and /chat_stream in watsonx.ai
geoand authored Oct 15, 2024
2 parents 0b94f8b + 6b3edf4 commit 9b983a9
Showing 45 changed files with 2,953 additions and 956 deletions.
112 changes: 38 additions & 74 deletions docs/modules/ROOT/pages/watsonx.adoc
@@ -62,111 +62,75 @@ quarkus.langchain4j.watsonx.api-key=hG-...

NOTE: To determine the API key, go to https://cloud.ibm.com/iam/apikeys and generate it.

==== Interacting with Models

The `watsonx.ai` module provides two modes for interacting with LLM models: `generation` and `chat`. These modes allow you to tailor the interaction based on the complexity of your use case and how much control you want over the prompt structure.

You can select the interaction mode using the `quarkus.langchain4j.watsonx.chat-model.mode` property.

* `generation`: you must explicitly structure prompts using the model-specific tags. This gives you full control over the prompt format, but requires in-depth knowledge of the model being used. For best results, always refer to the documentation provided for each model.
* `chat` (*default*): the complexity of tagging is abstracted away, and prompts are formatted automatically so you can focus on the content.

To choose between these two modes, add the `chat-model.mode` property to your `application.properties` file:

[source,properties,subs=attributes+]
----
# accepted values: 'chat' (default) or 'generation'
quarkus.langchain4j.watsonx.chat-model.mode=chat
----

==== Chat Mode

In `chat` mode, you can interact with models without having to manually manage prompt tags.

Choose this mode for dynamic interactions where the model builds on previous messages and provides contextually relevant responses. The necessary tags are managed automatically, letting you focus on the content of your prompts rather than their formatting.

Chat mode also supports the use of `tools`, which allow the model to perform specific actions or retrieve external data as part of its responses. This extends the model's capabilities, enabling it to handle complex tasks dynamically and adapt to your needs. More information about tools is available on the xref:./agent-and-tools.adoc[Agent and Tools] page; a minimal sketch follows at the end of this section.

[source,properties,subs=attributes+]
----
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.chat-model.model-id=mistralai/mistral-large
quarkus.langchain4j.watsonx.chat-model.mode=chat
----

[source,java]
----
@RegisterAiService
public interface AiService {

    @SystemMessage("You are a helpful assistant")
    public String chat(@MemoryId String id, @UserMessage String message);
}
----
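
To give an idea of how such a service is consumed, below is a minimal sketch of a streaming variant (the interface name is an illustrative assumption, not part of the module's API). In quarkus-langchain4j, declaring a method that returns `Multi<String>` streams the response token by token, which in `chat` mode is served by the `/chat_stream` endpoint, assuming the selected model supports streaming:

[source,java]
----
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.smallrye.mutiny.Multi;

@RegisterAiService
public interface StreamingAiService {

    // Returning Multi<String> streams tokens instead of waiting for the full answer
    @SystemMessage("You are a helpful assistant")
    Multi<String> chat(@MemoryId String id, @UserMessage String message);
}
----

An injected instance of this interface can then be subscribed to from a REST resource or any other reactive pipeline.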

NOTE: The availability of `chat` and `tools` is currently limited to certain models. Not all models support these features, so be sure to consult the documentation for the specific model you are using to confirm whether they are available.
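
Below is the tool sketch promised earlier (the `ClockTool` class, its method, and the service interface are illustrative assumptions; only `@Tool` and the `tools` member of `@RegisterAiService` come from the LangChain4j and quarkus-langchain4j APIs):

[source,java]
----
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
class ClockTool {

    @Tool("Returns the current date and time")
    String now() {
        // The model can choose to invoke this method while building its answer
        return java.time.ZonedDateTime.now().toString();
    }
}

@RegisterAiService(tools = ClockTool.class)
interface AssistantWithTools {

    String chat(@MemoryId String memoryId, @UserMessage String question);
}
----

When such a service runs in `chat` mode against a tool-capable model, watsonx.ai decides whether to call `now()` before producing the final answer.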

==== Generation Mode

In `generation` mode, you have complete control over the structure of your prompts by manually specifying the tags for a specific model. This mode can be useful in scenarios where a single response is desired.

[source,properties,subs=attributes+]
----
quarkus.langchain4j.watsonx.base-url=${BASE_URL}
quarkus.langchain4j.watsonx.api-key=${API_KEY}
quarkus.langchain4j.watsonx.project-id=${PROJECT_ID}
quarkus.langchain4j.watsonx.chat-model.model-id=mistralai/mistral-large
quarkus.langchain4j.watsonx.chat-model.mode=generation
----

[source,java]
----
@RegisterAiService(chatMemoryProviderSupplier = RegisterAiService.NoChatMemoryProviderSupplier.class)
public interface AiService {

    @UserMessage("""
        <s>[INST] You are a helpful assistant [/INST]</s>\
        [INST] What is the capital of {capital}? [/INST]""")
    public String askCapital(String capital);
}
----
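
For clarity, the raw prompt produced by this template looks as follows (a sketch, assuming `capital` is bound to `Italy`; the trailing `\` in the text block suppresses the line break between the two instructions):

[source,text]
----
<s>[INST] You are a helpful assistant [/INST]</s>[INST] What is the capital of Italy? [/INST]
----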

NOTE: The `@SystemMessage` and `@UserMessage` annotations are joined by default with a new line. If you want to change this behavior, set the property `quarkus.langchain4j.watsonx.chat-model.prompt-joiner=<value>`. By adjusting this property, you can define your preferred way of joining messages and ensure that the prompt structure meets your specific needs.

NOTE: Sometimes it may be useful to set the `quarkus.langchain4j.watsonx.chat-model.stop-sequences` property to prevent the model from generating more output than desired.
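
A minimal sketch combining both properties (the values are illustrative assumptions):

[source,properties,subs=attributes+]
----
# Join @SystemMessage and @UserMessage with a blank line instead of a single new line
quarkus.langchain4j.watsonx.chat-model.prompt-joiner=\n\n
# Stop generating as soon as the model emits one of these sequences
quarkus.langchain4j.watsonx.chat-model.stop-sequences=<|endoftext|>,Human:
----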

@@ -33,6 +33,12 @@ quarkus.langchain4j.watsonx.c7.base-url=https://somecluster.somedomain.ai:443/api
quarkus.langchain4j.watsonx.c7.api-key=test8
quarkus.langchain4j.watsonx.c7.project-id=proj

quarkus.langchain4j.c8.chat-model.provider=watsonx
quarkus.langchain4j.watsonx.c8.base-url=https://somecluster.somedomain.ai:443/api
quarkus.langchain4j.watsonx.c8.api-key=test9
quarkus.langchain4j.watsonx.c8.project-id=proj
quarkus.langchain4j.watsonx.c8.chat-model.mode=generation

quarkus.langchain4j.e1.embedding-model.provider=openai
quarkus.langchain4j.openai.e1.api-key=test5
quarkus.langchain4j.e2.embedding-model.provider=ollama
@@ -14,6 +14,7 @@
import io.quarkiverse.langchain4j.ollama.OllamaChatLanguageModel;
import io.quarkiverse.langchain4j.openshiftai.OpenshiftAiChatModel;
import io.quarkiverse.langchain4j.watsonx.WatsonxChatModel;
import io.quarkiverse.langchain4j.watsonx.WatsonxGenerationModel;
import io.quarkus.arc.ClientProxy;
import io.quarkus.test.junit.QuarkusTest;

@@ -47,6 +48,10 @@ public class MultipleChatProvidersTest {
@ModelName("c7")
ChatLanguageModel seventhNamedModel;

@Inject
@ModelName("c8")
ChatLanguageModel eighthNamedModel;

@Test
void defaultModel() {
assertThat(ClientProxy.unwrap(defaultModel)).isInstanceOf(OpenAiChatModel.class);
@@ -81,4 +86,9 @@ void sixthNamedModel() {
void seventhNamedModel() {
assertThat(ClientProxy.unwrap(seventhNamedModel)).isInstanceOf(WatsonxChatModel.class);
}

@Test
void eighthNamedModel() {
assertThat(ClientProxy.unwrap(eighthNamedModel)).isInstanceOf(WatsonxGenerationModel.class);
}
}