When I perform inference, even when starting a new session, the conversation history of the previous session carries through, muddying the new prompt. How can I disable conversation history of any kind so I can control exactly what goes to the LLM? I don't want to chat; I want to perform text analysis and transformations, and history actively hampers that. For example, this response includes references to previous turns of the conversation (the Google link and the 'American shorthair kittens' string), even when I'm creating a new context and session with a
Using the version 3 beta, you can create a new chat session for each analysis while reusing the existing context sequence. This way you don't share any state between prompts, but still utilize the already-allocated resources:

```js
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "dolphin-2.1-mistral-7b.Q4_K_M.gguf")
});
const context = await model.createContext({
    contextSize: Math.min(4096, model.trainContextSize)
});
const contextSequence = context.getSequence();

// `autoDisposeSequence: false` keeps the context sequence alive
// after the session is disposed, so it can be reused
const session = new LlamaChatSession({
    contextSequence,
    autoDisposeSequence: false
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);
const a1 = await session.prompt(q1);
console.log("AI: " + a1);

// dispose the session, but keep the sequence and its allocated resources
session.dispose();

// a fresh session on the same sequence starts with no chat history
const session2 = new LlamaChatSession({
    contextSequence
});
const q1a = "Hi there";
console.log("User: " + q1a);
const a1a = await session2.prompt(q1a);
console.log("AI: " + a1a);
```
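If you run many independent analyses, you can wrap this pattern in a small helper. Below is a minimal sketch building on the snippet above and assuming the same v3 beta API; `analyzeText` is a hypothetical name, not part of node-llama-cpp:

```js
// Hypothetical helper (not part of node-llama-cpp): runs a single prompt
// in a fresh session so no history carries over between calls,
// while reusing the already-allocated context sequence.
async function analyzeText(contextSequence, instruction) {
    const session = new LlamaChatSession({
        contextSequence,
        autoDisposeSequence: false // keep the sequence alive for the next call
    });
    try {
        // the only conversation this prompt sees is the prompt itself
        return await session.prompt(instruction);
    } finally {
        session.dispose(); // frees the session, not the sequence
    }
}

// each call starts from a clean slate:
const summary = await analyzeText(contextSequence, "Summarize this text: ...");
const keywords = await analyzeText(contextSequence, "List keywords in this text: ...");
```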