
How to do stateless inference? #185

Closed · Answered by giladgd
hawkeyexl asked this question in Q&A

Using the version 3 beta, you can create a new chat session for each analysis while reusing the existing context sequence.
That way, no state is shared between prompts, but you still make use of the already-allocated resources:

import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "dolphin-2.1-mistral-7b.Q4_K_M.gguf")
});

// Cap the context at 4096 tokens, or less if the model was trained
// with a smaller context size
const context = await model.createContext({
    contextSize: Math.min(4096, model.trainContextSize)
});

// A single context sequence can be reused across many independent chat sessions
const contextSequence = context.getSequence();
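
For completeness, here is a minimal sketch of the per-analysis loop on top of the setup above. The helper name analyzeStateless is hypothetical, and autoDisposeSequence is set explicitly as an assumption about the beta's defaults, so that disposing a session does not also dispose the shared sequence:

// Hypothetical helper: run one isolated prompt over the shared sequence.
// Creating a fresh LlamaChatSession per call means no chat history
// leaks between analyses.
async function analyzeStateless(text) {
    const session = new LlamaChatSession({
        contextSequence,
        autoDisposeSequence: false // assumption: keep the sequence alive after session.dispose()
    });

    try {
        return await session.prompt(text);
    } finally {
        session.dispose(); // release the session; the sequence stays allocated for reuse
    }
}

const first = await analyzeStateless("Summarize this text: ...");
const second = await analyzeStateless("Classify the sentiment of: ...");
console.log({first, second});

Since every call builds its session over the same pre-allocated context, the model and context are loaded only once, which is what makes this cheaper than creating a new context per prompt.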
