LlamaCPP-ts is a Node.js binding for the LlamaCPP library, which wraps the llama.cpp framework. It provides an easy-to-use interface for running language models in Node.js applications, supporting both synchronous queries and asynchronous streaming responses.
Supported Systems:
- MacOS
- Windows (not tested yet)
- Linux (not tested yet)
You can find some models here
The examples below use Meta-Llama-3.1-8B-Instruct-Q3_K_S.gguf.
Ensure that you have CMake installed on your system:
- On MacOS (with Homebrew):
brew install cmake
- On Windows (with Chocolatey):
choco install cmake
- On Linux (Debian/Ubuntu):
sudo apt-get install cmake
Then, install the package:
npm install llama.cpp-ts
# or
yarn add llama.cpp-ts
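Basic usage (synchronous query):
import { Llama } from 'llama.cpp-ts';

const llama = new Llama();
// initialize() returns a boolean indicating whether the model was loaded.
const initialized = llama.initialize('./path/to/your/model.gguf');
if (initialized) {
  // The second argument is the maximum number of tokens to generate.
  const response: string = llama.runQuery("Tell me a story.", 100);
  console.log(response);
} else {
  console.error("Failed to initialize the model.");
}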
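Because `initialize` reports failure through its boolean return value rather than by throwing, it can be convenient to wrap the check once. The following is a minimal sketch, not part of the library: `LlamaLike` and `queryOrThrow` are hypothetical names, and the interface only mirrors the two methods used in the example above.

```typescript
// Hypothetical structural type mirroring the two methods used above.
// The Llama class from 'llama.cpp-ts' is assumed to satisfy it.
interface LlamaLike {
  initialize(modelPath: string, contextSize?: number): boolean;
  runQuery(prompt: string, maxTokens?: number): string;
}

// Initializes the model and runs one query, turning a failed
// initialization into an exception instead of a `false` return value.
function queryOrThrow(
  llama: LlamaLike,
  modelPath: string,
  prompt: string,
  maxTokens?: number,
): string {
  if (!llama.initialize(modelPath)) {
    throw new Error(`Failed to initialize the model at ${modelPath}`);
  }
  return llama.runQuery(prompt, maxTokens);
}
```

With the real binding this might be called as `queryOrThrow(new Llama(), './path/to/your/model.gguf', 'Tell me a story.', 100)`.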
Streaming usage (asynchronous token stream):
import { Llama, TokenStream } from 'llama.cpp-ts';

async function main() {
  const llama = new Llama();
  const initialized: boolean = llama.initialize('./path/to/your/model.gguf');
  if (initialized) {
    const tokenStream: TokenStream = llama.runQueryStream("Explain quantum computing", 200);
    // Read tokens one at a time; read() resolves to null when the stream ends.
    while (true) {
      const token: string | null = await tokenStream.read();
      if (token === null) break;
      process.stdout.write(token);
    }
  } else {
    console.error("Failed to initialize the model.");
  }
}

main().catch(console.error);
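When the full response is needed as a single string while still observing tokens as they arrive, the read loop above can be factored into a helper. This is a sketch under assumptions: `TokenSource` and `collectTokens` are hypothetical names, and the only library behavior relied on is that `read()` resolves to `null` at the end of the stream.

```typescript
// Hypothetical structural type: anything with the read() method shown above.
interface TokenSource {
  read(): Promise<string | null>;
}

// Reads tokens until the stream signals completion with null,
// optionally reporting each token, and returns the concatenated text.
async function collectTokens(
  stream: TokenSource,
  onToken?: (token: string) => void,
): Promise<string> {
  let text = "";
  for (;;) {
    const token = await stream.read();
    if (token === null) break;
    if (onToken) onToken(token);
    text += token;
  }
  return text;
}
```

For example, `await collectTokens(tokenStream, (t) => process.stdout.write(t))` would print tokens as they arrive and still return the complete text.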
The Llama class provides methods to interact with language models loaded through llama.cpp.
- constructor(): Creates a new Llama instance.
- initialize(modelPath: string, contextSize?: number): boolean: Initializes the model with the specified path and context size.
- runQuery(prompt: string, maxTokens?: number): string: Runs a query with the given prompt and returns the result as a string.
- runQueryStream(prompt: string, maxTokens?: number): TokenStream: Streams the response to the given prompt, returning a TokenStream object.
The TokenStream class represents a stream of tokens generated by the language model.
- read(): Promise<string | null>: Reads the next token from the stream. Returns null when the stream is finished.
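Since `read()` follows a simple "next value or null" contract, it can be adapted to an async iterable so that callers can use `for await`. The adapter below is a sketch and not something the library is documented to provide: `TokenReader` and `iterateTokens` are hypothetical names, and the code assumes a compile target that supports async generators (ES2018 or later).

```typescript
// Hypothetical structural type matching the read() signature documented above.
interface TokenReader {
  read(): Promise<string | null>;
}

// Wraps read() in an async generator: yields each token and
// finishes when read() resolves to null.
async function* iterateTokens(stream: TokenReader): AsyncGenerator<string> {
  let token = await stream.read();
  while (token !== null) {
    yield token;
    token = await stream.read();
  }
}
```

A caller could then write `for await (const token of iterateTokens(tokenStream)) { ... }` instead of the explicit `while (true)` loop.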