llamalib - compiled python llama.cpp wrappers

The project includes three experimental python wrappers of @ggerganov's llama.cpp, which is likely the most active open-source compiled LLM inference engine. The wrapping frameworks used are cython, pybind11, and nanobind. All three produce compiled extension modules which, in this project, are statically linked against llama.cpp.

The goals of this project are to:

Produce a minimal performant / compiled python wrapper around the core llama-cli feature-set of llama.cpp.
Integrate wrappers of other related projects such as whisper.cpp and stable-diffusion.cpp.
Learn about the internals of this popular C++/C LLM inference engine.

Given that there is a fairly mature, well-maintained and performant ctypes-based wrapper provided by @abetlen's llama-cpp-python project, and that LLM inference is GPU-driven rather than CPU-driven, this may all seem quite redundant. Nonetheless, there are some benefits to developing alternative python wrappers to llama.cpp:

Packaging benefits from self-contained, statically compiled extension modules.
There may be some incidental performance improvement from using compiled wrappers rather than ctypes.
It may be possible to incorporate external optimizations more readily into compiled wrappers.
It provides a basis for integration with other code written in a given wrapper variant.
It may be useful for decoupling the python frontend from the wrapper backend in existing frameworks: one future development idea is to replace the ctypes wrapper in llama-cpp-python with one of the compiled python wrappers and contribute it back as a PR.
This is the most efficient way, for me at least, to learn about the underlying technologies.
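
For context on the ctypes-versus-compiled distinction in the list above, here is a minimal sketch of how a ctypes binding works: the shared library is loaded at run time and each C signature is declared by hand in python. It deliberately uses strlen from the C standard library rather than anything from llama.cpp, so it does not reflect how llama-cpp-python or the wrappers in this project are actually written.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library at run time (no compile step).
# find_library may return None on some platforms; on POSIX systems
# CDLL(None) then exposes the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# With ctypes, the C signature is declared manually in python.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Each call converts arguments and the return value at run time.
length = libc.strlen(b"llama")
print(length)
```

A cython, pybind11, or nanobind wrapper moves these declarations and conversions into an extension module that is compiled ahead of time, which is where the packaging and possible performance differences come from.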

Status

Development is currently on macOS only, to keep things simple. The following table provides an overview of the current wrapping/dev status:

status	pbllama	nbllama	cyllama
wrapper-type	pybind11	nanobind	cython
wrap llama.h	1	1	1
wrap high-level simple-cli	1	1	1
wrap low-level simple-cli	1	1	1
wrap low-level llama-cli	0	0	0

The initial milestone for each wrapper type was to create a high-level wrapper of the simple.cpp llama.cpp example, followed by a low-level one. The high-level wrapper C++ code is placed in the llamalib.h single-header library, and wrapping is complete for all three frameworks. The final objective is to fully wrap the functionality of llama-cli for all three wrapper types.

It goes without saying that any help / collaboration / contributions to accelerate the above would be welcome!

Setup

To build llamalib, you will need:

A recent version of python3 (tested on python 3.12)
cmake, which can be installed on macOS using homebrew with brew install cmake
The following python wrapping libraries, if you don't already have them. All python dependencies can be installed via pip install -r requirements.txt (feel free to use virtualenv if you like):
- cython
- pybind11
- nanobind
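
Before running the build, the prerequisites above can be checked with a short script. This is only a sketch and is not part of llamalib: the function name missing_prerequisites is hypothetical, and the check only confirms that cmake is on the PATH and that the three packages are importable (note that cython is imported as Cython).

```python
import importlib.util
import shutil

# Command-line tools and python modules listed in the Setup section.
TOOLS = ("cmake",)
MODULES = ("Cython", "pybind11", "nanobind")


def missing_prerequisites(tools=TOOLS, modules=MODULES):
    """Return a list of prerequisites that could not be found."""
    missing = []
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(f"tool: {tool}")
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(f"module: {module}")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("all prerequisites found" if not problems else "\n".join(problems))
```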

With the above dependencies installed, download and build llamalib by typing the following:

git clone https://github.com/shakfu/llamalib.git
cd llamalib
make

This will:

Download and build llama.cpp
Install it into bin, include, and lib in the cloned llamalib folder
Build cyllama (cython wrapper)
Build pbllama (pybind11 wrapper)
Build nbllama (nanobind wrapper)
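
After make finishes, the install layout described above (bin, include, and lib inside the cloned folder) can be verified from python. This is a hedged sketch: the helper missing_install_dirs is hypothetical, and it only checks that the three directories exist, not that the build inside them is complete.

```python
from pathlib import Path

# Directories that the build is described as populating.
EXPECTED_DIRS = ("bin", "include", "lib")


def missing_install_dirs(root="."):
    """Return the expected directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the top of the cloned llamalib folder.
    missing = missing_install_dirs()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("install layout looks complete")
```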

Testing

As a first step, you should download a smallish LLM in the .gguf format from huggingface. This document provides some examples of models which have been known to work on a 16GB M1 MacBook Air.

A good model to start with is Llama-3.2-1B-Instruct-Q6_K.gguf. After downloading it, place the model in the llamalib/models folder and run:

bin/llama-simple -c 512 -n 512 -m models/Llama-3.2-1B-Instruct-Q6_K.gguf \
	-p "Is mathematics discovered or invented?"
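
The same invocation can be assembled from python, which is convenient when trying several prompts or models. The sketch below only builds the argument list using the flags shown in the command above; the helper name build_simple_cmd is hypothetical and nothing is executed.

```python
import shlex


def build_simple_cmd(model, prompt, n_ctx=512, n_predict=512,
                     binary="bin/llama-simple"):
    """Build the argv list for the llama-simple command shown above."""
    return [
        binary,
        "-c", str(n_ctx),      # same value as -c in the command above
        "-n", str(n_predict),  # same value as -n in the command above
        "-m", model,           # path to the .gguf model file
        "-p", prompt,          # prompt text
    ]


cmd = build_simple_cmd(
    "models/Llama-3.2-1B-Instruct-Q6_K.gguf",
    "Is mathematics discovered or invented?",
)
# Print a shell-quoted version for inspection.
print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True) from the top of the llamalib folder, once the build has produced bin/llama-simple and the model file is in place.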

Now, you will need pytest installed to run tests:

pytest

If all tests pass, feel free to cd into the tests directory and run some examples directly, for example:

cd tests && python3 cy_simple.py
