Ampere® optimized llama.cpp

Ampere® optimized build of llama.cpp with full support for the rich collection of GGUF models available on Hugging Face: GGUF models

For best results, we recommend using models in our custom quantization formats, available here: AmpereComputing HF

This Docker image can be run on bare-metal Ampere® CPUs and on Ampere®-based VMs available in the cloud.

Release notes and binary executables are available on our GitHub.

Starting container

The default entrypoint runs the llama.cpp server binary, mimicking the behavior of the original llama.cpp server image: docker image

To launch a shell instead, run:
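As a rough sketch of using the default server entrypoint, the script below mounts a local model directory into the container and publishes a port. The model directory, model filename, port, and container name are hypothetical placeholders. It also assumes that this image forwards trailing arguments to the server binary in the same way the upstream llama.cpp server image does; `-m`, `--host` and `--port` are upstream llama.cpp server options. The script only prints the command by default; set `DRY_RUN=0` to actually execute it.

```shell
#!/bin/sh
# Hypothetical locations and port; adjust to your setup.
MODEL_DIR="${MODEL_DIR:-$HOME/models}"
MODEL_FILE="${MODEL_FILE:-llama-2-7b-Q8R16.gguf}"
PORT="${PORT:-8080}"

# Print the command by default; run it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: arguments after the image name are passed to the server binary.
run sudo docker run --privileged=true --name llama-server \
    -v "$MODEL_DIR:/models" \
    -p "$PORT:$PORT" \
    amperecomputingai/llama.cpp:latest \
    -m "/models/$MODEL_FILE" --host 0.0.0.0 --port "$PORT"
```

Review the printed command first, then rerun with `DRY_RUN=0` once the paths match your machine.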

sudo docker run --privileged=true --name llama --entrypoint /bin/bash -it amperecomputingai/llama.cpp:latest

A quick start example is presented when the Docker container launches.

Make sure to visit us at the Ampere Solutions Portal!

Quantization

The Ampere® optimized build of llama.cpp supports two new quantization methods, Q4_K_4 and Q8R16. They offer model size and perplexity similar to Q4_K and Q8_0, respectively, while running inference up to 1.5-2x faster.

First, convert the model to the GGUF format using this script:

python3 convert-hf-to-gguf.py [path to the original model] --outtype [f32, f16, bf16 or q8_0] --outfile [output path]

For example:

python3 convert-hf-to-gguf.py path/to/llama2 --outtype f16 --outfile llama-2-7b-f16.gguf

Next, quantize the model using the following command:

./llama-quantize [input file] [output file] [quantization method]

For example:

./llama-quantize llama-2-7b-f16.gguf llama-2-7b-Q8R16.gguf Q8R16
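The two steps above can be chained in a small script. This is only a sketch: the variable names and the `run` helper are illustrative, the paths reuse the example values above, and it assumes `convert-hf-to-gguf.py` and `llama-quantize` are both in the current directory. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Example values taken from the commands above; override via the environment.
SRC_MODEL="${SRC_MODEL:-path/to/llama2}"
F16_GGUF="${F16_GGUF:-llama-2-7b-f16.gguf}"
QUANT="${QUANT:-Q8R16}"                        # or Q4_K_4
OUT_GGUF="${OUT_GGUF:-llama-2-7b-$QUANT.gguf}"

# Print the command by default; run it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: convert the original model to an f16 GGUF file.
run python3 convert-hf-to-gguf.py "$SRC_MODEL" --outtype f16 --outfile "$F16_GGUF" || exit 1

# Step 2: quantize the GGUF file with the selected method.
run ./llama-quantize "$F16_GGUF" "$OUT_GGUF" "$QUANT"
```

Setting `QUANT=Q4_K_4` produces the other Ampere® format from the same f16 file.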

Benchmark Results

Results of benchmarks conducted by our team can be found in benchmarks/example_results, with data selectable by machine type and software.

Support

Please contact us at ai-support@amperecomputing.com

LEGAL NOTICE

By accessing, downloading or using this software and any required dependent software (the “Ampere AI Software”), you agree to the terms and conditions of the software license agreements for the Ampere AI Software, which may also include notices, disclaimers, or license terms for third party software included with the Ampere AI Software. Please refer to the Ampere AI Software EULA v1.6 or other similarly-named text file for additional details.
