Commit

Updated readme
tjake committed Sep 16, 2024
1 parent 7c3f272 commit 684d347

Showing 3 changed files with 48 additions and 70 deletions.

106 changes: 45 additions & 61 deletions README.md
@@ -9,7 +9,6 @@
[![Discord](https://img.shields.io/discord/1279855254812229642?style=flat-square&label=Discord&color=663399)](https://discord.gg/HsYXHrMu6J)



## 🚀 Features

Model Support:
@@ -31,42 +30,67 @@ Implements:
* Fast GEMM operations
* Distributed Inference!

Jlama is requires Java 20 or later and utilizes the new [Vector API](https://openjdk.org/jeps/448)
Jlama requires Java 20 or later and utilizes the new [Vector API](https://openjdk.org/jeps/448)
for faster inference.
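To illustrate what the Vector API buys you, here is a minimal SIMD dot product — the inner loop of a GEMM — as a sketch. This is illustrative only, not Jlama's actual kernel, and it must be compiled and run with `--add-modules jdk.incubator.vector`:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class DotProduct {
    // Preferred species picks the widest SIMD width the CPU supports.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            // Multiply lanes pairwise, then reduce the lanes to one float.
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for elements that don't fill a full vector.
        for (; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {2f, 2f, 2f, 2f, 2f};
        System.out.println(dot(a, b)); // prints 30.0
    }
}
```

On hardware with AVX2 or AVX-512, one loop iteration handles 8 or 16 floats at once, which is where the "faster inference" comes from.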

## ⭐ Give us a star!

Like what you see? Please consider giving this a star (★)!

## 🤔 What is it used for?

Add LLM Inference directly to your Java application.

## 🔬 Demo
## 🔬 Quick Start

Jlama includes a simple UI if you just want to chat with an llm.
### 🕵️‍♀️ How to use as a local client
Jlama includes a command line tool that makes it easy to use.
The CLI can be run with [jbang](https://www.jbang.dev/download/).

```shell
# Install jbang (if you don't have it)
curl -Ls https://sh.jbang.dev | bash -s - app setup

# Install the Jlama CLI (it will ask if you trust the source)
jbang app install -j 21 --name=jlama --force https://raw.githubusercontent.com/tjake/Jlama/main/jlama.java

# Run the CLI (with no arguments it prints its usage)
jlama

Usage: jlama [COMMAND]
Jlama is a modern LLM inference engine for Java!

Quantized models are maintained at https://hf.co/tjake

Commands:
download Downloads a HuggingFace model - use owner/name format
quantize Quantize the specified model
chat Interact with the specified model
complete Completes a prompt using the specified model
restapi Starts an OpenAI-compatible REST API for interacting with this model
cluster-coordinator Starts a distributed rest api for a model using cluster workers
cluster-worker Connects to a cluster coordinator to perform distributed inference
```
./run-cli.sh download tjake/llama2-7b-chat-hf-jlama-Q4
./run-cli.sh restapi models/llama2-7b-chat-hf-jlama-Q4

Now that you have Jlama installed, you can download a model from Hugging Face and chat with it.
Note: pre-quantized models are available at https://hf.co/tjake

```shell
# Download a small model (defaults to ./models)
jlama download tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4

# Run the openai chat api and UI on this model
jlama restapi models/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4

# Open your browser to http://localhost:8080/
open http://localhost:8080
```

<p align="center">
<img src="docs/demo.png" alt="Demo chat">
</p>
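The `restapi` command speaks the OpenAI chat completions protocol, so any OpenAI-compatible client can talk to the local server. As a sketch, a request body looks like the following (the `/v1/chat/completions` path and the exact accepted fields are assumptions based on the OpenAI API shape, not confirmed from Jlama's docs):

```json
{
  "model": "TinyLlama-1.1B-Chat-v1.0-Jlama-Q4",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Tell me a joke about cats." }
  ],
  "temperature": 0.7
}
```

You could POST it with `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d @request.json` and receive a standard `chat.completion` JSON response.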

## 👨‍💻 How to use in your Java project

The simplest way to use Jlama is with the [Langchain4j Integration](https://github.com/langchain4j/langchain4j-examples/tree/main/jlama-examples).
### 👨‍💻 How to use in your Java project
The main purpose of Jlama is to provide a simple way to use large language models in Java.

Jlama also includes an [OpenAI chat completions API](https://platform.openai.com/docs/guides/chat-completions/overview) that can be used with many tools in the AI ecosystem.
The simplest way to embed Jlama in your app is with the [Langchain4j Integration](https://github.com/langchain4j/langchain4j-examples/tree/main/jlama-examples).

```shell
./run-cli.sh restapi tjake/llama2-7b-chat-hf-jlama-Q4
```

If you would like to embed Jlama directly, add the following [maven](https://central.sonatype.com/artifact/com.github.tjake/jlama-core/) dependencies to your project:
If you would like to embed Jlama without langchain4j, add the following [maven](https://central.sonatype.com/artifact/com.github.tjake/jlama-core/) dependencies to your project:

```xml
<!-- (dependency coordinates collapsed in this diff view) -->
```

@@ -124,49 +148,9 @@ Then you can use the Model classes to run models:

```java
// … (example collapsed in the diff view)
}
```
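For the Langchain4j route, a minimal sketch looks like the following. This assumes the `langchain4j-jlama` module's `JlamaChatModel` builder; method names may vary between Langchain4j versions, and the model is fetched on first use, so it needs network access and disk space:

```java
import dev.langchain4j.model.jlama.JlamaChatModel;

public class JlamaChatExample {
    public static void main(String[] args) {
        // Build a local, in-process chat model (no external server needed).
        JlamaChatModel model = JlamaChatModel.builder()
                .modelName("tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4")
                .temperature(0.7f)
                .build();

        // One-shot chat; the response is generated entirely in the JVM.
        String reply = model.generate("Tell me a joke about cats.");
        System.out.println(reply);
    }
}
```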

## 🕵️‍♀️ How to use as a local client
Jlama includes a CLI tool to run models via the `run-cli.sh` command.
Before you do that, first download one or more models from Hugging Face.

Use the `./run-cli.sh download` command to download models from Hugging Face.
## ⭐ Give us a Star!

```shell
./run-cli.sh download gpt2-medium
./run-cli.sh download -t XXXXXXXX meta-llama/Llama-2-7b-chat-hf
./run-cli.sh download intfloat/e5-small-v2
```

Then run the cli tool to chat with the model or complete a prompt.
Quantization is supported with the `-q` flag, or you can use the pre-quantized models
located in my [Hugging Face repo](https://huggingface.co/tjake).

```shell
./run-cli.sh complete -p "The best part of waking up is " -t 0.7 -tc 16 -q Q4 -wq I8 models/Llama-2-7b-chat-hf
./run-cli.sh chat -s "You are a professional comedian" models/llama2-7b-chat-hf-jlama-Q4
```

## 🧪 Examples
### Llama 2 7B

```
You: Tell me a joke about cats. Include emojis.
Jlama: Sure, here's a joke for you:
Why did the cat join a band? 🎸🐱
Because he wanted to be the purr-fect drummer! 😹🐾
I hope you found that purr-fectly amusing! 😸🐱
elapsed: 11s, prompt 38.0ms per token, gen 146.2ms per token
You: Another one
Jlama: Of course! Here's another one:
Why did the cat bring a ball of yarn to the party? 🎉🧶
Because he wanted to have a paw-ty! 😹🎉
I hope that one made you smile! 😊🐱
elapsed: 11s, prompt 26.0ms per token, gen 148.4ms per token
```
If you like or are using this project to build your own, please give us a star. It's a free way to show your support.

## 🗺️ Roadmap

8 changes: 0 additions & 8 deletions jlama-net/pom.xml
@@ -302,14 +302,6 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.6.3</version>
<configuration>
<skip>true</skip>
</configuration>
</plugin>
</plugins>
</build>
</project>
4 changes: 3 additions & 1 deletion pom.xml
@@ -361,7 +361,9 @@
<jdkToolchain>
<version>22</version>
</jdkToolchain>
<skippedModules>jlama-net</skippedModules>
<sourceFileExcludes>
<sourceFileExclude>**/generated-sources/*.java</sourceFileExclude>
</sourceFileExcludes>
<additionalJOptions>
<additionalJOption>--add-modules=jdk.incubator.vector</additionalJOption>
</additionalJOptions>
