Llama 3 Playground

A fully-contained, ready-to-run environment for fine-tuning the Llama 3 model with a custom dataset and running inference on the fine-tuned models.

Requirements

  • Docker
  • Nvidia GPU

Note: This has been tested only on NVIDIA RTX 2080 and NVIDIA Tesla T4 GPUs so far. It has not been tested on other GPU classes or on CPUs.

Run this command on your host machine to check which NVIDIA GPU you have installed.

nvidia-smi

That should display your GPU info:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080        Off | 00000000:01:00.0  On |                  N/A |
| 22%   38C    P8              17W / 215W |    197MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

What does the setup/image contain?

  • Python 3.10
  • JupyterLab
  • Huggingface CLI
  • EasyOCR (with the English detection model pre-downloaded), for running OCR on PDF/image files
  • Llama 3 model pre-downloaded
  • Scripts to run OCR, training, and inference
  • A sample dataset to fine-tune the model with

Setup

git clone https://github.com/amithkoujalgi/llama3-playground.git
cd llama3-playground

bash build.sh

Run

bash run.sh

This starts the Docker container with the following services:

| Service           | Externally accessible endpoint | Internal port | Description                                                                        |
|-------------------|--------------------------------|---------------|------------------------------------------------------------------------------------|
| Supervisor        | http://localhost:8884          | 9001          | For running training on a custom dataset and viewing logs of the trainer process    |
| FastAPI Server    | http://localhost:8883/docs     | 8070          | For accessing the APIs of the model server                                          |
| JupyterLab Server | http://localhost:8888/lab      | 8888          | JupyterLab interface for browsing the container and updating/experimenting with the code |

Note: All processes (OCR, training, and inference) use the GPU, so running more than one of them at the same time would cause out-of-memory (OOM) errors. To handle this, the system is designed to run only one process at any given time (i.e., only one instance of OCR, training, or inference can run at once).
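
For example, before submitting a new OCR job you can poll the OCR status API (documented under APIs below) and wait until nothing is running. A minimal sketch in Python, assuming the requests library is available and that the endpoint returns a JSON boolean as described in the APIs section:

import time

import requests

STATUS_URL = "http://localhost:8883/api/ocr/status"

def wait_until_idle(poll_seconds: float = 5.0) -> None:
    # Block until no OCR process is running.
    # Assumption: the response body is a JSON boolean (true while a job is running).
    while requests.get(STATUS_URL, timeout=10).json():
        time.sleep(poll_seconds)

wait_until_idle()
# Safe to submit the next job now.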

Feel free to update the code according to your needs.

Running commands from Jupyter

Train model

Go to terminal and type

playground --train

List models

Go to terminal and type

playground -l

This lists the models under /app/data/trained-models/. The trainer script produces two models:

  • a model that contains only the LoRA adapters, suffixed with lora-adapters.
  • a full model with the LoRA adapters merged into the base model.
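
To use these artifacts outside the bundled scripts, the merged model can be loaded on its own, while the adapter-only model has to be applied on top of the base model. A minimal sketch assuming the standard transformers and peft APIs; the base model ID and directory names below are illustrative, not the container's actual layout:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model ID
MERGED = "/app/data/trained-models/llama-3-8b-instruct-custom-1720802202"  # illustrative
ADAPTERS = MERGED + "-lora-adapters"  # adapter-only output, per the suffix above

# Option 1: the merged model is self-contained.
model = AutoModelForCausalLM.from_pretrained(MERGED, device_map="auto")

# Option 2: load the base model and apply the LoRA adapters on top.
base = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTERS)

tokenizer = AutoTokenizer.from_pretrained(MERGED)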

Run OCR:

cd /app/llama3_playground/core

python ocr.py \
  -f "/app/sample.pdf"

To understand what the options mean, go to JupyterLab and run python ocr.py -h

Inference with RAG:

cd /app/llama3_playground/core

python infer_rag.py \
  -m "llama-3-8b-instruct-custom-1720802202" \
  -d "/app/data/ocr-runs/123/text-result.txt" \
  -q "What is the employer name, address, telephone, TIN, tax year end, type of business, plan name, Plan Sequence Number, Trust ID, Account number, is it a new plan or existing plan as true or false, are elective deferrals and roth deferrals allowed as true or false, are loans permitted as true or false, are life insurance investments permitted and what is the ligibility Service Requirement selected?" \
  -t 256 \
  -e "Alibaba-NLP/gte-base-en-v1.5" \
  -p "There are checkboxes in the text that denote the value as selected if the text is [Yes], and unselected if the text is [No]. The checkbox option's value can either be before the selected value or after. Keep this in context while responding and be very careful and precise in picking these values. Always respond as JSON. Keep the responses precise and concise."

To understand what the options mean, go to JupyterLab and run python infer_rag.py -h

Additional setup instructions

This is needed only if you do not have the NVIDIA Container Toolkit installed on your host machine.

Install the NVIDIA Container Toolkit if you're running an Ubuntu host

# Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Optionally, configure the repository to use experimental packages
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update the packages list from the repository
sudo apt-get update

# Install the NVIDIA Container Toolkit packages
sudo apt-get install -y nvidia-container-toolkit
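
# After installing, configure Docker to use the NVIDIA runtime and restart the
# daemon (per NVIDIA's install guide; assumes a systemd-based host)
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker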

For other environments, refer to the official NVIDIA Container Toolkit installation guide.

APIs

Inference

Generate a response from the model:

curl --silent -X 'POST' \
  'http://localhost:8883/api/infer/sync/ctx-text' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model_name": "llama-3-8b-instruct-custom-1720690384",
  "context_data": "You are a magician who goes by the name Magica",
  "question_text": "Who are you?",
  "prompt_text": "Respond in a musical and Shakespearean tone",
  "max_new_tokens": 50
}' | jq -r ".data.response"
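
The same call from Python, assuming the requests library is available; the payload and response field mirror the curl/jq example above:

import requests

payload = {
    "model_name": "llama-3-8b-instruct-custom-1720690384",
    "context_data": "You are a magician who goes by the name Magica",
    "question_text": "Who are you?",
    "prompt_text": "Respond in a musical and Shakespearean tone",
    "max_new_tokens": 50,
}

resp = requests.post("http://localhost:8883/api/infer/sync/ctx-text", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["data"]["response"])  # same field the jq filter above extracts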

OCR

Run OCR on a PDF file by uploading it:

curl -X 'POST' \
  'http://localhost:8883/api/ocr/sync/pdf' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@your_file.pdf;type=application/pdf'
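
Or, equivalently, from Python (again assuming the requests library):

import requests

with open("your_file.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8883/api/ocr/sync/pdf",
        files={"file": ("your_file.pdf", f, "application/pdf")},  # multipart/form-data upload
        timeout=600,
    )
resp.raise_for_status()
print(resp.json())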

Get the status of the OCR process. Returns true if an OCR process is running, false otherwise.

curl -X 'GET' \
  'http://localhost:8883/api/ocr/status' \
  -H 'accept: application/json'
