Update Llava GPU Example #12311

Merged 6 commits on Nov 1, 2024
68 changes: 23 additions & 45 deletions python/llm/example/GPU/PyTorch-Models/Model/llava/README.md
@@ -1,11 +1,11 @@
# LLaVA
In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API on LLaVA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) as a reference LLaVA model.
In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate LLaVA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) as a reference LLaVA model.

## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, your machine should meet the recommended requirements; please refer to [here](../../../README.md#requirements) for more information.

## Example: Multi-turn chat centered around an image using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
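
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the `generate.py` shipped in this PR: the helper names (`build_parser`, `build_prompt`, `load_image`, `run`) are hypothetical, loading through the `LlavaForConditionalGeneration` and `AutoProcessor` classes of `transformers` is an assumption, and `optimize_model` is called with its default settings. The argument names and defaults mirror the "Arguments info" list in this README, and the prompt format is taken from the sample output.

```python
import argparse

DEFAULT_MODEL = "llava-hf/llava-1.5-7b-hf"
DEFAULT_IMAGE = "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"


def build_parser():
    """Hypothetical CLI mirroring the arguments documented in this README."""
    parser = argparse.ArgumentParser(description="LLaVA generate() sketch")
    parser.add_argument("--repo-id-or-model-path", type=str, default=DEFAULT_MODEL)
    parser.add_argument("--image-url-or-path", type=str, default=DEFAULT_IMAGE)
    parser.add_argument("--prompt", type=str, default="Describe image in detail")
    parser.add_argument("--n-predict", type=int, default=32)
    return parser


def build_prompt(question):
    # Chat format as it appears in the sample output:
    # "USER: <image>\n<question> ASSISTANT:"
    return f"USER: <image>\n{question} ASSISTANT:"


def load_image(url_or_path):
    # Heavy imports stay local so the helpers above work without them.
    import requests
    from PIL import Image

    if url_or_path.startswith(("http://", "https://")):
        return Image.open(requests.get(url_or_path, stream=True).raw)
    return Image.open(url_or_path)


def run(args):
    # Assumption: these classes are one plausible way to load the -hf checkpoint.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration
    from ipex_llm import optimize_model

    model = LlavaForConditionalGeneration.from_pretrained(args.repo_id_or_model_path)
    model = optimize_model(model)  # assumption: default low-bit (INT4) settings
    model = model.to("xpu")        # move the optimized model to the Intel GPU
    processor = AutoProcessor.from_pretrained(args.repo_id_or_model_path)

    inputs = processor(
        text=build_prompt(args.prompt),
        images=load_image(args.image_url_or_path),
        return_tensors="pt",
    ).to("xpu")
    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=args.n_predict)
    return processor.batch_decode(output, skip_special_tokens=False)[0]
```

Calling `run(build_parser().parse_args())` would then return the decoded text for the chosen image and prompt. Keeping the heavy imports inside the functions means the argument parsing and prompt construction can be checked on a machine without an Intel GPU.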
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
@@ -15,12 +15,7 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

pip install einops # install dependencies required by llava

git clone https://github.com/haotian-liu/LLaVA.git # clone the llava library
cp generate.py ./LLaVA/ # copy our example to the LLaVA folder
cd LLaVA # change the working directory to the LLaVA folder
git checkout tags/v1.2.0 -b 1.2.0 # Get the branch which is compatible with transformers 4.36
pip install transformers==4.43.0
```

#### 1.2 Installation on Windows
@@ -32,12 +27,7 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

pip install einops # install dependencies required by llava

git clone https://github.com/haotian-liu/LLaVA.git # clone the llava library
copy generate.py .\LLaVA\ # copy our example to the LLaVA folder
cd LLaVA # change the working directory to the LLaVA folder
git checkout tags/v1.2.0 -b 1.2.0 # Get the branch which is compatible with transformers 4.36
pip install transformers==4.43.0
```

### 2. Configures OneAPI environment variables for Linux
@@ -116,42 +106,30 @@ set SYCL_CACHE_PERSISTENT=1
> The first time each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples

```bash
python ./generate.py --image-path-or-url 'https://llava-vl.github.io/static/images/monalisa.jpg'
```
python ./generate.py
```

In the example, several arguments can be passed to satisfy your requirements:

- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: the Hugging Face repo id of the LLaVA model to download (e.g. `liuhaotian/llava-v1.5-7b`), or the path to a local Hugging Face checkpoint folder. Defaults to `'liuhaotian/llava-v1.5-7b'`.
- `--image-path-or-url IMAGE_PATH_OR_URL`: the input image that the chat will focus on. Required.
- `--n-predict N_PREDICT`: the maximum number of tokens to predict. Defaults to `512`.

If you encounter some network error (which means your machine is unable to access huggingface.co) when running this example, refer to [Trouble Shooting](#4-trouble-shooting) section.

Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: the Hugging Face repo id of the LLaVA model to download (e.g. `llava-hf/llava-1.5-7b-hf`), or the path to a local Hugging Face checkpoint folder. Defaults to `'llava-hf/llava-1.5-7b-hf'`.
- `--image-url-or-path IMAGE_URL_OR_PATH`: the image to run inference on. Defaults to `'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'`.
- `--prompt PROMPT`: the prompt to run inference on (with integrated prompt format for chat). Defaults to `'Describe image in detail'`.
- `--n-predict N_PREDICT`: the maximum number of tokens to predict. Defaults to `32`.

#### Sample Output
#### [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b)
#### [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)

```log
USER: Do you know who drew this painting?
ASSISTANT: Yes, the painting is a portrait of a woman by Leonardo da Vinci. It's a famous artwork known as the "Mona Lisa."
USER: Can you describe this painting?
ASSISTANT: The painting features a well-detailed portrait of a woman, painted in oil on a canvas. The woman appears to be a young woman staring straight ahead in a direct gaze towards the viewer. The woman's facial features are rendered sharply in the brush strokes, giving her a lifelike, yet enigmatic expression.
The background of the image mainly showcases the woman's face, with some hills visible in the lower part of the painting. The artist employs a wide range of shades, evoking a sense of depth and realism in the subject matter. The technique used in this portrait sets it apart from other artworks during the Renaissance period, making it a notable piece in art history.
Inference time: xxxx s
-------------------- Input Image --------------------
http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
-------------------- Prompt --------------------
Describe image in detail
-------------------- Output --------------------
<s> USER: <image>
Describe image in detail ASSISTANT: The image features a young girl holding a white teddy bear in her hands. She is smiling and appears to be enjoying the moment. The girl is
```

The sample input image is:

<a href="https://llava-vl.github.io/static/images/monalisa.jpg"><img width=400px src="https://llava-vl.github.io/static/images/monalisa.jpg" ></a>

### 5 Trouble shooting

#### 5.1 SSLError
If you encounter the following output, it means your machine has some trouble accessing huggingface.co.
```log
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /openai/clip-vit-large-patch14-336/resolve/main/config.json (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1129)')))"),
```
The sample input image, fetched from the [COCO dataset](https://cocodataset.org/#explore?id=264959), is:

You can resolve this problem with the following steps:
1. Download https://huggingface.co/openai/clip-vit-large-patch14-336 on some machine that can access huggingface.co, and put it in huggingface's local cache (default to be `~/.cache/huggingface/hub`) on the machine that you are going to run this example.
2. Set the environment variable (`export TRANSFORMERS_OFFLINE=1`) before you run the example.
<a href="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"><img width=400px src="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg" ></a>