An IPEX-LLM container is a pre-configured environment that includes all necessary dependencies for running LLMs on Intel GPUs.
This guide provides steps to run/develop PyTorch examples in VSCode with Docker on Intel GPUs.
Note
This guide assumes you have already installed VSCode in your environment.
To run/develop on Windows, install VSCode and then follow the steps below.
To run/develop on Linux, you can open VSCode locally, connect to the remote Linux machine over SSH, and then proceed with the following steps.
Follow the Docker installation guide to install Docker on either Linux or Windows.
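If you want to double-check the installation, the Docker CLI itself provides simple verification commands:

```bash
# Confirm the Docker CLI is installed and the daemon is reachable
docker --version
docker info
```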
For both Linux and Windows, you will need to install the Dev Containers extension. Open the Extensions view in VSCode (you can use the shortcut Ctrl+Shift+X), then search for and install the Dev Containers extension.
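If the code command-line launcher is available on your PATH, the extension can alternatively be installed from a terminal; the identifier below is the Dev Containers extension's Marketplace ID:

```bash
# Install the Dev Containers extension via the VSCode CLI
code --install-extension ms-vscode-remote.remote-containers
```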
For Windows, you will also need the WSL extension to connect to the WSL environment. Open the Extensions view in VSCode (you can use the shortcut Ctrl+Shift+X), then search for and install the WSL extension.
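Likewise, the WSL extension can be installed via the code command-line launcher:

```bash
# Install the WSL extension via the VSCode CLI
code --install-extension ms-vscode-remote.remote-wsl
```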
Press F1 to bring up the Command Palette, type WSL: Connect to WSL Using Distro..., select it, and then choose a specific WSL distro (e.g. Ubuntu).
Open the Terminal in VSCode (you can use the shortcut Ctrl+Shift+`), then pull the ipex-llm-xpu Docker image:

```bash
docker pull intelanalytics/ipex-llm-xpu:latest
```
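The pull may take a while; once it completes, you can optionally check that the image is available locally:

```bash
# Confirm the ipex-llm-xpu image has been pulled
docker images | grep ipex-llm-xpu
```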
Start the ipex-llm-xpu Docker container. Choose one of the following commands to start the container:
- For Linux users:

  ```bash
  export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
  export CONTAINER_NAME=my_container
  export MODEL_PATH=/llm/models   # change to your model path

  docker run -itd \
      --net=host \
      --device=/dev/dri \
      --memory="32G" \
      --name=$CONTAINER_NAME \
      --shm-size="16g" \
      -v $MODEL_PATH:/llm/models \
      $DOCKER_IMAGE
  ```
- For Windows WSL users:

  ```bash
  export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
  export CONTAINER_NAME=my_container
  export MODEL_PATH=/llm/models   # change to your model path

  sudo docker run -itd \
      --net=host \
      --privileged \
      --device /dev/dri \
      --memory="32G" \
      --name=$CONTAINER_NAME \
      --shm-size="16g" \
      -v $MODEL_PATH:/llm/models \
      -v /usr/lib/wsl:/usr/lib/wsl \
      $DOCKER_IMAGE
  ```
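After the container starts, it is worth confirming that it is running and that the Intel GPU devices are visible inside it. A minimal check, assuming the sycl-ls tool is available in the container's oneAPI environment, might look like:

```bash
# Verify the container is up
docker ps | grep my_container

# List the devices visible to SYCL inside the container (Intel GPUs should appear here)
docker exec -it my_container bash -c "sycl-ls"
```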
Press F1 to bring up the Command Palette, type Dev Containers: Attach to Running Container..., select it, and then choose my_container.
Now you are inside a running Docker container. Open the folder /ipex-llm/python/llm/example/GPU/HuggingFace/LLM. This folder provides several PyTorch examples that apply IPEX-LLM INT4 optimizations to models on Intel GPUs.
For example, if your model is Llama-2-7b-chat-hf and it is mounted on /llm/models, you can navigate to the llama2 directory and execute the following command to run the example:

```bash
cd <model_dir>
python ./generate.py --repo-id-or-model-path /llm/models/Llama-2-7b-chat-hf --prompt PROMPT --n-predict N_PREDICT
```
Arguments info:

- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the Hugging Face repo id for the Llama2 model (e.g. `meta-llama/Llama-2-7b-chat-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded, or the path to the Hugging Face checkpoint folder. It defaults to `'meta-llama/Llama-2-7b-chat-hf'`.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It defaults to `'What is AI?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`.
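For instance, with Llama-2-7b-chat-hf mounted under /llm/models as described above, a fully specified invocation might look like this (the prompt and token count here simply restate the defaults):

```bash
cd llama2
python ./generate.py \
    --repo-id-or-model-path /llm/models/Llama-2-7b-chat-hf \
    --prompt "What is AI?" \
    --n-predict 32
```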
Sample Output

```
Inference time: xxxx s
-------------------- Prompt --------------------
<s>[INST] <<SYS>>
<</SYS>>
What is AI? [/INST]
-------------------- Output --------------------
[INST] <<SYS>>
<</SYS>>
What is AI? [/INST] Artificial intelligence (AI) is the broader field of research and development aimed at creating machines that can perform tasks that typically require human intelligence,
```
You can develop your own PyTorch example based on these examples.