EAGLE - Speculative Sampling using IPEX-LLM on Intel GPUs

In this directory, you will find examples of how IPEX-LLM accelerates inference on Intel GPUs with speculative sampling using EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a speculative sampling method that improves text generation speed. See the EAGLE paper for details and the EAGLE repository (https://github.com/SafeAILab/EAGLE) for more information on the code.

Requirements

To apply Intel GPU acceleration, several tool installation and environment preparation steps are required. See the GPU installation guide for more details.

Step 1: Only Linux is supported for now; Ubuntu 22.04 is preferred.

Step 2: Refer to our driver installation guide for general-purpose GPU capabilities.

Note: IPEX 2.1.10+xpu requires Intel GPU Driver version >= stable_775_20_20231219.

Step 3: Download and install the Intel® oneAPI Base Toolkit. oneMKL and the DPC++ compiler are required; the other components are optional.

Note: IPEX 2.1.10+xpu requires Intel® oneAPI Base Toolkit version 2024.0.

Verified Hardware Platforms

Intel Data Center GPU Max Series
Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series

Example - EAGLE-2 Speculative Sampling with IPEX-LLM on MT-bench

In this example, we run inference for a Llama2 model to showcase the speed of EAGLE with IPEX-LLM on MT-bench data on Intel GPUs. We use EAGLE-2, which has better performance than EAGLE-1.

1. Install

1.1 Installation on Linux

We suggest using conda to manage the environment:

conda create -n llm python=3.11
conda activate llm
# The command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
git clone https://github.com/SafeAILab/EAGLE.git
cd EAGLE
pip install -r requirements.txt
pip install -e .

1.2 Installation on Windows

We suggest using conda to manage the environment:

conda create -n llm python=3.11 libuv
conda activate llm
# The command below uses pip to install the Intel oneAPI Base Toolkit 2024.0 runtime packages
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

# The command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
git clone https://github.com/SafeAILab/EAGLE.git
cd EAGLE
pip install -r requirements.txt
pip install -e .
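After either installation path, a quick sanity check can confirm that the packages are visible to the active conda environment. The following is a minimal sketch, not part of the example itself: the module names `ipex_llm` and `eagle` are assumptions about what the `ipex-llm[xpu]` and EAGLE installs expose, and the check only locates the modules without importing them or touching the GPU.

```python
from importlib.util import find_spec

# Assumed import names for the packages installed above (not confirmed by this README).
EXPECTED_MODULES = ("torch", "intel_extension_for_pytorch", "ipex_llm", "eagle")


def missing_modules(names=EXPECTED_MODULES):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialized packages.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Not found in this environment:", ", ".join(absent))
    else:
        print("All expected modules were found.")
```

If a module is reported as missing, check that the `llm` environment is activated before rerunning the install commands.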

2. Configure oneAPI environment variables for Linux

Note

Skip this step if you are running on Windows.

This step is required on Linux if oneAPI was installed via APT or the offline installer. Skip it if oneAPI was installed with pip.

source /opt/intel/oneapi/setvars.sh

3. Runtime Configurations

For optimal performance, we recommend setting several environment variables. Use the settings below that match your device.

3.1 Configurations for Linux

For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series

export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1

For Intel Data Center GPU Max Series

export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1

Note: libtcmalloc.so can be installed with conda install -c conda-forge -y gperftools=2.10.
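Because these variables are easy to forget in a new shell, it can help to verify them before launching a run. The sketch below is a hypothetical helper, not part of IPEX-LLM: the device labels `arc_flex` and `max` are names invented here, and the expected values are simply copied from the settings listed above.

```python
import os

# Recommended values taken from section 3.1; the dictionary keys are hypothetical labels.
RECOMMENDED = {
    "arc_flex": {
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    },
    "max": {
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
        "ENABLE_SDP_FUSION": "1",
    },
}


def check_env(device, env=None):
    """Return {variable: (expected, actual)} for every setting that differs."""
    env = os.environ if env is None else env
    problems = {}
    for name, expected in RECOMMENDED[device].items():
        actual = env.get(name)
        if actual != expected:
            problems[name] = (expected, actual)
    # On the Max Series, libtcmalloc.so should also appear in LD_PRELOAD.
    if device == "max" and "libtcmalloc.so" not in env.get("LD_PRELOAD", ""):
        problems["LD_PRELOAD"] = ("<path>/libtcmalloc.so", env.get("LD_PRELOAD"))
    return problems


if __name__ == "__main__":
    for name, (expected, actual) in check_env("arc_flex").items():
        print(f"{name}: expected {expected!r}, found {actual!r}")
```

An empty result means every recommended variable for that device is set as listed; pass `"max"` instead of `"arc_flex"` for the Data Center GPU Max Series.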

4. Running Example

You can test the speed of EAGLE speculative sampling with ipex-llm on MT-bench using the following command.

python -m evaluation.gen_ea_answer_llama2chat_e2_ipex_optimize \
                 --ea-model-path [path of EAGLE weight] \
                 --base-model-path [path of the original model] \
                 --enable-ipex-llm

Please refer to the EAGLE repository for the complete list of available EAGLE weights.

The above command will generate a .jsonl file that records the generation results and wall time. Then, you can use evaluation/speed.py to calculate the speed.

python -m evaluation.speed \
                 --base-model-path [path of the original model] \
                 --jsonl-file [pathname of the .jsonl file]
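To illustrate the arithmetic behind a throughput figure, the sketch below computes overall generated tokens per second from such a .jsonl file. It is not a replacement for evaluation/speed.py and may not match its output: the record layout (a `choices` list whose first entry holds per-turn `new_tokens` and `wall_time` lists) is an assumption about the file format, and `tokens_per_second` is a hypothetical helper name.

```python
import json


def tokens_per_second(jsonl_path):
    """Return total generated tokens divided by total wall time in seconds.

    Assumes each non-empty line is a JSON object shaped like
    {"choices": [{"new_tokens": [...], "wall_time": [...]}]}.
    """
    total_tokens = 0
    total_time = 0.0
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            choice = json.loads(line)["choices"][0]
            total_tokens += sum(choice["new_tokens"])
            total_time += sum(choice["wall_time"])
    if total_time == 0:
        raise ValueError("no wall time recorded in " + str(jsonl_path))
    return total_tokens / total_time
```

Running the same calculation on a result file generated without EAGLE and dividing the two figures would give a rough speedup ratio, again under the assumed record layout.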
