Axolotl is a popular tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures. You can now use ipex-llm as an accelerated backend for Axolotl running on Intel GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc, Flex or Max).
See the demo of finetuning LLaMA2-7B on Intel Arc GPU below.
You can also click here to watch the demo video.
- Prerequisites
- Install IPEX-LLM for Axolotl
- Example: Finetune Llama-2-7B with Axolotl
- Finetune Llama-3-8B (Experimental)
- Troubleshooting
IPEX-LLM's support for Axolotl v0.4.0 is available only on Linux. We recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).
Visit Install IPEX-LLM on Linux with Intel GPU, and follow the Install Intel GPU Driver and Install oneAPI sections to install the GPU driver and Intel® oneAPI Base Toolkit 2024.0.
Create a new conda env, and install ipex-llm[xpu].
conda create -n axolotl python=3.11
conda activate axolotl
# install ipex-llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
Install axolotl v0.4.0 from git.
# install axolotl v0.4.0
git clone https://github.com/OpenAccess-AI-Collective/axolotl -b v0.4.0
cd axolotl
# replace requirements.txt
rm requirements.txt
wget -O requirements.txt https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/requirements-xpu.txt
pip install -e .
pip install transformers==4.36.0
# to avoid https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
pip install datasets==2.15.0
# prepare axolotl entrypoints
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/finetune.py
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/train.py
After the installation, you should have a conda environment (named axolotl in this guide) for running Axolotl commands with IPEX-LLM.
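As a quick sanity check after installation, the version pins from the commands above can be verified from inside the axolotl environment. The following is a minimal sketch using only the Python standard library; the helper name check_pins is hypothetical and is not part of Axolotl or IPEX-LLM.

```python
from importlib import metadata

# Versions pinned by the installation commands in this guide.
PINS = {"transformers": "4.36.0", "datasets": "2.15.0"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Prints nothing when every pinned package matches.
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means the environment matches the pins; any printed line names a package to reinstall with the corresponding pip command.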
The following example shows how to finetune Llama-2-7B on the alpaca_2k_test dataset using LoRA and QLoRA.
Note that you don't need to write any code in this example.
Model | Dataset | Finetune method |
---|---|---|
Llama-2-7B | alpaca_2k_test | LoRA (Low-Rank Adaptation) |
Llama-2-7B | alpaca_2k_test | QLoRA (Quantized Low-Rank Adaptation) |
For more technical details, please refer to Llama 2, LoRA and QLoRA.
By default, Axolotl automatically downloads models and datasets from Hugging Face. Please ensure you have logged in to Hugging Face.
huggingface-cli login
If you prefer offline models and datasets, please download Llama-2-7B and alpaca_2k_test. Then set HF_HUB_OFFLINE=1 to avoid connecting to Hugging Face.
export HF_HUB_OFFLINE=1
Note
This step is required for oneAPI installed via APT or the offline installer. Skip this step for PIP-installed oneAPI.
Configure oneAPI variables by running the following command:
source /opt/intel/oneapi/setvars.sh
Configure accelerate to avoid training on the CPU. You can download a default default_config.yaml with use_cpu: false.
mkdir -p ~/.cache/huggingface/accelerate/
wget -O ~/.cache/huggingface/accelerate/default_config.yaml https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/default_config.yaml
Alternatively, you can configure accelerate based on your requirements.
accelerate config
Please answer NO when prompted: Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:
After finishing accelerate config, check that use_cpu is disabled (i.e., use_cpu: false) in the accelerate config file (~/.cache/huggingface/accelerate/default_config.yaml).
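The check described above can be automated. This is a minimal sketch that scans the config text for a top-level use_cpu entry without requiring a YAML library; the function name use_cpu_disabled is hypothetical, and the simple line match assumes the flat key: value layout that accelerate config normally writes.

```python
import re
from pathlib import Path

# Config location named in this guide.
CONFIG_PATH = Path.home() / ".cache/huggingface/accelerate/default_config.yaml"


def use_cpu_disabled(config_text):
    """Return True only if a top-level `use_cpu: false` entry is present."""
    match = re.search(r"^use_cpu:[ \t]*(\S+)", config_text, flags=re.MULTILINE)
    if match is None:
        return False
    # Tolerate quoted values and different capitalisation.
    return match.group(1).strip("'\"").lower() == "false"


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        ok = use_cpu_disabled(CONFIG_PATH.read_text())
        print("use_cpu is disabled" if ok else "use_cpu is NOT disabled; rerun accelerate config")
    else:
        print(f"{CONFIG_PATH} not found; run accelerate config first")
```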
Prepare lora.yml for Axolotl LoRA finetuning. You can download a template from GitHub.
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/lora.yml
If you are using an offline model and dataset in your local environment, please modify the model path and dataset path in lora.yml. Otherwise, keep them unchanged.
# Please change to local path if model is offline, e.g., /path/to/model/Llama-2-7b-hf
base_model: NousResearch/Llama-2-7b-hf
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_2k_test
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
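If you switch between online and offline runs often, the two path edits can be scripted. The sketch below rewrites the base_model entry and the first dataset path entry with plain text substitution; use_local_paths is a hypothetical helper, and the local directories shown are the placeholder paths from the comments above, not real locations.

```python
import re


def use_local_paths(config_text, model_dir, dataset_dir):
    """Point `base_model:` and the first dataset `- path:` at local directories."""
    # Replacement callables avoid any backslash interpretation in the paths.
    text = re.sub(
        r"^base_model:.*$",
        lambda m: f"base_model: {model_dir}",
        config_text,
        count=1,
        flags=re.MULTILINE,
    )
    text = re.sub(
        r"^([ \t]*-[ \t]*path:).*$",
        lambda m: f"{m.group(1)} {dataset_dir}",
        text,
        count=1,
        flags=re.MULTILINE,
    )
    return text


# Example usage (placeholder paths):
# from pathlib import Path
# cfg = Path("lora.yml")
# cfg.write_text(use_local_paths(cfg.read_text(),
#                                "/path/to/model/Llama-2-7b-hf",
#                                "/path/to/dataset/alpaca_2k_test"))
```

The same helper applies unchanged to qlora.yml, since both templates use the same keys.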
Modify LoRA parameters, such as lora_r and lora_alpha, as needed.
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
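To get a feel for what lora_r and lora_alpha control, the sketch below works through the arithmetic for the values above. It assumes the standard LoRA formulation, in which the low-rank update is scaled by lora_alpha / lora_r and each adapted weight matrix of shape d_out x d_in gains lora_r * (d_in + d_out) trainable parameters. The 4096 x 4096 shape corresponds to an attention projection in Llama-2-7B; the function names are illustrative only.

```python
def lora_scaling(lora_alpha, lora_r):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_r


def lora_param_count(d_in, d_out, lora_r):
    """Trainable parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return lora_r * (d_in + d_out)


if __name__ == "__main__":
    # Values from the config above: lora_r = 32, lora_alpha = 16.
    print(lora_scaling(16, 32))              # 0.5
    print(lora_param_count(4096, 4096, 32))  # 262144
```

Lowering lora_r reduces the number of trainable parameters and the memory they need; if lora_alpha is left unchanged, the scaling factor rises accordingly.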
Launch LoRA training with the following command.
accelerate launch finetune.py lora.yml
In Axolotl v0.4.0, you can use train.py instead of -m axolotl.cli.train or finetune.py.
accelerate launch train.py lora.yml
Prepare qlora.yml for QLoRA finetuning. You can download a template from GitHub.
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/qlora.yml
If you are using an offline model and dataset in your local environment, please modify the model path and dataset path in qlora.yml. Otherwise, keep them unchanged.
# Please change to local path if model is offline, e.g., /path/to/model/Llama-2-7b-hf
base_model: NousResearch/Llama-2-7b-hf
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_2k_test
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
Modify QLoRA parameters, such as lora_r and lora_alpha, as needed.
adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
Launch QLoRA training with the following command.
accelerate launch finetune.py qlora.yml
In Axolotl v0.4.0, you can use train.py instead of -m axolotl.cli.train or finetune.py.
accelerate launch train.py qlora.yml
Warning: this section installs axolotl main (796a085) for new features, e.g., Llama-3-8B.
Axolotl main has many new dependencies. Please set up a new conda env for this version.
conda create -n llm python=3.11
conda activate llm
# install axolotl main
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl && git checkout 796a085
pip install -e .
# the command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# install transformers etc
# to avoid https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
pip install datasets==2.15.0
pip install transformers==4.37.0
Configure accelerate and oneAPI as described in Set Environment Variables.
This example is based on the Axolotl Llama-3 QLoRA example.
Prepare llama3-qlora.yml for QLoRA finetuning. You can download a template from GitHub.
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/llama3-qlora.yml
If you are using an offline model and dataset in your local environment, please modify the model path and dataset path in llama3-qlora.yml. Otherwise, keep them unchanged.
# Please change to local path if model is offline, e.g., /path/to/model/Meta-Llama-3-8B
base_model: meta-llama/Meta-Llama-3-8B
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_2k_test
  - path: aaditya/alpaca_subset_1
    type: alpaca
Modify QLoRA parameters, such as lora_r and lora_alpha, as needed.
adapter: qlora
lora_model_dir:
sequence_len: 256
sample_packing: true
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
accelerate launch finetune.py llama3-qlora.yml
You can also use train.py instead of -m axolotl.cli.train or finetune.py.
accelerate launch train.py llama3-qlora.yml
Expected output
{'loss': 0.237, 'learning_rate': 1.2254711850265387e-06, 'epoch': 3.77}
{'loss': 0.6068, 'learning_rate': 1.1692453482951115e-06, 'epoch': 3.77}
{'loss': 0.2926, 'learning_rate': 1.1143322458989303e-06, 'epoch': 3.78}
{'loss': 0.2475, 'learning_rate': 1.0607326072295087e-06, 'epoch': 3.78}
{'loss': 0.1531, 'learning_rate': 1.008447144232094e-06, 'epoch': 3.79}
{'loss': 0.1799, 'learning_rate': 9.57476551396197e-07, 'epoch': 3.79}
{'loss': 0.2724, 'learning_rate': 9.078215057463868e-07, 'epoch': 3.79}
{'loss': 0.2534, 'learning_rate': 8.594826668332445e-07, 'epoch': 3.8}
{'loss': 0.3388, 'learning_rate': 8.124606767246579e-07, 'epoch': 3.8}
{'loss': 0.3867, 'learning_rate': 7.667561599972505e-07, 'epoch': 3.81}
{'loss': 0.2108, 'learning_rate': 7.223697237281668e-07, 'epoch': 3.81}
{'loss': 0.0792, 'learning_rate': 6.793019574868775e-07, 'epoch': 3.82}
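Each line of the output above is a Python dict literal, so a saved training log can be summarised with a few lines of standard-library code. This is a minimal sketch, assuming the log lines have the format shown above; the helper names are hypothetical.

```python
import ast


def parse_log_line(line):
    """Parse one logged metrics line (a Python dict literal) into a dict."""
    return ast.literal_eval(line.strip())


def mean_loss(lines):
    """Average the 'loss' field over all metrics lines, ignoring other output."""
    losses = [
        parse_log_line(line)["loss"]
        for line in lines
        if line.strip().startswith("{'loss'")
    ]
    if not losses:
        raise ValueError("no loss entries found")
    return sum(losses) / len(losses)


if __name__ == "__main__":
    sample = [
        "{'loss': 0.237, 'learning_rate': 1.2254711850265387e-06, 'epoch': 3.77}",
        "{'loss': 0.6068, 'learning_rate': 1.1692453482951115e-06, 'epoch': 3.77}",
    ]
    print(round(mean_loss(sample), 4))
```

Individual steps are noisy, as the values above show, so an average over a window of steps is usually a better indicator of progress than any single line.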
Error message: TypeError: argument of type 'PosixPath' is not iterable
This issue is related to axolotl #1544. It can be fixed by downgrading datasets to 2.15.0.
pip install datasets==2.15.0
Error message: RuntimeError: Allocation is out of device memory on current platform.
This issue is caused by running out of GPU memory. Please reduce lora_r or micro_batch_size in qlora.yml or lora.yml, or reduce the amount of training data.
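One way to apply this advice is to halve a setting and retry until training fits in memory. The sketch below edits the config text directly; halve_setting is a hypothetical helper, and it assumes the setting appears as a top-level key with an integer value, as lora_r does in the templates above.

```python
import re


def halve_setting(config_text, key):
    """Halve the integer value of a top-level `key: N` entry (never below 1)."""
    pattern = re.compile(rf"^({re.escape(key)}:[ \t]*)(\d+)", flags=re.MULTILINE)

    def _halve(match):
        return f"{match.group(1)}{max(1, int(match.group(2)) // 2)}"

    return pattern.sub(_halve, config_text, count=1)


# Example usage:
# from pathlib import Path
# cfg = Path("qlora.yml")
# cfg.write_text(halve_setting(cfg.read_text(), "lora_r"))
```

Note that halving lora_r while leaving lora_alpha unchanged also changes the ratio between the two, which may affect training behaviour.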
Error message: OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
The oneAPI environment is not set correctly. Please refer to Set Environment Variables.
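To confirm whether the environment is the problem, you can check if the missing library is reachable from the current shell. This sketch searches the directories listed in LD_LIBRARY_PATH; it assumes that sourcing setvars.sh adds the MKL library directory to that variable, and find_library is a hypothetical helper name.

```python
import os
from pathlib import Path


def find_library(lib_name, search_path):
    """Return the first path in `search_path` that contains `lib_name`, else None."""
    for directory in search_path.split(os.pathsep):
        candidate = Path(directory) / lib_name
        if directory and candidate.exists():
            return str(candidate)
    return None


if __name__ == "__main__":
    found = find_library("libmkl_intel_lp64.so.2", os.environ.get("LD_LIBRARY_PATH", ""))
    if found:
        print(f"found: {found}")
    else:
        print("not found on LD_LIBRARY_PATH; source /opt/intel/oneapi/setvars.sh and retry")
```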