LoRA finetuning tutorial (#671)
michaelbenayoun committed Sep 18, 2024
1 parent 3b381c0 commit d55d3ad
Showing 8 changed files with 561 additions and 45 deletions.
11 changes: 9 additions & 2 deletions .github/workflows/doc-build.yml
@@ -51,7 +51,14 @@ jobs:
- name: Make documentation
shell: bash
run: |
doc-builder build optimum.neuron docs/source/ --repo_name optimum-neuron --build_dir neuron-doc-build/ --version ${{ env.VERSION }} --version_tag_suffix "" --html --clean
doc-builder build optimum.neuron docs/source/ \
--repo_name optimum-neuron \
--build_dir neuron-doc-build/ \
--version ${{ env.VERSION }} \
--version_tag_suffix "" \
--html \
--clean \
--notebook_dir docs/notebooks/
cd neuron-doc-build/
mv optimum.neuron optimum-neuron
doc-builder push optimum-neuron --doc_build_repo_id "hf-doc-build/doc-build" --token "${{ secrets.HF_DOC_BUILD_PUSH }}" --commit_msg "Updated with commit $COMMIT_SHA See: https://github.com/huggingface/optimum-neuron/commit/$COMMIT_SHA" --n_retries 5
9 changes: 8 additions & 1 deletion .github/workflows/doc-pr-build.yml
@@ -36,7 +36,14 @@ jobs:
- name: Make documentation
shell: bash
run: |
doc-builder build optimum.neuron docs/source/ --repo_name optimum-neuron --build_dir neuron-doc-build/ --version pr_${{ env.PR_NUMBER }} --version_tag_suffix "" --html --clean
doc-builder build optimum.neuron docs/source/ \
--repo_name optimum-neuron \
--build_dir neuron-doc-build/ \
--version pr_${{ env.PR_NUMBER }} \
--version_tag_suffix "" \
--html \
--clean \
--notebook_dir docs/notebooks/
- name: Save commit_sha & pr_number
run: |
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -14,6 +14,8 @@
title: Fine-tune BERT for Text Classification on AWS Trainium
- local: training_tutorials/finetune_llm
title: Fine-tune Llama 3 8B on AWS Trainium
- local: training_tutorials/sft_lora_finetune_llm
title: Fine-tune Llama 3 8B with LoRA and the SFTTrainer
title: Training Tutorials
- sections:
- local: inference_tutorials/notebooks
33 changes: 13 additions & 20 deletions docs/source/training_tutorials/finetune_llm.mdx
@@ -45,15 +45,15 @@ And many others!

Before starting this tutorial, you will need to set up your environment:

1. Create an AWS Trainium instance. You can follow this [guide](https://huggingface.co/docs/optimum-neuron/guides/setup_aws_instance) to create one.
1. Create an AWS Trainium instance. **You will need a `trn1.32xlarge`, which contains 16 Neuron Devices.** You can follow this [guide](https://huggingface.co/docs/optimum-neuron/guides/setup_aws_instance) to create one.
2. Make sure you are logged in on the Hugging Face Hub (a Python alternative to the CLI login is sketched after this list):
```bash
huggingface-cli login --token YOUR_TOKEN
```
3. Check that you have access to the model. Some open-source models are gated, meaning that users need to request access from the model owner before being able to use the model weights. Here we will be training Llama-3 8B, for which there are two possibilities:
* The official gated repo: [`meta-llama/Meta-Llama-3-8B`](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
* The unofficial, non-gated repo: [`NousResearch/Meta-Llama-3-8B`](https://huggingface.co/NousResearch/Meta-Llama-3-8B)
4. Clone the Optimum Neuron repository, **which contains the [complete script](https://github.com/huggingface/optimum-neuron/docs/source/training_tutorials/finetune_llm.py) described in this tutorial:**
4. Clone the Optimum Neuron repository, **which contains the [complete script](https://github.com/huggingface/optimum-neuron/blob/main/docs/source/training_tutorials/finetune_llm.py) described in this tutorial:**
```bash
git clone https://github.com/huggingface/optimum-neuron.git
```
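
If you prefer to authenticate from Python rather than with the CLI command in step 2, a minimal sketch using `huggingface_hub` (the token value below is a placeholder):

```python
from huggingface_hub import login

# Equivalent to `huggingface-cli login --token YOUR_TOKEN`
login(token="YOUR_TOKEN")  # placeholder token, replace with your own
```
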
@@ -68,7 +68,10 @@ Example:
{
"instruction": "What is world of warcraft",
"context": "",
"response": "World of warcraft is a massive online multi player role playing game. It was released in 2004 by bizarre entertainment"
"response": (
"World of warcraft is a massive online multi player role playing game. "
"It was released in 2004 by blizarre entertainment"
)
}
```
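
For reference, here is a quick way to load the dataset and inspect a raw sample before any formatting; it mirrors the `load_dataset` call used later in the training script.

```python
from datasets import load_dataset

# Load the Dolly dataset from the Hugging Face Hub
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

# Inspect one raw sample and the dataset size
print(dataset[0])
print(f"Number of samples: {len(dataset)}")
```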

@@ -98,7 +101,7 @@ def format_dolly(sample):
return prompt
```
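
As a sketch, the full template function can look like the snippet below. The `### Instruction` and `### Context` sections match the lines shown in the script diff further down, while the `### Answer` label used for the response section is an assumption made for illustration.

```python
def format_dolly(sample):
    # Build the prompt sections; the context section is only added when present
    instruction = f"### Instruction\n{sample['instruction']}"
    context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
    response = f"### Answer\n{sample['response']}"  # section label assumed for illustration
    # Join the non-empty sections with blank lines
    return "\n\n".join(part for part in [instruction, context, response] if part is not None)


print(format_dolly({
    "instruction": "What is world of warcraft",
    "context": "",
    "response": "A role playing game released in 2004.",
}))
```
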
In addition to formatting our samples, we also want to pack multiple samples into one sequence to make training more efficient. In other words, we stack multiple samples into one sequence and separate them with an EOS token. Packing/stacking samples can be done during training or before. Here, we will do it before training to save time.
In addition to formatting our samples, we also want to pack multiple samples into one sequence to make training more efficient. In other words, we stack multiple samples into one sequence and separate them with an EOS token. Packing/stacking samples can be done during training or before.
The following function `pack_dataset` takes a `dataset` and a `chunk_length` and returns a packed dataset:
@@ -181,16 +184,6 @@ dataset = dataset.map(
lm_dataset = pack_dataset(dataset, chunk_length=2048) # We use 2048 as the maximum length for packing
```
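
The full body of `pack_dataset` is not shown here. The standalone sketch below illustrates the idea (concatenate the tokenized samples, then cut the stream into fixed-size chunks), while the actual helper performs the same chunking in batches through `dataset.map`; the names used below are illustrative.

```python
from itertools import chain

def pack_texts(tokenized_samples, chunk_length=2048):
    # Flatten all token ids into one long stream
    all_ids = list(chain.from_iterable(sample["input_ids"] for sample in tokenized_samples))
    # Drop the remainder so that every chunk contains exactly `chunk_length` tokens
    total_length = (len(all_ids) // chunk_length) * chunk_length
    chunks = [all_ids[i : i + chunk_length] for i in range(0, total_length, chunk_length)]
    # For causal language modeling, the labels are a copy of the inputs
    return [{"input_ids": chunk, "labels": chunk.copy()} for chunk in chunks]
```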

After we have processed the dataset, we save it to disk. You could also save it to S3 or the Hugging Face Hub for later use.

_Note: Packing and preprocessing your dataset can be run outside of the Trainium instance._

```python
# save train_dataset to disk
dataset_path = "tokenized_dolly"
lm_dataset.save_to_disk(dataset_path)
```
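
If you go this route, the saved dataset can be reloaded later with `load_from_disk`, or pushed to the Hub instead; the repository name in the sketch below is only an example.

```python
from datasets import load_from_disk

# Reload the packed dataset later, e.g. on the Trainium instance
lm_dataset = load_from_disk("tokenized_dolly")

# Alternatively, push it to the Hugging Face Hub for later use
# lm_dataset.push_to_hub("my-username/tokenized-dolly")  # example repository name
```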

## 3. Fine-tune Llama on AWS Trainium using the `NeuronTrainer`

Normally you would use the **[Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer)** and **[TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments)** classes to fine-tune PyTorch-based transformer models.
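
As a rough sketch of how the drop-in replacements fit together, assuming the `NeuronTrainer` and `NeuronTrainingArguments` classes exposed by `optimum.neuron` and illustrative, untuned argument values:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Drop-in replacements for transformers' Trainer / TrainingArguments
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

training_args = NeuronTrainingArguments(
    output_dir="dolly_llama",
    per_device_train_batch_size=1,   # illustrative value
    bf16=True,
    tensor_parallel_size=8,          # shard the model across Neuron cores; illustrative value
)

trainer = NeuronTrainer(
    model=model,
    args=training_args,
    train_dataset=lm_dataset,  # the packed dataset prepared above
    tokenizer=tokenizer,
)
trainer.train()
```
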
@@ -244,16 +237,18 @@ The key points here are:

## 4. Launch Training

We prepared a script called [finetune_llm.py](https://github.com/huggingface/optimum-neuron/docs/source/training_tutorials/finetune_llm.py) summing up everything mentioned in this tutorial.
We prepared a script called [finetune_llm.py](https://github.com/huggingface/optimum-neuron/blob/main/docs/source/training_tutorials/finetune_llm.py) summing up everything mentioned in this tutorial.

<Tip>

This script is a minimalistic version of our official example training script to run causal language modeling fine-tuning, called [run_clm.py](https://github.com/huggingface/optimum-neuron/blob/main/examples/language-modeling/run_clm.py). For the sake of this tutorial, we tried to get rid of anything that is not necessary, but if you want to do more custom things, maybe the solution is already implemented in `run_clm.py`!
This script is a minimalistic version of our official example training script for causal language modeling fine-tuning, called [run_clm.py](https://github.com/huggingface/optimum-neuron/blob/main/examples/language-modeling/run_clm.py). For the sake of this tutorial, we stripped out anything that is not strictly needed and added the dataset formatting step required for fine-tuning, but if you want to do something more custom, the solution may already be implemented in `run_clm.py`!

Also, these scripts are designed more as templates than as final scripts. Feel free to take `finetune_llm.py` or `run_clm.py` and adapt them to your own needs!

</Tip>

PyTorch Neuron uses `torch_xla`. It evaluates operations lazily during execution of the training loops, which means it builds a symbolic graph in the background, and the graph is executed on the hardware only when a tensor is printed, transferred to the CPU, or when `xm.mark_step()` is called. During execution, multiple graphs can be built depending on control flow, and compiling each graph sequentially can take time. To alleviate that, the Neuron SDK provides `neuron_parallel_compile`, a tool that performs a fast trial run to build all the graphs and compile them in parallel. This step is usually called precompilation.
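
As a standalone illustration of this lazy-execution model (not part of the tutorial script), operations are only recorded until a step marker or a host transfer forces the graph to be compiled and executed:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

x = torch.randn(4, 4, device=device)
y = x @ x          # recorded in the symbolic graph, nothing runs on the device yet

xm.mark_step()     # cut the graph here: it is compiled and executed on the hardware
print(y.cpu())     # printing / transferring to CPU would also force execution
```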

### Precompilation

When training models on AWS Trainium, we first need to compile our model with our training arguments.
@@ -266,8 +261,7 @@ The compilation command simply consists of passing your script as an input to the `neuron_parallel_compile` tool:

```bash
MALLOC_ARENA_MAX=64 XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node=32 finetune_llm.py \
--model_id {model_id} \
--dataset_path {dataset_path} \
--model_id meta-llama/Meta-Llama-3-8B \
--bf16 True \
--learning_rate 5e-5 \
--output_dir dolly_llama \
@@ -305,8 +299,7 @@ Launch the training with the following command.

```bash
MALLOC_ARENA_MAX=64 XLA_USE_BF16=1 torchrun --nproc_per_node=32 finetune_llm.py \
--model_id {model_id} \
--dataset_path {dataset_path} \
--model_id meta-llama/Meta-Llama-3-8B \
--bf16 True \
--learning_rate 5e-5 \
--output_dir dolly_llama \
30 changes: 8 additions & 22 deletions docs/source/training_tutorials/finetune_llm.py
@@ -1,9 +1,8 @@
from dataclasses import dataclass, field
from functools import partial
from itertools import chain
from typing import Optional

from datasets import load_dataset, load_from_disk
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
@@ -17,10 +16,6 @@
from optimum.neuron.distributed import lazy_load_for_parallelism


# Load dataset from the hub
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")


def format_dolly(sample):
instruction = f"### Instruction\n{sample['instruction']}"
context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
@@ -70,9 +65,7 @@ def chunk(sample, chunk_length=chunk_length):
return lm_dataset


def create_and_save_dataset(model_id: str, dataset_path: str):
tokenizer = AutoTokenizer.from_pretrained(model_id)

def prepare_dataset(tokenizer, dataset):
# template dataset to add prompt to each sample
def template_dataset(sample):
sample["text"] = f"{format_dolly(sample)}{tokenizer.eos_token}"
@@ -89,15 +82,16 @@ def template_dataset(sample):
# chunk dataset
lm_dataset = pack_dataset(dataset, chunk_length=2048) # We use 2048 as the maximum length for packing

# save train_dataset to disk
lm_dataset.save_to_disk(dataset_path)
return lm_dataset


def training_function(script_args, training_args):
# load dataset
dataset = load_from_disk(script_args.dataset_path)

tokenizer = AutoTokenizer.from_pretrained(script_args.model_id)

# Load dataset from the hub and prepare it for training.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
dataset = prepare_dataset(tokenizer, dataset)

with lazy_load_for_parallelism(tensor_parallel_size=training_args.tensor_parallel_size):
model = AutoModelForCausalLM.from_pretrained(script_args.model_id)

@@ -122,20 +116,12 @@ class ScriptArguments:
default="meta-llama/Meta-Llama-3-8B",
metadata={"help": "The model that you want to train from the Hugging Face hub."},
)
dataset_path: Optional[str] = field(
metadata={"help": "Path to the preprocessed and tokenized dataset."},
default=None,
)


def main():
parser = HfArgumentParser([ScriptArguments, TrainingArguments])
script_args, training_args = parser.parse_args_into_dataclasses()

if script_args.dataset_path is None:
create_and_save_dataset(script_args.model_id, "tokenized_dolly")
script_args.dataset_path = "tokenized_dolly"

# set seed
set_seed(training_args.seed)

