Added README
TJ-Solergibert committed May 22, 2024
1 parent a28c532 commit 3e169c5
Showing 3 changed files with 22 additions and 3 deletions.
19 changes: 19 additions & 0 deletions tools/llama3/README.md
@@ -0,0 +1,19 @@
# Llama3 Weight Conversion Tool
This directory contains the scripts to convert Llama3 checkpoints from HuggingFace to Nanotron and vice versa.

- Convert from HuggingFace to Nanotron

`torchrun --nproc-per-node 1 tools/llama3/convert_hf_to_nanotron.py --nanotron-checkpoint-path nanotron_checkpoints/Nanotron-Llama-3-8B --pretrained-model-name-or-path meta-llama/Meta-Llama-3-8B-Instruct`
- Convert from Nanotron to HuggingFace

`torchrun --nproc-per-node 1 tools/llama3/convert_nanotron_to_hf.py --nanotron-checkpoint-path nanotron_checkpoints/Nanotron-Llama3-8B --hugging-face-checkpoint-path hf_checkpoints/Converted-Nanotron-Llama-3-8B`

In summary, the conversion performs the following steps:
- Initialize the HuggingFace model with the pretrained weights. The model definition is [here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py).
- Initialize a Nanotron model with empty weights. The model definition is [here](https://github.com/huggingface/nanotron/blob/main/src/nanotron/models/llama.py).
- Copy the parameters layer by layer from one model to the other.
- Store the Nanotron model along with the tokenizer.
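The copy step above can be sketched as a loop over a name mapping. This is an illustrative, pure-Python sketch (the names `copy_weights`, `name_map`, and the parameter names are hypothetical, not the actual transformers/nanotron API); the real scripts operate on `torch` tensors.

```python
def copy_weights(hf_state_dict, nanotron_state_dict, name_map):
    """Copy parameters from an HF-style state dict into a Nanotron-style one.

    name_map maps each Nanotron parameter name to the list of HF parameter
    names it comes from; the list has several entries when the Nanotron
    parameter is a concatenation of separate HF matrices (e.g. fused QKV).
    """
    for nt_name, hf_names in name_map.items():
        parts = [hf_state_dict[name] for name in hf_names]
        # Single source: copy as-is. Multiple sources: concatenate rows,
        # mirroring Nanotron's fused parameters.
        nanotron_state_dict[nt_name] = (
            parts[0] if len(parts) == 1 else [row for p in parts for row in p]
        )

# Toy "tensors" as nested lists, just to illustrate the mapping:
hf = {
    "q_proj.weight": [[1.0]],
    "k_proj.weight": [[2.0]],
    "v_proj.weight": [[3.0]],
    "embed_tokens.weight": [[0.5]],
}
nt = {}
copy_weights(hf, nt, {
    "qkv_proj.weight": ["q_proj.weight", "k_proj.weight", "v_proj.weight"],
    "token_embedding.weight": ["embed_tokens.weight"],
})
print(nt["qkv_proj.weight"])  # [[1.0], [2.0], [3.0]]
```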

When comparing the HuggingFace implementation with the Nanotron implementation, the main difference lies in the Q, K & V matrices and in the MLP projections. In the HuggingFace implementation, these matrices are separated [[1]](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L415), [[2]](https://github.com/huggingface/transformers/blob/1518508467d96b3866fc4ebcb7a5b3a2e0df2aa4/src/transformers/models/llama/modeling_llama.py#L194), while in the Nanotron implementation, they are concatenated [[1b]](https://github.com/huggingface/nanotron/blob/b69690703a1c41b60cd706f92a80a3d23ebaf2d0/src/nanotron/models/llama.py#L310), [[2b]](https://github.com/huggingface/nanotron/blob/b69690703a1c41b60cd706f92a80a3d23ebaf2d0/src/nanotron/models/llama.py#L149). It is crucial to pay attention to these details to convert the models correctly.
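Going in the Nanotron-to-HuggingFace direction, the fused QKV matrix must be split back into the three separate matrices. A minimal sketch of the row arithmetic, assuming Llama-3-8B's attention shape (32 query heads, 8 KV heads for grouped-query attention, head dimension 128); the function name and the list-of-rows representation are illustrative stand-ins for the real tensor slicing:

```python
def split_qkv(qkv_rows, n_q_heads=32, n_kv_heads=8, head_dim=128):
    """Split a fused QKV weight (rows stacked Q, then K, then V) back apart."""
    q_size = n_q_heads * head_dim    # 4096 rows for Q
    kv_size = n_kv_heads * head_dim  # 1024 rows each for K and V (GQA)
    q = qkv_rows[:q_size]
    k = qkv_rows[q_size:q_size + kv_size]
    v = qkv_rows[q_size + kv_size:]
    return q, k, v

# Stand-in for the 4096 + 1024 + 1024 = 6144 rows of the fused weight.
rows = list(range(6144))
q, k, v = split_qkv(rows)
print(len(q), len(k), len(v))  # 4096 1024 1024
```

The fused MLP projection (gate and up projections concatenated) is split the same way, just with different row counts.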

To perform the conversion, we will need at least **1 GPU**, although the operations themselves are carried out on the **CPU**. We will convert the models with a parallel configuration of DP = PP = TP = 1, but note that the checkpoints Nanotron generates are topology agnostic.
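The DP = PP = TP = 1 setting means the product of the parallel degrees (the world size) is 1, so a single process holds the full model and weights can be copied tensor by tensor without resharding. An illustrative stand-in (field names mirror the common dp/pp/tp convention, not necessarily nanotron's exact `ParallelismArgs` API):

```python
from dataclasses import dataclass

@dataclass
class ParallelismConfig:
    dp: int = 1  # data-parallel degree
    pp: int = 1  # pipeline-parallel degree
    tp: int = 1  # tensor-parallel degree

    @property
    def world_size(self) -> int:
        # Number of processes needed to host this configuration.
        return self.dp * self.pp * self.tp

cfg = ParallelismConfig()
print(cfg.world_size)  # 1 -> matches --nproc-per-node 1 in the commands above
```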
4 changes: 2 additions & 2 deletions tools/llama3/convert_hf_to_nanotron.py
@@ -1,5 +1,5 @@
"""
torchrun --nproc-per-node 1 tools/llama3/convert_hf_to_nanotron.py --nanotron-checkpoint-path nanotron_checkpoints/NanotronLlama38B --pretrained-model-name-or-path meta-llama/Meta-Llama-3-8B-Instruct
torchrun --nproc-per-node 1 tools/llama3/convert_hf_to_nanotron.py --nanotron-checkpoint-path nanotron_checkpoints/Nanotron-Llama-3-8B --pretrained-model-name-or-path meta-llama/Meta-Llama-3-8B-Instruct
"""
import argparse
import json
@@ -238,7 +238,7 @@ def main(args):
# Store Config and Model Config files
with open(nanotron_checkpoint_path / "config.yaml", "w") as f:
config = Config(
general=GeneralArgs(project="conversion", run="Llama3-8B"),
general=GeneralArgs(project="Nanotron", run="Llama3"),
parallelism=parallel_config,
model=ModelArgs(
init_method=ExistingCheckpointInit(nanotron_checkpoint_path),
2 changes: 1 addition & 1 deletion tools/llama3/convert_nanotron_to_hf.py
@@ -1,5 +1,5 @@
"""
torchrun --nproc-per-node 1 tools/llama3/convert_nanotron_to_hf.py --nanotron-checkpoint-path nanotron_checkpoints/NanotronLlama38B --hugging-face-checkpoint-path hf_checkpoints/ConvertedNanotronLlama38B
torchrun --nproc-per-node 1 tools/llama3/convert_nanotron_to_hf.py --nanotron-checkpoint-path nanotron_checkpoints/Nanotron-Llama-3-8B --hugging-face-checkpoint-path hf_checkpoints/Converted-Nanotron-Llama-3-8B
"""
import argparse
import os
