Update Llama README.md for Stories110M tokenizer (#5960)
Summary:
The tokenizer downloaded via `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is a TikToken tokenizer, so there is no need to generate a `tokenizer.bin`; the `tokenizer.model` file can be used as-is.
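The rule this commit encodes in the README (Stories110M uses `tokenizer.model` directly, while Llama 2 checkpoints still need the converted `tokenizer.bin`) can be sketched as a tiny helper. This is purely illustrative: the function name and model-family strings are assumptions, not part of the ExecuTorch codebase.

```python
def tokenizer_artifact(model_family: str) -> str:
    """Pick which tokenizer file to pass to the runner.

    Per this commit: Stories110M ships a TikToken tokenizer.model that
    can be passed directly, while Llama 2 checkpoints still need the
    tokenizer.bin produced by extension.llm.tokenizer.tokenizer.
    (Illustrative sketch; not an actual ExecuTorch API.)
    """
    if model_family.lower() == "stories110m":
        return "tokenizer.model"  # use the downloaded file as-is
    return "tokenizer.bin"  # converted artifact for Llama 2 models


print(tokenizer_artifact("stories110M"))
print(tokenizer_artifact("llama2"))
```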

Pull Request resolved: #5960

Reviewed By: tarun292

Differential Revision: D64014160

Pulled By: dvorjackz

fbshipit-source-id: 16474a73ed77192f58a5bb9e07426ba58216351e
dvorjackz authored and facebook-github-bot committed Oct 8, 2024
1 parent 7337f8e commit 12cb9ca
Showing 1 changed file with 2 additions and 8 deletions.
10 changes: 2 additions & 8 deletions examples/models/llama2/README.md

````diff
@@ -205,11 +205,6 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
 ```
 python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv
 ```
-4. Create tokenizer.bin.
-```
-python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
-```
 ### Option D: Download and export Llama 2 7B model
@@ -224,7 +219,6 @@ You can export and run the original Llama 2 7B model.
 python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32
 ```
 4. Create tokenizer.bin.
 ```
 python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
 ```
@@ -286,7 +280,7 @@ tokenizer.path=<path_to_checkpoint_folder>/tokenizer.model
 Using the same arguments from above
 ```
-python -m examples.models.llama2.eval_llama -c <checkpoint.pth> -p <params.json> -t <tokenizer.model> -d fp32 --max_seq_len <max sequence length> --limit <number of samples>
+python -m examples.models.llama2.eval_llama -c <checkpoint.pth> -p <params.json> -t <tokenizer.model/bin> -d fp32 --max_seq_len <max sequence length> --limit <number of samples>
 ```
 
 The Wikitext results generated above used: `{max_seq_len: 2048, limit: 1000}`
@@ -332,7 +326,7 @@ Note for Mac users: There's a known linking issue with Xcode 15.1. Refer to the
 cmake-out/examples/models/llama2/llama_main --model_path=<model pte file> --tokenizer_path=<tokenizer.model> --prompt=<prompt>
 ```
-For Llama2 and stories models, pass the converted `tokenizer.bin` file instead of `tokenizer.model`.
+For Llama2 models, pass the converted `tokenizer.bin` file instead of `tokenizer.model`.
 To build for CoreML backend and validate on Mac, replace `-DEXECUTORCH_BUILD_XNNPACK=ON` with `-DEXECUTORCH_BUILD_COREML=ON`
````
