
Add support for Llama-3.2-vision & Resize image #83

Merged — 17 commits merged from pc/llama3.2-vision into main on Oct 17, 2024

Conversation

@Blaizzy (Owner) commented Oct 12, 2024

Adds support for Llama 3.2 Vision, streamlines image resizing, and formats the language model output.

  • Test multi-image
  • Add to trainer

Closes #60
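For anyone trying the new support, a minimal usage sketch (based on the mlx-vlm README of this period; the 4-bit community repo name, the `num_images` argument, and the exact `generate` signature are assumptions and may differ across versions):

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Assumed community conversion; any compatible Llama-3.2-Vision checkpoint should work.
model_path = "mlx-community/Llama-3.2-11B-Vision-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

images = ["path/to/image.jpg"]
prompt = "Describe this image."

# Format the prompt with the processor's chat template (see the discussion below
# about checkpoints that ship without one).
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))

output = generate(model, processor, formatted_prompt, images, verbose=True)
print(output)
```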

@Blaizzy Blaizzy mentioned this pull request Oct 15, 2024
@Blaizzy (Owner, Author) left a comment:


LGTM!

@Blaizzy Blaizzy self-assigned this Oct 16, 2024
@Blaizzy Blaizzy added the enhancement label Oct 16, 2024
@Blaizzy Blaizzy merged commit 9040235 into main Oct 17, 2024
1 check passed
@jrp2014 commented Oct 19, 2024

Some other models work with the example Python code, but this model seems to need another parameter:

meta-llama/Llama-3.2-11B-Vision
Fetching 15 files: 100%|█████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 36578.23it/s]
Fetching 15 files: 100%|█████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 15076.58it/s]
/Users/xxx/Pictures/Processed/20241012-153359_L1010053.jpg
Traceback (most recent call last):
  File "/Users/xxx/Documents/AI/mlx/scripts/vlm/mytest.py", line 33, in <module>
    formatted_prompt = apply_chat_template(
                       ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/prompt_utils.py", line 140, in apply_chat_template
    return processor.apply_chat_template(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/processing_utils.py", line 1067, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.

@jrp2014 commented Oct 19, 2024

But the -Instruct version of the model works! It's much slower than some of the other models on a 48 GB M3 Max.
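For context on the two reports above: the base meta-llama/Llama-3.2-11B-Vision checkpoint ships without a chat_template in its processor config, which is exactly what the ValueError complains about, while the -Instruct checkpoint includes one. A quick way to check before calling apply_chat_template (illustrative sketch; both repos are gated on the Hub, so an authenticated token is assumed):

```python
from transformers import AutoProcessor

# Compare the base and Instruct checkpoints: only the latter ships a chat template,
# which is why apply_chat_template fails on the base model above.
for repo in ("meta-llama/Llama-3.2-11B-Vision",
             "meta-llama/Llama-3.2-11B-Vision-Instruct"):
    processor = AutoProcessor.from_pretrained(repo)
    has_template = getattr(processor, "chat_template", None) is not None
    print(f"{repo}: chat_template {'present' if has_template else 'missing'}")
```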

@Blaizzy Blaizzy deleted the pc/llama3.2-vision branch October 25, 2024 19:46
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

LLaMA 3.2 11B Vision Support
2 participants