Pixtral Multi-Image Bug #87

Closed
mattjcly opened this issue Oct 15, 2024 · 1 comment


From commit 0a39868

When attempting to send multiple images to Pixtral, I'm seeing the following error:

matt@Matts-MacBook-Pro [13:04:19] [~/Workspace/mlx-vlm] [main *]
-> % python -m mlx_vlm.generate --model /Users/matt/.cache/lm-studio/models/mlx-community/pixtral-12b-4bit --max-tokens 100 --prompt "Compare these images" --image /Users/matt/Downloads/whiteboard-image.jpg /Users/matt/Downloads/dog-smoking-cigar.png        
==========
Image: ['/Users/matt/Downloads/whiteboard-image.jpg', '/Users/matt/Downloads/dog-smoking-cigar.png'] 

Prompt: <s>[INST]Compare these images[IMG][IMG][/INST]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 96, in <module>
    main()
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 81, in main
    output = generate(
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 1021, in generate
    for (token, prob), n in zip(generator, range(max_tokens)):
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 888, in generate_step
    logits = model(input_ids, pixel_values, cache=cache, mask=mask, **kwargs)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 147, in __call__
    input_embddings = self.get_input_embeddings(input_ids, pixel_values)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 84, in get_input_embeddings
    pixel_values = mx.concatenate(
ValueError: [concatenate] All the input array dimensions must match exactly except for the concatenation axis. However, the provided shapes are (3,368,544), (3,512,512), and the concatenation axis is 1.
(myenv) 
matt@Matts-MacBook-Pro [13:04:28] [~/Workspace/mlx-vlm] [main *]
-> % python -m mlx_vlm.generate --model /Users/matt/.cache/lm-studio/models/mlx-community/pixtral-12b-4bit --max-tokens 100 --prompt "Compare these images" --image /Users/matt/Downloads/whiteboard-image.jpg /Users/matt/Downloads/cat.jpeg  
==========
Image: ['/Users/matt/Downloads/whiteboard-image.jpg', '/Users/matt/Downloads/cat.jpeg'] 

Prompt: <s>[INST]Compare these images[IMG][IMG][/INST]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 96, in <module>
    main()
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 81, in main
    output = generate(
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 1021, in generate
    for (token, prob), n in zip(generator, range(max_tokens)):
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 888, in generate_step
    logits = model(input_ids, pixel_values, cache=cache, mask=mask, **kwargs)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 147, in __call__
    input_embddings = self.get_input_embeddings(input_ids, pixel_values)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 84, in get_input_embeddings
    pixel_values = mx.concatenate(
ValueError: [concatenate] All the input array dimensions must match exactly except for the concatenation axis. However, the provided shapes are (3,368,544), (3,736,736), and the concatenation axis is 1.

Blaizzy (Owner) commented Oct 15, 2024

#83 closes it.

You can use --resize-shape to ensure both images have the same resolution:

python -m mlx_vlm.generate --model mlx-community/pixtral-12b-4bit --max-tokens 1000 --prompt "Compare these images" --image image1.jpg image2.jpeg --resize-shape 560 560
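
To illustrate why matching resolutions avoids the error in the traceback, here is a minimal pure-Python sketch of the shape rule quoted in the error message: every dimension except the concatenation axis must match. The helper name `concat_compatible` is hypothetical and not part of mlx-vlm or MLX, and the (3, 560, 560) shapes assume that `--resize-shape 560 560` yields channel-first arrays of that size.

```python
def concat_compatible(shapes, axis):
    """Return True if arrays with these shapes could be concatenated along `axis`.

    Mirrors the rule in the error message: all dimensions must match
    exactly except for the concatenation axis. Hypothetical helper,
    not an mlx-vlm or MLX function.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            return False
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                return False
    return True


# Shapes reported in the first traceback: the widths (544 vs 512) differ,
# so concatenating along axis 1 is rejected.
original = [(3, 368, 544), (3, 512, 512)]
print(concat_compatible(original, axis=1))

# Assumed shapes after resizing both images to 560 x 560.
resized = [(3, 560, 560), (3, 560, 560)]
print(concat_compatible(resized, axis=1))
```

The first check fails because only axis 1 is allowed to differ, while the two images also differ on the last axis; once both images share a resolution, every non-concatenation dimension agrees.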

Blaizzy closed this as completed Oct 15, 2024