Pixtral Multi-Image Bug #87

mattjcly · 2024-10-15T17:08:24Z

From commit 0a39868

When attempting to send multiple images to Pixtral, I'm seeing the following error:

matt@Matts-MacBook-Pro [13:04:19] [~/Workspace/mlx-vlm] [main *]
-> % python -m mlx_vlm.generate --model /Users/matt/.cache/lm-studio/models/mlx-community/pixtral-12b-4bit --max-tokens 100 --prompt "Compare these images" --image /Users/matt/Downloads/whiteboard-image.jpg /Users/matt/Downloads/dog-smoking-cigar.png        
==========
Image: ['/Users/matt/Downloads/whiteboard-image.jpg', '/Users/matt/Downloads/dog-smoking-cigar.png'] 

Prompt: <s>[INST]Compare these images[IMG][IMG][/INST]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 96, in <module>
    main()
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 81, in main
    output = generate(
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 1021, in generate
    for (token, prob), n in zip(generator, range(max_tokens)):
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 888, in generate_step
    logits = model(input_ids, pixel_values, cache=cache, mask=mask, **kwargs)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 147, in __call__
    input_embddings = self.get_input_embeddings(input_ids, pixel_values)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 84, in get_input_embeddings
    pixel_values = mx.concatenate(
ValueError: [concatenate] All the input array dimensions must match exactly except for the concatenation axis. However, the provided shapes are (3,368,544), (3,512,512), and the concatenation axis is 1.
(myenv) 
matt@Matts-MacBook-Pro [13:04:28] [~/Workspace/mlx-vlm] [main *]
-> % python -m mlx_vlm.generate --model /Users/matt/.cache/lm-studio/models/mlx-community/pixtral-12b-4bit --max-tokens 100 --prompt "Compare these images" --image /Users/matt/Downloads/whiteboard-image.jpg /Users/matt/Downloads/cat.jpeg  
==========
Image: ['/Users/matt/Downloads/whiteboard-image.jpg', '/Users/matt/Downloads/cat.jpeg'] 

Prompt: <s>[INST]Compare these images[IMG][IMG][/INST]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 96, in <module>
    main()
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/generate.py", line 81, in main
    output = generate(
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 1021, in generate
    for (token, prob), n in zip(generator, range(max_tokens)):
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/utils.py", line 888, in generate_step
    logits = model(input_ids, pixel_values, cache=cache, mask=mask, **kwargs)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 147, in __call__
    input_embddings = self.get_input_embeddings(input_ids, pixel_values)
  File "/Users/matt/Workspace/mlx-vlm/mlx_vlm/models/pixtral/pixtral.py", line 84, in get_input_embeddings
    pixel_values = mx.concatenate(
ValueError: [concatenate] All the input array dimensions must match exactly except for the concatenation axis. However, the provided shapes are (3,368,544), (3,736,736), and the concatenation axis is 1.

Blaizzy · 2024-10-15T19:02:40Z

#83 closes it.

You can use --resize-shape to ensure both images have the same resolution:

python -m mlx_vlm.generate --model mlx-community/pixtral-12b-4bit --max-tokens 1000 --prompt "Compare these images" --image image1.jpg image2.jpeg --resize-shape 560 560
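To illustrate why resizing helps, here is a minimal sketch of the shape rule quoted in the error message. It is pure Python and does not call mlx_vlm or MLX; `concat_compatible` is a hypothetical helper, the shapes are presumably (channels, height, width), and the post-resize shape (3, 560, 560) is an assumption about what `--resize-shape 560 560` produces.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def concat_compatible(shapes: Sequence[Shape], axis: int) -> bool:
    """Return True if arrays with these shapes could be concatenated along `axis`.

    Hypothetical helper that mirrors the rule in the error message: every
    dimension except the concatenation axis must match exactly.
    """
    if not shapes:
        return False
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            return False
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                return False
    return True


# Shapes reported in the first traceback above.
original = [(3, 368, 544), (3, 512, 512)]
# Assumed shapes once both images are resized to 560 x 560.
resized = [(3, 560, 560), (3, 560, 560)]

print(concat_compatible(original, axis=1))  # False: last dimension differs (544 vs 512)
print(concat_compatible(resized, axis=1))   # True: all dimensions match
```

With mismatched sizes, the dimension that is not the concatenation axis differs between the two images, which is what triggers the ValueError; identical sizes avoid the check failing.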

Blaizzy closed this as completed Oct 15, 2024
