Support for Null Image input #94

Closed
dtoconnor opened this issue Oct 18, 2024 · 14 comments · Fixed by #161

@dtoconnor

Feature request: allow messages with no images to be sent to the VLM (specifically Qwen2-VL-7B-8bit). This is supported by default in transformers and other libraries, but a null image causes shape errors at inference time in MLX_VLM, e.g.

ValueError: Invalid high padding size (-578) passed to pad for axis 1. Padding sizes must be non-negative
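
A rough sketch of the kind of call that hits this (the model path and exact generate signature are my assumptions):

from mlx_vlm import load, generate

# Hypothetical repro: prompt only, no image supplied
model, processor = load("mlx-community/Qwen2-VL-7B-Instruct-8bit")
output = generate(model, processor, prompt="What is the capital of France?", image=None)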

Apart from that, I love the work you are doing.

Blaizzy (Owner) commented Oct 19, 2024

Hey @dtoconnor

Thanks, that's a nice suggestion!

I will add it to the next release.

qinxuye commented Dec 9, 2024

Is no image allowed now?

@CharafChnioune

@Blaizzy Is this feature still in the pipeline?

Blaizzy (Owner) commented Dec 28, 2024

It's actually done in #153.

Just remove the --image arg in the CLI, or pass an empty list programmatically, and the model should work just like a normal language model.

Every model should respond normally except for Florence-2.
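
For example (CLI flag names are assumed from the mlx_vlm generate script; the Python keywords mirror the ones used later in this thread):

# CLI: just leave out --image
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --prompt "What is the capital of France?" --max-tokens 100

# Python: pass an empty list for image
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

formatted_prompt = apply_chat_template(processor, config, "What is the capital of France?", num_images=0)
output = generate(model, processor, prompt=formatted_prompt, image=[], max_tokens=100, verbose=True)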

Blaizzy (Owner) commented Dec 28, 2024

Let me know how it goes so I can close this issue :)

@CharafChnioune

@Blaizzy I'm trying to implement text-only support in my application but getting errors. Here's my current implementation:

def generate_with_model(
    model: Union[VisionModelWrapper, Any],
    processor: Any,
    config: Optional[dict],
    prompt: str,
    images: Optional[List[str]] = None,
    max_tokens: int = 100,
    temperature: float = 0.0
) -> str:
    """Generate output with MLX model (VLM or LM)."""
    try:
        if isinstance(model, VisionModelWrapper):
            # Check if it's Florence-2 model
            if "Florence-2" in str(model.__class__):
                if not images:
                    raise ValueError("Florence-2 model requires images.")
            
            messages = [{"role": "user", "content": prompt}]
            formatted_prompt = apply_chat_template(
                processor, config, prompt, num_images=len(images) if images else 0
            )
            
            if images:
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=images[0],
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            else:
                # Trying to use without image
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=[],  # Empty list as suggested
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            return output.strip()
    except Exception as e:
        # Surface failures as a string, matching the logged "Error: ..." output
        print(f"Error generating output: {e}")
        return f"Error: {e}"

I get this error when running without an image:

TypeError: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType

The code works fine with images, but I can't get it to work text-only. Could you share how to properly implement this in the Python API? Thanks!

Blaizzy (Owner) commented Dec 29, 2024

Install the branch from #161.

It has all of this figured out.

I will merge it soon, later today or tomorrow. I just need to test that most models are working fine.
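
(Assuming the standard pip git-branch install syntax and the branch name mentioned below:)

pip install git+https://github.com/Blaizzy/mlx-vlm.git@pc/refactor-utils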

Blaizzy (Owner) commented Dec 29, 2024

Please share the entire trace. I think the problem is elsewhere.

@CharafChnioune

@Blaizzy I'm getting this traceback when I run a test with my code:

Loading model...
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 183084.70it/s]
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 40435.88it/s]

=== Test without image ===
Generating without image...
==========
Image: [] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

Error generating output: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType
Output: Error: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType

I tried implementing it like this in generate_with_model:

output = generate_vlm(
    model=model,
    processor=processor,
    image=[],  # Empty list as suggested
    prompt=formatted_prompt,
    max_tokens=max_tokens,
    temp=temperature,
    verbose=True
)

But I'm still getting the array type error.
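
For what it's worth, this single line reproduces the same TypeError, so my guess (unconfirmed) is that some pixel-values placeholder ends up as None upstream:

import mlx.core as mx

# mx.array cannot be built from None; this raises the exact
# "incompatible function arguments ... NoneType" error above
mx.array(None)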

Blaizzy (Owner) commented Dec 29, 2024

Could you share a fully reproducible example?

@CharafChnioune

@Blaizzy Here's a fully reproducible example showing the issue:

test_generate_with_model.py:

import mlx.core as mx
from mlx_vlm import load, generate as generate_vlm
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from PIL import Image
from typing import Union, Any, Optional, List
from backend.mlx_vlm_manager import VisionModelWrapper, generate_with_model

def test_generate_with_model():
    """Test generate_with_model functie met en zonder afbeelding"""
    
    # Load model
    print("Loading model...")
    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
    model, processor = load(model_path)
    config = load_config(model_path)
    
    # Wrap model
    model = VisionModelWrapper(model, processor)
    
    # Test 1: Without an image
    print("\n=== Test without image ===")
    prompt_no_image = "What is the capital of France?"
    
    print("Generating without image...")
    output = generate_with_model(
        model=model,
        processor=processor,
        config=config,
        prompt=prompt_no_image,
        images=None,
        max_tokens=100,
        temperature=0.0
    )
    print(f"Output: {output}\n")

if __name__ == "__main__":
    test_generate_with_model()

backend/mlx_vlm_manager.py (relevant part):

def generate_with_model(
    model: Union[VisionModelWrapper, Any],
    processor: Any,
    config: Optional[dict],
    prompt: str,
    images: Optional[List[str]] = None,
    max_tokens: int = 100,
    temperature: float = 0.0
) -> str:
    """Generate output with MLX model (VLM or LM)."""
    try:
        if isinstance(model, VisionModelWrapper):
            # Check if it's Florence-2 model
            if "Florence-2" in str(model.__class__):
                if not images:
                    raise ValueError("Florence-2 model requires images.")
            
            messages = [{"role": "user", "content": prompt}]
            formatted_prompt = apply_chat_template(
                processor, config, prompt, num_images=len(images) if images else 0
            )
            
            if images:
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=images[0],
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            else:
                # Trying to use without image
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=[],  # Empty list as suggested
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            return output.strip()
    except Exception as e:
        # Surface failures as a string, matching the logged "Error: ..." output
        print(f"Error generating output: {e}")
        return f"Error: {e}"

When running this test, I get:

Loading model...
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 183084.70it/s]
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 40435.88it/s]

=== Test without image ===
Generating without image...
==========
Image: [] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

Error generating output: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType
Output: Error: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType

Blaizzy (Owner) commented Dec 29, 2024

Thanks!

I will take a look.

But for now, please use the pc/refactor-utils branch.

@CharafChnioune

Thanks.

But most of all, thanks for all your great work helping MLX grow!

Blaizzy (Owner) commented Dec 29, 2024

My pleasure!

And thank you very much, it means a lot!

The issue you are facing was fixed in the pc/refactor-utils branch yesterday in PR #161.

import mlx.core as mx
from mlx_vlm import load, generate as generate_vlm
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from PIL import Image
from typing import Union, Any, Optional, List

def test_generate_with_model():
    """Test generate_with_model functie met en zonder afbeelding"""

    # Load model
    print("Loading model...")
    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
    model, processor = load(model_path)
    config = load_config(model_path)

    images = []

    # Test 1: Without an image
    print("\n=== Test without image ===")
    prompt_no_image = "What is the capital of France?"
    messages = [{"role": "user", "content": prompt_no_image}]
    formatted_prompt = apply_chat_template(
        processor, config, prompt_no_image, num_images=len(images) if images else 0
    )

    print("Generating without image...")
    output = generate_vlm(
        model=model,
        processor=processor,
        image=[],  # Empty list as suggested
        prompt=formatted_prompt,
        max_tokens=100,
        temp=0.5,
        verbose=True
    )
    print(f"Output: {output}\n")

if __name__ == "__main__":
    test_generate_with_model()

Output:

=== Test without image ===
Generating without image...
==========
Image: [] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

The capital of France is Paris.
==========
Prompt: 298.426 tokens-per-sec
Generation: 127.602 tokens-per-sec
Output: The capital of France is Paris.
