Support for Null Image input #94

Closed
dtoconnor opened this issue Oct 18, 2024 · 14 comments · Fixed by #161

@dtoconnor

Feature request: allow messages with no images to be sent to the VLM (specifically Qwen2-VL-7B-8bit). This is supported by default in transformers and other libraries, but a null image causes shape errors at inference time in MLX_VLM, e.g.

ValueError: Invalid high padding size (-578) passed to pad for axis 1. Padding sizes must be non-negative
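
A rough sketch of the kind of call that hits this (the model path and exact generate signature are my assumptions):

from mlx_vlm import load, generate

# Hypothetical repro: prompt only, no image supplied
model, processor = load("mlx-community/Qwen2-VL-7B-Instruct-8bit")
output = generate(model, processor, prompt="What is the capital of France?", image=None)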

Apart from that, I love the work you are doing.

Blaizzy (Owner) commented Oct 19, 2024

Hey @dtoconnor

Thanks, that's a nice suggestion!

I will add it to the next release.

qinxuye commented Dec 9, 2024

Is no image allowed now?

@CharafChnioune

@Blaizzy Is this feature still in the pipeline?

Blaizzy (Owner) commented Dec 28, 2024

It's actually done in #153.

Just remove the --image arg in the CLI, or pass an empty list programmatically, and the model should work just like a normal language model.

Every model should respond normally except for Florence-2.
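
For example (CLI flag names are assumed from the mlx_vlm generate script; the Python keywords mirror the ones used later in this thread):

# CLI: just leave out --image
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --prompt "What is the capital of France?" --max-tokens 100

# Python: pass an empty list for image
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

formatted_prompt = apply_chat_template(processor, config, "What is the capital of France?", num_images=0)
output = generate(model, processor, prompt=formatted_prompt, image=[], max_tokens=100, verbose=True)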

Blaizzy (Owner) commented Dec 28, 2024

Let me know how it goes so I can close this issue :)

@CharafChnioune

@Blaizzy I'm trying to implement text-only support in my application but getting errors. Here's my current implementation:

def generate_with_model(
    model: Union[VisionModelWrapper, Any],
    processor: Any,
    config: Optional[dict],
    prompt: str,
    images: Optional[List[str]] = None,
    max_tokens: int = 100,
    temperature: float = 0.0
) -> str:
    """Generate output with MLX model (VLM or LM)."""
    try:
        if isinstance(model, VisionModelWrapper):
            # Check if it's Florence-2 model
            if "Florence-2" in str(model.__class__):
                if not images:
                    raise ValueError("Florence-2 model requires images.")
            
            messages = [{"role": "user", "content": prompt}]
            formatted_prompt = apply_chat_template(
                processor, config, prompt, num_images=len(images) if images else 0
            )
            
            if images:
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=images[0],
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            else:
                # Trying to use without image
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=[],  # Empty list as suggested
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            return output.strip()
    except Exception as e:
        # Surface failures as a string, matching the logged "Error: ..." output
        print(f"Error generating output: {e}")
        return f"Error: {e}"

I get this error when running without an image:

TypeError: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType

The code works fine with images, but I can't get it to work text-only. Could you share how to properly implement this in the Python API? Thanks!

Blaizzy (Owner) commented Dec 29, 2024

Install the branch from #161.

It has all of this figured out.

I will merge it soon, later today or tomorrow. I just need to test that most models are working fine.
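
(Assuming the standard pip git-branch install syntax and the branch name mentioned below:)

pip install git+https://github.com/Blaizzy/mlx-vlm.git@pc/refactor-utils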

Blaizzy (Owner) commented Dec 29, 2024

Please share the entire trace. I think the problem is elsewhere.

@CharafChnioune

@Blaizzy I'm getting this traceback when I run a test with my code:

Loading model...
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 183084.70it/s]
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 40435.88it/s]

=== Test without image ===
Generating without image...
==========
Image: [] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

Error generating output: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType
Output: Error: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType

I tried implementing it like this in generate_with_model:

output = generate_vlm(
    model=model,
    processor=processor,
    image=[],  # Empty list as suggested
    prompt=formatted_prompt,
    max_tokens=max_tokens,
    temp=temperature,
    verbose=True
)

But I'm still getting the array type error.
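
For what it's worth, this single line reproduces the same TypeError, so my guess (unconfirmed) is that some pixel-values placeholder ends up as None upstream:

import mlx.core as mx

# mx.array cannot be built from None; this raises the exact
# "incompatible function arguments ... NoneType" error above
mx.array(None)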

Blaizzy (Owner) commented Dec 29, 2024

Could you share a fully reproducible example?

@CharafChnioune

@Blaizzy Here's a fully reproducible example showing the issue:

test_generate_with_model.py:

import mlx.core as mx
from mlx_vlm import load, generate as generate_vlm
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from PIL import Image
from typing import Union, Any, Optional, List
from backend.mlx_vlm_manager import VisionModelWrapper, generate_with_model

def test_generate_with_model():
    """Test generate_with_model functie met en zonder afbeelding"""
    
    # Load model
    print("Loading model...")
    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
    model, processor = load(model_path)
    config = load_config(model_path)
    
    # Wrap model
    model = VisionModelWrapper(model, processor)
    
    # Test 1: Without an image
    print("\n=== Test without image ===")
    prompt_no_image = "What is the capital of France?"
    
    print("Generating without image...")
    output = generate_with_model(
        model=model,
        processor=processor,
        config=config,
        prompt=prompt_no_image,
        images=None,
        max_tokens=100,
        temperature=0.0
    )
    print(f"Output: {output}\n")

if __name__ == "__main__":
    test_generate_with_model()

backend/mlx_vlm_manager.py (relevant part):

def generate_with_model(
    model: Union[VisionModelWrapper, Any],
    processor: Any,
    config: Optional[dict],
    prompt: str,
    images: Optional[List[str]] = None,
    max_tokens: int = 100,
    temperature: float = 0.0
) -> str:
    """Generate output with MLX model (VLM or LM)."""
    try:
        if isinstance(model, VisionModelWrapper):
            # Check if it's Florence-2 model
            if "Florence-2" in str(model.__class__):
                if not images:
                    raise ValueError("Florence-2 model requires images.")
            
            messages = [{"role": "user", "content": prompt}]
            formatted_prompt = apply_chat_template(
                processor, config, prompt, num_images=len(images) if images else 0
            )
            
            if images:
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=images[0],
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            else:
                # Trying to use without image
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=[],  # Empty list as suggested
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            return output.strip()
    except Exception as e:
        # Surface failures as a string, matching the logged "Error: ..." output
        print(f"Error generating output: {e}")
        return f"Error: {e}"

When running this test, I get:

Loading model...
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 183084.70it/s]
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 40435.88it/s]

=== Test without image ===
Generating without image...
==========
Image: [] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

Error generating output: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType
Output: Error: __init__(): incompatible function arguments. The following argument types are supported:
    1. __init__(self: array, val: Union[scalar, list, tuple, numpy.ndarray, array], dtype: Optional[Dtype] = None)

Invoked with types: mlx.core.array, NoneType

Blaizzy (Owner) commented Dec 29, 2024

Thanks!

I will take a look.

But for now, please use the pc/refactor-utils branch.

@CharafChnioune

Thanks.

But most of all, thanks for all your great work helping MLX grow!

Blaizzy (Owner) commented Dec 29, 2024

My pleasure!

And thank you very much, it means a lot!

The issue you are facing was fixed in the pc/refactor-utils branch yesterday in PR #161.

import mlx.core as mx
from mlx_vlm import load, generate as generate_vlm
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from PIL import Image
from typing import Union, Any, Optional, List

def test_generate_with_model():
    """Test generate_with_model functie met en zonder afbeelding"""

    # Load model
    print("Loading model...")
    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
    model, processor = load(model_path)
    config = load_config(model_path)

    images = []

    # Test 1: Without an image
    print("\n=== Test without image ===")
    prompt_no_image = "What is the capital of France?"
    messages = [{"role": "user", "content": prompt_no_image}]
    formatted_prompt = apply_chat_template(
        processor, config, prompt_no_image, num_images=len(images) if images else 0
    )

    print("Generating without image...")
    output = generate_vlm(
        model=model,
        processor=processor,
        image=[],  # Empty list as suggested
        prompt=formatted_prompt,
        max_tokens=100,
        temp=0.5,
        verbose=True
    )
    print(f"Output: {output}\n")

if __name__ == "__main__":
    test_generate_with_model()

Output:

=== Test without image ===
Generating without image...
==========
Image: [] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

The capital of France is Paris.
==========
Prompt: 298.426 tokens-per-sec
Generation: 127.602 tokens-per-sec
Output: The capital of France is Paris.
