Support for Null Image input #94
Comments
Hey @dtoconnor, thanks, that's a nice suggestion. I will add it to the next release. |
Is no image allowed now? |
@Blaizzy Is this feature still in the pipeline? |
It's actually done in #153 ✅ Just leave the image out. Every model should respond normally except for Florence-2.
Let me know how it goes so I can close this issue :) |
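A minimal sketch of the text-only call described above, mirroring the working example posted later in this thread (the model path and the empty `image=[]` argument are taken from that example, so exact argument names may differ on other versions):

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Any supported model except Florence-2 should answer text-only prompts.
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Format the prompt with zero images, then pass an empty image list.
prompt = apply_chat_template(
    processor, config, "What is the capital of France?", num_images=0
)
output = generate(
    model=model,
    processor=processor,
    image=[],
    prompt=prompt,
    max_tokens=100,
    temp=0.0,
    verbose=True,
)
print(output)
```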
@Blaizzy I'm trying to implement text-only support in my application but I'm getting errors. Here's my current implementation:

```python
# (imports and the VisionModelWrapper definition from the surrounding module omitted)
def generate_with_model(
    model: Union[VisionModelWrapper, Any],
    processor: Any,
    config: Optional[dict],
    prompt: str,
    images: Optional[List[str]] = None,
    max_tokens: int = 100,
    temperature: float = 0.0
) -> str:
    """Generate output with MLX model (VLM or LM)."""
    try:
        if isinstance(model, VisionModelWrapper):
            # Check if it's a Florence-2 model
            if "Florence-2" in str(model.__class__):
                if not images:
                    raise ValueError("Florence-2 model requires images.")
            messages = [{"role": "user", "content": prompt}]
            formatted_prompt = apply_chat_template(
                processor, config, prompt, num_images=len(images) if images else 0
            )
            if images:
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=images[0],
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            else:
                # Trying to use without an image
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=[],  # Empty list as suggested
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            return output.strip()
    except Exception:
        raise  # error handling omitted in the original snippet
```

Getting this error when trying without an image:
The code works fine with images, but I can't get it to work text-only. Could you share how to properly implement this in the Python API? Thanks! |
Install the branch from #161. It has all of this figured out. I will merge it soon, later today or tomorrow. I just need to test that most models are working fine. |
Please share the entire trace. I think the problem is elsewhere. |
@Blaizzy I'm getting this traceback if I run a test with my code.

I tried implementing it like this:

```python
output = generate_vlm(
    model=model,
    processor=processor,
    image=[],  # Empty list as suggested
    prompt=formatted_prompt,
    max_tokens=max_tokens,
    temp=temperature,
    verbose=True
)
```

But I'm getting the array type error. |
Could you share a fully reproducible example? |
@Blaizzy Here's a fully reproducible example showing the issue:

test_generate_with_model.py:

```python
import mlx.core as mx
from mlx_vlm import load, generate as generate_vlm
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from PIL import Image
from typing import Union, Any, Optional, List
from backend.mlx_vlm_manager import VisionModelWrapper, generate_with_model


def test_generate_with_model():
    """Test generate_with_model with and without an image."""
    # Load model
    print("Loading model...")
    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
    model, processor = load(model_path)
    config = load_config(model_path)

    # Wrap model
    model = VisionModelWrapper(model, processor)

    # Test 1: Without an image
    print("\n=== Test without image ===")
    prompt_no_image = "What is the capital of France?"

    print("Generating without image...")
    output = generate_with_model(
        model=model,
        processor=processor,
        config=config,
        prompt=prompt_no_image,
        images=None,
        max_tokens=100,
        temperature=0.0
    )
    print(f"Output: {output}\n")


if __name__ == "__main__":
    test_generate_with_model()
```

backend/mlx_vlm_manager.py (relevant part):

```python
def generate_with_model(
    model: Union[VisionModelWrapper, Any],
    processor: Any,
    config: Optional[dict],
    prompt: str,
    images: Optional[List[str]] = None,
    max_tokens: int = 100,
    temperature: float = 0.0
) -> str:
    """Generate output with MLX model (VLM or LM)."""
    try:
        if isinstance(model, VisionModelWrapper):
            # Check if it's a Florence-2 model
            if "Florence-2" in str(model.__class__):
                if not images:
                    raise ValueError("Florence-2 model requires images.")
            messages = [{"role": "user", "content": prompt}]
            formatted_prompt = apply_chat_template(
                processor, config, prompt, num_images=len(images) if images else 0
            )
            if images:
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=images[0],
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            else:
                # Trying to use without an image
                output = generate_vlm(
                    model=model,
                    processor=processor,
                    image=[],  # Empty list as suggested
                    prompt=formatted_prompt,
                    max_tokens=max_tokens,
                    temp=temperature,
                    verbose=True
                )
            return output.strip()
    except Exception:
        raise  # error handling omitted in the original snippet
```

When running this test, I get:
|
Thanks! I will take a look. But for now, please use the refactor-utils branch. |
Thanks. But most of all, thanks for all your great work helping MLX grow! |
My pleasure! And thank you very much, it means a lot!

The issue you are facing was fixed in the pc/refactor-utils branch yesterday in PR #161.

```python
import mlx.core as mx
from mlx_vlm import load, generate as generate_vlm
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from PIL import Image
from typing import Union, Any, Optional, List


def test_generate_with_model():
    """Test generate_with_model with and without an image."""
    # Load model
    print("Loading model...")
    model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
    model, processor = load(model_path)
    config = load_config(model_path)

    images = []

    # Test 1: Without an image
    print("\n=== Test without image ===")
    prompt_no_image = "What is the capital of France?"
    messages = [{"role": "user", "content": prompt_no_image}]
    formatted_prompt = apply_chat_template(
        processor, config, prompt_no_image, num_images=len(images) if images else 0
    )

    print("Generating without image...")
    output = generate_vlm(
        model=model,
        processor=processor,
        image=[],  # Empty list as suggested
        prompt=formatted_prompt,
        max_tokens=100,
        temp=0.5,
        verbose=True
    )
    print(f"Output: {output}\n")


if __name__ == "__main__":
    test_generate_with_model()
```

Output:

```
=== Test without image ===
Generating without image...
==========
Image: []
Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.
==========
Prompt: 298.426 tokens-per-sec
Generation: 127.602 tokens-per-sec
Output: The capital of France is Paris.
```
|
Feature request: allow messages with no images to be sent to the VLM (specifically Qwen2-VL-7B-8bit). This is supported with transformers and other libraries by default, but null images cause shape errors upon inference in MLX_VLM, e.g.
```
ValueError: Invalid high padding size (-578) passed to pad for axis 1. Padding sizes must be non-negative
```
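A minimal sketch of the kind of call that triggers this error on versions without the null-image fix; the repo id below is a hypothetical 8-bit Qwen2-VL checkpoint, and the API usage follows the examples in this thread:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Hypothetical repo id for the 8-bit model mentioned in this feature request.
model_path = "mlx-community/Qwen2-VL-7B-Instruct-8bit"
model, processor = load(model_path)
config = load_config(model_path)

prompt = apply_chat_template(
    processor, config, "Describe MLX in one sentence.", num_images=0
)

# On releases that predate #153 / #161, calling generate with no image
# raises the padding ValueError above; after the fix it returns text.
output = generate(
    model=model,
    processor=processor,
    image=[],
    prompt=prompt,
    max_tokens=50,
    verbose=True,
)
print(output)
```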
Apart from that, I love the work you are doing.