How to continue a conversation with more images? #68

Open

simonw opened this issue Sep 29, 2024 · 8 comments

simonw commented Sep 29, 2024

It's not clear to me from looking at the code if this library supports the following pattern:

prompt 1: IMAGE1 - describe this image

... first response

prompt 2: IMAGE2 - compare with this image

... second response

Is this something the library can or could do? I'm interested in being able to implement multi-step conversations where images might be attached to future messages.
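To make the shape concrete, something like this hypothetical flow (none of these calls exist in the library today; it's just an illustration of the pattern above):

# Hypothetical sketch only -- illustrates the multi-turn, multi-image
# pattern above; mlx-vlm has no chat/continuation API like this today.
history = []

# Turn 1: IMAGE1 attached to the first user message
history.append({"role": "user", "content": "Describe this image",
                "images": ["image1.png"]})
response_1 = chat(model, processor, history)  # hypothetical helper
history.append({"role": "assistant", "content": response_1})

# Turn 2: IMAGE2 attached to a later message in the same conversation
history.append({"role": "user", "content": "Compare with this image",
                "images": ["image2.png"]})
response_2 = chat(model, processor, history)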

Blaizzy (Owner) commented Sep 29, 2024

Not yet. It's one of the things I want to add next.

My focus at the moment is on the trainer and new models (Pixtral, Llama, and Molmo).

Blaizzy (Owner) commented Sep 29, 2024

It would be awesome if you could implement this

I would be more than happy to help, review and merge the PR🚀

mark-lord (Contributor) commented

+1, would love to see this implemented

Blaizzy (Owner) commented Oct 11, 2024

I think this will be easier and faster to do after I release prompt caching.

That way you only compute the KV cache for the last message.
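Roughly the idea (a sketch with hypothetical names; make_prompt_cache and the prompt_cache argument don't exist in mlx-vlm yet):

# Sketch of KV/prompt caching across chat turns -- hypothetical API,
# for illustration only; mlx-vlm does not expose this yet.
cache = make_prompt_cache(model)  # empty KV cache, reused across turns

# Turn 1: prefill the full prompt (text + IMAGE1) once; the KV state
# is stored in `cache`.
reply_1 = generate(model, processor, prompt_1, image_1, prompt_cache=cache)

# Turn 2: only the new message (text + IMAGE2) needs a forward pass;
# everything before it is served from `cache`.
reply_2 = generate(model, processor, prompt_2, image_2, prompt_cache=cache)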

Blaizzy (Owner) commented Oct 28, 2024

Hey guys,

I thought about it, and here is an example you could use to build this use case.

I will work on a more robust example, showcase the different models that support it, and add it as a chat CLI tool in the next release :)

The idea is to add the image tag only to the last user message in the conversation list, alongside the latest image.

import time

import mlx.core as mx
from mlx_vlm import load
from mlx_vlm.utils import generate_step, load_image

model_mlx, processor = load("mlx-community/idefics2-8b-4bit")


# Load the image (local path or URL)
url = "/path/to/your/image"
image = load_image(url)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": """The image shows a colorful chameleon sitting on a vibrant flower. The chameleon has a blue body with vibrant green and red stripes, and its eyes are wide open, giving it a curious and alert expression. The flower has a mix of pink, yellow, and red petals, adding to the vividness of the scene."""}
        ]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare this image to the previous one."},
            {"type": "image"} # used on the last user message in the list
        ]
    }
]


# Preprocess the inputs
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(
    text=[text_prompt], images=[image], padding=True, return_tensors="np"
)

pixel_values = mx.array(inputs['pixel_values'])
input_ids = mx.array(inputs['input_ids'])
mask = mx.array(inputs['attention_mask'])

max_tokens = 1000
verbose = False # Set to True to stream the output

# Get the prompt tokens and the tokenizer
prompt_tokens = mx.array(processor.tokenizer.encode(text_prompt))
tokenizer = processor.tokenizer

# Initialize timing and detokenizer
tic = time.perf_counter()
detokenizer = processor.detokenizer
detokenizer.reset()

# Generate tokens
generator = generate_step(
    input_ids,
    model_mlx,
    pixel_values,
    mask,
    temperature=0.7,
)

prompt_time = 0
token_count = 0  # guard: stays 0 if the generator yields no tokens
for (token, prob), n in zip(generator, range(max_tokens)):

    if n == 0:
        prompt_time = time.perf_counter() - tic
        tic = time.perf_counter()

    if token == tokenizer.eos_token_id and n > 0:
        break

    detokenizer.add_token(token)

    if verbose:
        print(detokenizer.last_segment, end="", flush=True)

    token_count = n + 1

detokenizer.finalize()

if verbose:
    print(detokenizer.last_segment, flush=True)
    gen_time = time.perf_counter() - tic
    print("=" * 10)
    if token_count == 0:
        print("No tokens generated for this prompt")
    prompt_tps = prompt_tokens.size / prompt_time
    gen_tps = (token_count - 1) / gen_time

    print(f"Prompt: {prompt_tps:.3f} tokens-per-sec")
    print(f"Generation: {gen_tps:.3f} tokens-per-sec")

# Print the generated text
print(detokenizer.text)
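To keep the chat going, follow the same pattern: append the reply, move the image tag to the newest user message, and attach only the newest image. A sketch continuing the script above (the follow-up prompt and image path are placeholders; whether older images must be re-sent depends on the model's chat template):

# Continue the conversation: append the model's reply as an assistant turn.
conversation.append({
    "role": "assistant",
    "content": [{"type": "text", "text": detokenizer.text}],
})

# Strip the image tag from the previous user message so that only the
# newest user message carries one.
conversation[-2]["content"] = [
    c for c in conversation[-2]["content"] if c["type"] != "image"
]

# Add the next user turn with the new image tag.
new_image = load_image("/path/to/your/next/image")
conversation.append({
    "role": "user",
    "content": [
        {"type": "text", "text": "What changed between the two images?"},
        {"type": "image"},  # tag goes on the latest user message only
    ],
})

# Re-run the same preprocessing and generation loop with the new inputs.
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(
    text=[text_prompt], images=[new_image], padding=True, return_tensors="np"
)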

Blaizzy (Owner) commented Oct 28, 2024

Example output:

[Screenshot: example chat output, 2024-10-28]

simonw (Author) commented Oct 29, 2024

Looks like there's new code for chat in this branch: https://github.com/Blaizzy/mlx-vlm/tree/pc/video - e.g. 810fb53

Blaizzy (Owner) commented Oct 29, 2024

Yes, there is :)
