Refactor utils #1 #161

Blaizzy · 2024-12-26T22:16:44Z

This PR is the first of many to clean up the code base and simplify it.

It will make it easier to add new features such as image/video feature caching, KV cache quantization and more.

⚠️ This is a significant change; please feel free to open an issue if this PR breaks your workflow or model inference.

Closes #160
Closes #94
Closes #135
Closes #144

jrp2014 · 2024-12-30T10:16:26Z

Running my harness on your latest release:

# from mlx import version as mlx_version
from mlx_vlm import load, generate, version as vlm_version
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
import subprocess
import time
import psutil

# print("mlx version:", mlx_version())
print("mlx-vlm version:", vlm_version)

output = subprocess.check_output(
    ["/opt/homebrew/Caskroom/miniconda/base/envs/mlx/bin/huggingface-cli", "scan-cache"]
)
lines = output.decode("utf-8").split("\n")[2:-4]

for line in lines:
    print(80 * "v")
    model_path = line.split()[0]
    print("\033[1mRunning", model_path, "\033[0m")

    process = psutil.Process()
    mem_before = process.memory_info().rss

    try:
        # Load the model
        model, tokenizer = load(model_path)
        config = load_config(model_path)
    except Exception as e:
        print(f"Failed to load model at {model_path}: {e}")
        continue

    # Prepare input
    image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
    prompt = "Describe this image."

    # Apply chat template
    formatted_prompt = apply_chat_template(
        tokenizer, config, prompt, num_images=len(image)
    )

    # Generate output
    try:
        start_time = time.time()
        output = generate(model, tokenizer, image, formatted_prompt, max_tokens=500, verbose=True)
        end_time = time.time()
        print(output)
    except Exception as e:
        print(f"Failed to generate output for model at {model_path}: {e}")
        continue

    mem_after = process.memory_info().rss
    print(f"Output generated in {end_time - start_time:.2f}s")
    print(f"Memory used: {(mem_after - mem_before) / (1024 * 1024 * 1024):.2f} GB")

    print(80 * "^", end="\n\n")

I get:

python check_models.py
mlx-vlm version: <module 'mlx_vlm.version' from '/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/version.py'>
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running HuggingFaceTB/SmolVLM-Instruct 
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 26324.08it/s]
Failed to load model at HuggingFaceTB/SmolVLM-Instruct: Unsupported model type: idefics3_vision
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running OpenGVLab/InternVL2_5-8B 
Fetching 21 files: 100%|█████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 13367.79it/s]
The repository for /Users/jrp/.cache/huggingface/hub/models--OpenGVLab--InternVL2_5-8B/snapshots/d64b85a1392275381ddbb7525db05e587303d59e contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//Users/jrp/.cache/huggingface/hub/models--OpenGVLab--InternVL2_5-8B/snapshots/d64b85a1392275381ddbb7525db05e587303d59e.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] ERROR:root:Model type internvl_chat not supported.
Failed to load model at OpenGVLab/InternVL2_5-8B: Model type internvl_chat not supported.
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running cognitivecomputations/dolphin-2.9.2-qwen2-72b 
Fetching 40 files: 100%|██████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 5521.73it/s]
ERROR:root:Model type qwen2 not supported.
Failed to load model at cognitivecomputations/dolphin-2.9.2-qwen2-72b: Model type qwen2 not supported.
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running distilbert/distilbert-base-uncased-finetuned-sst-2-english 
Fetching 10 files: 100%|██████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 5794.04it/s]
ERROR:root:Model type distilbert not supported.
Failed to load model at distilbert/distilbert-base-uncased-finetuned-sst-2-english: Model type distilbert not supported.
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running google/siglip-so400m-patch14-384 
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 13632.62it/s]
ERROR:root:Model type siglip not supported.
Failed to load model at google/siglip-so400m-patch14-384: Model type siglip not supported.
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running meta-llama/Llama-3.2-11B-Vision-Instruct 
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 16008.79it/s]
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 22090.79it/s]
==========
Image: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Describe this image.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>

 

Prompt: ['http://images.cocodataset.org/val2017/000000039769.jpg']
Failed to generate output for model at meta-llama/Llama-3.2-11B-Vision-Instruct: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Phi-3.5-mini-instruct 
Fetching 13 files: 100%|██████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 2792.62it/s]
ERROR:root:Model type phi3 not supported.
Failed to load model at microsoft/Phi-3.5-mini-instruct: Model type phi3 not supported.
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Phi-3.5-vision-instruct 
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 16293.08it/s]
The repository for /Users/jrp/.cache/huggingface/hub/models--microsoft--Phi-3.5-vision-instruct/snapshots/4a0d683eba9f1d0cbfb6151705d1ee73c25a80ca contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//Users/jrp/.cache/huggingface/hub/models--microsoft--Phi-3.5-vision-instruct/snapshots/4a0d683eba9f1d0cbfb6151705d1ee73c25a80ca.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
The repository for /Users/jrp/.cache/huggingface/hub/models--microsoft--Phi-3.5-vision-instruct/snapshots/4a0d683eba9f1d0cbfb6151705d1ee73c25a80ca contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//Users/jrp/.cache/huggingface/hub/models--microsoft--Phi-3.5-vision-instruct/snapshots/4a0d683eba9f1d0cbfb6151705d1ee73c25a80ca.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:524: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
  warnings.warn(
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 11255.56it/s]
The repository for /Users/jrp/.cache/huggingface/hub/models--microsoft--Phi-3.5-vision-instruct/snapshots/4a0d683eba9f1d0cbfb6151705d1ee73c25a80ca contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//Users/jrp/.cache/huggingface/hub/models--microsoft--Phi-3.5-vision-instruct/snapshots/4a0d683eba9f1d0cbfb6151705d1ee73c25a80ca.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
==========
Image: <|user|>
<|image_1|>Describe this image.<|end|>
<|assistant|>
 

Prompt: ['http://images.cocodataset.org/val2017/000000039769.jpg']
Failed to generate output for model at microsoft/Phi-3.5-vision-instruct: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

I think you default trust_remote_code to True (I would prefer a more cautious default), but that doesn't seem to suppress the prompts.

But more fundamentally, the error "TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]" breaks models that previously worked.
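Several of the load failures in the log above come from cached repos that are not vision-language models at all (qwen2, phi3, distilbert, siglip), and the harness's [2:-4] slice depends on the exact layout of the huggingface-cli scan-cache table. A minimal stdlib sketch, assuming the default hub cache layout visible in the log paths (~/.cache/huggingface/hub/models--<org>--<name>/snapshots/<commit>/), lists cached model repos from their folder names and reads each config.json's model_type before any weights are loaded. The function names are hypothetical, only the HF_HUB_CACHE override is honoured (HF_HOME and other settings are ignored), and the supported set is yours to supply. huggingface_hub's own scan_cache_dir() is an alternative to walking the directory by hand.

```python
import json
import os
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def default_hub_cache() -> Path:
    # Assumption: only HF_HUB_CACHE is honoured; HF_HOME etc. are ignored here.
    env = os.environ.get("HF_HUB_CACHE")
    return Path(env) if env else Path.home() / ".cache" / "huggingface" / "hub"


def cached_model_type(repo_dir: Path, revision: str = "main") -> Optional[str]:
    """Read model_type from the cached config.json without loading any weights."""
    ref = repo_dir / "refs" / revision
    if not ref.is_file():
        return None
    commit = ref.read_text(encoding="utf-8").strip()
    config = repo_dir / "snapshots" / commit / "config.json"
    try:
        data = json.loads(config.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing or unreadable config
    return data.get("model_type") if isinstance(data, dict) else None


def partition_cached_models(
    supported: Iterable[str], cache_dir: Optional[Path] = None
) -> Tuple[List[str], List[Tuple[str, Optional[str]]]]:
    """Split cached model repos into (repo ids to try, skipped (repo id, model_type))."""
    cache_dir = Path(cache_dir) if cache_dir is not None else default_hub_cache()
    allowed = set(supported)
    to_try: List[str] = []
    skipped: List[Tuple[str, Optional[str]]] = []
    if not cache_dir.is_dir():
        return to_try, skipped
    for entry in sorted(cache_dir.iterdir()):
        # Model folders are "models--<org>--<name>"; datasets, spaces and .locks are skipped.
        if not (entry.is_dir() and entry.name.startswith("models--")):
            continue
        repo_id = entry.name[len("models--"):].replace("--", "/")
        model_type = cached_model_type(entry)
        if model_type in allowed:
            to_try.append(repo_id)
        else:
            skipped.append((repo_id, model_type))
    return to_try, skipped
```

In the harness, something like to_try, skipped = partition_cached_models({"qwen2_vl", "idefics2", "idefics3", "mllama"}) could replace the subprocess call, with the loop running over to_try. That set is only an example of model_type values for models that load later in this thread; mlx-vlm may remap some model types internally, so treat any allowlist as a coarse pre-filter and check the models shipped with your installed version.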

sachinraja13 · 2024-12-30T10:50:29Z

I'm facing the same problem as above with 0.1.7.

Blaizzy · 2024-12-30T12:38:27Z

Hey @jrp2014 and @sachinraja13

Here are the changes that you need to make to your script:

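load_model and load_config now take kwargs, and trust_remote_code must be passed through them, just like in transformers. Using kwargs leaves room for other configuration options you might need to set.
The arguments of generate, stream_generate and generate_step have changed slightly; please check them here. In particular, the prompt now comes before the image(s): generate(model, tokenizer, formatted_prompt, image, ...).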
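The reordering matters because the old harness call generate(model, tokenizer, image, formatted_prompt, ...) puts the URL list where the prompt now goes; the log above prints Prompt: ['http://images.cocodataset.org/val2017/000000039769.jpg'], which looks consistent with the tokenizer rejecting it with the TextEncodeInput error. A minimal sketch of a pre-flight check that fails early with a readable message; normalize_generate_inputs is a hypothetical helper, not part of mlx-vlm, and it assumes images are passed as path/URL strings.

```python
from typing import List, Optional, Sequence, Union


def normalize_generate_inputs(
    prompt: str, image: Optional[Union[str, Sequence[str]]] = None
) -> List[str]:
    """Hypothetical check for the call order generate(model, tokenizer, prompt, image, ...).

    Returns the image argument as a list of paths/URLs, or raises TypeError
    when the arguments look like the old order (images first, prompt second).
    """
    if not isinstance(prompt, str):
        raise TypeError(
            f"prompt must be a str, got {type(prompt).__name__}: "
            "pass the formatted prompt before the image list"
        )
    if image is None:
        return []  # treated as a text-only request in this sketch
    if isinstance(image, str):
        return [image]
    images = list(image)
    if not all(isinstance(item, str) for item in images):
        raise TypeError("image must be a path/URL string or a sequence of them")
    return images
```

Calling images = normalize_generate_inputs(formatted_prompt, image) just before generate turns a swapped call into an explicit TypeError instead of a tokenizer error deep inside generation.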

Script ✅

# from mlx import version as mlx_version
from mlx_vlm import load, generate, __version__ as vlm_version
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
import mlx.core as mx
import subprocess
import time
import psutil

# print("mlx version:", mlx_version())
print("mlx-vlm version:", vlm_version)

for model_path in [
    "mlx-community/nanoLLaVA-1.5-4bit",
    "mlx-community/Phi-3.5-vision-instruct-4bit",
    "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "HuggingFaceTB/SmolVLM-Instruct",
    "mlx-community/Llama-3.2-11B-Vision-Instruct-4bit",
    "mlx-community/idefics2-8b-4bit"
]:
    print(80 * "v")
    print("\033[1mRunning", model_path, "\033[0m")

    process = psutil.Process()
    mem_before = process.memory_info().rss

    try:
        # Load the model
        trust_remote_code = True
        model, tokenizer = load(model_path, trust_remote_code=trust_remote_code)
        config = load_config(model_path, trust_remote_code=trust_remote_code)
    except Exception as e:
        print(f"Failed to load model at {model_path}: {e}")
        continue

    # Prepare input
    image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
    prompt = "Describe this image."

    # Apply chat template
    formatted_prompt = apply_chat_template(
        tokenizer, config, prompt, num_images=len(image)
    )

    # Generate output
    try:
        start_time = time.time()
        output = generate(model, tokenizer, formatted_prompt, image, verbose=True, max_tokens=500)
        end_time = time.time()
        print(output)
    except Exception as e:
        print(f"Failed to generate output for model at {model_path}: {e}")
        continue

    mem_after = process.memory_info().rss
    print(f"Output generated in {end_time - start_time:.2f}s")
    print(f"Memory used: {(mem_after - mem_before) / (1024 * 1024 * 1024):.2f} GB")

    print(80 * "^", end="\n\n")
    del model, tokenizer
    mx.metal.clear_cache()

Output

mlx-vlm version: 0.1.7
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/nanoLLaVA-1.5-4bit 
Fetching 11 files: 100%|██████████| 11/11 [00:50<00:00,  4.59s/it]
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 219701.64it/s]
==========
Image: ['http://images.cocodataset.org/val2017/000000039769.jpg'] 

Prompt: <|im_start|>system
Answer the questions.<|im_end|><|im_start|>user
<image>
Describe this image.<|im_end|><|im_start|>assistant

The image shows a close-up view of two cats lying down on a pink fabric surface. Both cats have a striped pattern on their fur, with the one on the left having a darker shade of brown, and the one on the right having a lighter shade of brown. They are positioned in such a way that the left cat is facing the camera, while the right cat is looking away. The cats are lying on their stomachs, and the fabric surface is slightly wrinkled. The image has a sepia tone, which gives it a vintage or antique look. There are no texts or other objects in the image. The style of the image is a straightforward, candid photograph, capturing a moment of relaxation for the cats.
==========
Prompt: 21 tokens, 75.857 tokens-per-sec
Generation: 146 tokens, 143.235 tokens-per-sec
Peak memory: 1.408 GB
The image shows a close-up view of two cats lying down on a pink fabric surface. Both cats have a striped pattern on their fur, with the one on the left having a darker shade of brown, and the one on the right having a lighter shade of brown. They are positioned in such a way that the left cat is facing the camera, while the right cat is looking away. The cats are lying on their stomachs, and the fabric surface is slightly wrinkled. The image has a sepia tone, which gives it a vintage or antique look. There are no texts or other objects in the image. The style of the image is a straightforward, candid photograph, capturing a moment of relaxation for the cats.
Output generated in 2.06s
Memory used: 0.78 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Phi-3.5-vision-instruct-4bit 
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 80530.64it/s]
/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.11/site-packages/transformers/models/auto/image_processing_auto.py:524: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
  warnings.warn(
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 235194.62it/s]
==========
Image: ['http://images.cocodataset.org/val2017/000000039769.jpg'] 

Prompt: <|user|>
<|image_1|>Describe this image.<|end|>
<|assistant|>

The image shows two cats lying on a pink couch. The cat on the left is a tabby with a mix of dark and light stripes, while the cat on the right is a solid black cat. Both cats have their eyes closed, suggesting they are asleep. The couch has a pink cushion, and there are two remote controls on the couch.<|end|>
==========
Prompt: 771 tokens, 614.139 tokens-per-sec
Generation: 83 tokens, 33.288 tokens-per-sec
Peak memory: 3.704 GB
The image shows two cats lying on a pink couch. The cat on the left is a tabby with a mix of dark and light stripes, while the cat on the right is a solid black cat. Both cats have their eyes closed, suggesting they are asleep. The couch has a pink cushion, and there are two remote controls on the couch.<|end|>
Output generated in 4.51s
Memory used: 2.17 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Qwen2-VL-2B-Instruct-4bit 
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 186037.68it/s]
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 254902.45it/s]
==========
Image: ['http://images.cocodataset.org/val2017/000000039769.jpg'] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Describe this image.<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant

The image shows two cats lying on a pink blanket. The cat on the left is striped with a mix of black, brown, and white, and it is lying on its side with its head resting on the blanket. The cat on the right is also striped, with a mix of black, brown, and white, and it is lying on its back with its head resting on the blanket as well. Both cats appear to be resting or sleeping, and there are two remote controls placed on the blanket next to them.
==========
Prompt: 416 tokens, 732.923 tokens-per-sec
Generation: 105 tokens, 169.277 tokens-per-sec
Peak memory: 3.704 GB
The image shows two cats lying on a pink blanket. The cat on the left is striped with a mix of black, brown, and white, and it is lying on its side with its head resting on the blanket. The cat on the right is also striped, with a mix of black, brown, and white, and it is lying on its back with its head resting on the blanket as well. Both cats appear to be resting or sleeping, and there are two remote controls placed on the blanket next to them.
Output generated in 1.94s
Memory used: 0.58 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running HuggingFaceTB/SmolVLM-Instruct 
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 151601.35it/s]
Some kwargs in processor config are unused and will not have any effect: image_seq_len. 
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 181049.09it/s]
==========
Image: ['http://images.cocodataset.org/val2017/000000039769.jpg'] 

Prompt: <|im_start|>User:<image>Describe this image.<end_of_utterance>
Assistant:
 Two cats are sleeping on a pink blanket.
==========
Prompt: 1195 tokens, 702.406 tokens-per-sec
Generation: 10 tokens, 78.259 tokens-per-sec
Peak memory: 6.007 GB
 Two cats are sleeping on a pink blanket.
Output generated in 2.68s
Memory used: 4.22 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Llama-3.2-11B-Vision-Instruct-4bit 
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 145187.45it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 235929.60it/s]
==========
Image: ['http://images.cocodataset.org/val2017/000000039769.jpg'] 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Describe this image.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>


The image shows two cats lying on a pink blanket, with two remote controls placed nearby. The cats are positioned in a way that suggests they are watching something on a television, and the remote controls are likely used to control the TV.

* Two cats:
	+ One cat is smaller and has a fluffy tail
	+ The other cat is larger and has a more mottled coat
	+ Both cats are lying on their sides, with their heads turned towards the TV
* Two remote controls:
	+ One remote control is placed near the smaller cat
	+ The other remote control is placed near the larger cat
	+ Both remote controls have a similar design and are likely used to control the TV
* A pink blanket:
	+ The blanket is a bright pink color
	+ It appears to be made of a soft, plush material
	+ The blanket is spread out on a surface, possibly a couch or a bed

Overall, the image suggests that the cats are enjoying a relaxing afternoon, watching something on TV and using the remote controls to control the program.
==========
Prompt: 15 tokens, 2.941 tokens-per-sec
Generation: 221 tokens, 6.223 tokens-per-sec
Peak memory: 16.252 GB
The image shows two cats lying on a pink blanket, with two remote controls placed nearby. The cats are positioned in a way that suggests they are watching something on a television, and the remote controls are likely used to control the TV.

* Two cats:
	+ One cat is smaller and has a fluffy tail
	+ The other cat is larger and has a more mottled coat
	+ Both cats are lying on their sides, with their heads turned towards the TV
* Two remote controls:
	+ One remote control is placed near the smaller cat
	+ The other remote control is placed near the larger cat
	+ Both remote controls have a similar design and are likely used to control the TV
* A pink blanket:
	+ The blanket is a bright pink color
	+ It appears to be made of a soft, plush material
	+ The blanket is spread out on a surface, possibly a couch or a bed

Overall, the image suggests that the cats are enjoying a relaxing afternoon, watching something on TV and using the remote controls to control the program.
Output generated in 41.39s
Memory used: 3.97 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/idefics2-8b-4bit 
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 229539.02it/s]
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 169622.59it/s]
==========
Image: ['http://images.cocodataset.org/val2017/000000039769.jpg'] 

Prompt: User: Describe this image.<image><end_of_utterance>
Assistant:
Two house cats are laying on a bed with a pink comforter. They are using remote controllers as toys.<end_of_utterance>
==========
Prompt: 79 tokens, 140.361 tokens-per-sec
Generation: 26 tokens, 44.843 tokens-per-sec
Peak memory: 16.252 GB
Two house cats are laying on a bed with a pink comforter. They are using remote controllers as toys.<end_of_utterance>
Output generated in 1.91s
Memory used: 0.75 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note⚠️:

I found a tiny bug with SmolVLM-Instruct, which has been fixed in #164 and will be available in the next release today, once the tests clear.
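Two small stdlib tweaks to the harness, sketched under the assumption that the installed distribution names are mlx-vlm and mlx (their pip package names). First, the original harness imported version from mlx_vlm, which is the mlx_vlm.version submodule, hence the <module 'mlx_vlm.version' ...> line in its log; the fixed script uses __version__ instead, and importlib.metadata can read the installed version from package metadata without importing the package, which also covers the commented-out mlx version lines. Second, both scripts time generation with time.time(), a wall-clock reading that can jump if the system clock is adjusted; time.perf_counter() is the clock intended for measuring intervals. installed_version and timed are hypothetical helper names.

```python
import time
from contextlib import contextmanager
from importlib import metadata
from typing import Dict, Iterator


def installed_version(dist_name: str, default: str = "unknown") -> str:
    """Version of an installed distribution, read from its metadata (no import)."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return default


@contextmanager
def timed(results: Dict[str, float], key: str) -> Iterator[None]:
    """Store the with-block's elapsed seconds in results[key], even if it raises."""
    start = time.perf_counter()  # monotonic, high-resolution interval clock
    try:
        yield
    finally:
        results[key] = time.perf_counter() - start


if __name__ == "__main__":
    print("mlx-vlm version:", installed_version("mlx-vlm"))
    print("mlx version:", installed_version("mlx"))
    timings: Dict[str, float] = {}
    with timed(timings, "demo"):
        time.sleep(0.05)  # stand-in for the generate(...) call
    print(f"Output generated in {timings['demo']:.2f}s")
```

In the harness, the generate call would sit inside with timed(timings, model_path): and the report would read timings[model_path] instead of end_time - start_time.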

Blaizzy · 2024-12-30T12:42:36Z

@jrp2014 and @sachinraja13 v0.1.8 is out with the fix for SmolVLM-Instruct 🚀

sachinraja13 · 2024-12-30T14:18:57Z

This is great, thank you so much @Blaizzy !

Blaizzy · 2024-12-30T20:21:43Z

My pleasure!

Happy new year in advance 🚀

Blaizzy added 19 commits December 26, 2024 14:58

remove unused

85214d8

add default layer_norm

636cbc9

remove unused

24859d5

remove llava_bunny and idefics2 custom configs

e1978f2

refactor molmo and qwen2 config

a2a262b

add deprecation warning

e8a01c1

refactor update model configs

c5fb40c

refactor sanitize weights

09d2947

refactor class_predicate

d8db8d1

move custom config logic to from_dict

cb0ddc6

uncomment

22b1fdf

fix config name

a33ee68

rename aligner to projector

ab00e2b

fix tests

230750a

remove module from update list

7c6fe54

add trusted remote as kwargs

923a633

update baseImageProcessor

7a78639

refactor image processor

1cb7b53

pin latest transformers

f373f37

Blaizzy mentioned this pull request Dec 27, 2024

Molmo-7B-D 4bit suddenly stopped working #160

Closed

Blaizzy added 8 commits December 27, 2024 18:47

bump version

9357f90

refactor prepare inputs

941667c

simplifiy image loading

c7bbf2c

fix load_image and refactor load_config

dd2962a

make skip_non_divisible a default

11b9422

skip non divisible default and rename model inputs

ef7d20b

refactor condition

16c579f

fix language input only

3d65478

Blaizzy mentioned this pull request Dec 29, 2024

Support for Null Image input #94

Closed

add fetch KV

d85d22d

Blaizzy added 3 commits December 30, 2024 01:37

Increase default max tokens to 256

0c5db83

refactor generate, generate step and stream

3910796

fix high usage and add language only support (#163)

8740c0a

Blaizzy merged commit 78920b0 into main Dec 30, 2024
1 check passed

Blaizzy mentioned this pull request Dec 30, 2024

fix idefics3 vision model type checking #164

Merged
