
Models to port to MLX-VLM #39

Open
10 of 26 tasks
Blaizzy opened this issue Jun 11, 2024 · 20 comments
Labels
good first issue Good for newcomers

Comments

@Blaizzy
Owner

Blaizzy commented Jun 11, 2024

  • MiniCPM-Llama3-V-2_5
  • Florence 2
  • Phi-3-vision
  • Bunny
  • Dolphin-vision-72b
  • Llava Next
  • Qwen2-VL
  • Pixtral
  • Llama-3.2
  • Llava Interleave
  • Idefics 3
  • OmniParser
  • Llava onevision
  • internlm-xcomposer2d5-7b
  • InternVL
  • CogVLM2
  • ColPali
  • MoonDream2
  • Yi-VL
  • CuMo
  • Kosmos-2.5
  • Molmo
  • Ovis Gemma
  • Aria
  • NVIDIA NVLM
  • GOT

Instructions:

  1. Select a model and comment below with your selection.
  2. Create a draft PR titled "Add support for X".
  3. Read the Contribution guide.
  4. Check the existing models.
  5. Tag @Blaizzy for code reviews and questions.

If the model you want is not listed, please suggest it and I will add it.

@Blaizzy
Owner Author

Blaizzy commented Jun 22, 2024

Next release of Llava-Next

TODO:
Update the text config defaults to avoid errors with Llava-v1.6-vicuna:

from dataclasses import dataclass
from typing import Dict, Optional, Union

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    rms_norm_eps: float = 1e-05
    vocab_size: int = 32064
    num_key_value_heads: int = 32
    rope_theta: float = 1000000
    rope_traditional: bool = False
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None
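To sketch why these defaults help: when a checkpoint's config.json omits some of these keys (or carries extra ones), a from_dict-style constructor can filter unknown keys and fall back on the dataclass defaults. The `from_dict` helper and the trimmed field list below are illustrative, not the repo's actual API:

```python
from dataclasses import dataclass, fields

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    num_key_value_heads: int = 32

    @classmethod
    def from_dict(cls, params: dict) -> "TextConfig":
        # Keep only the keys the dataclass knows about;
        # the defaults above cover anything the config omits.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in params.items() if k in known})

# A config that omits num_key_value_heads and carries an extra key
cfg = TextConfig.from_dict(
    {"model_type": "llama", "hidden_size": 5120, "_name_or_path": "ignored"}
)
```

With this pattern, a vicuna-style config that leaves out a field no longer raises a TypeError at construction time.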

@BoltzmannEntropy

Thanks for the great repo. This should also be on the list: https://github.com/THUDM/CogVLM2
I'm reading the code now and trying to free up some time for the conversion routine.

@jrp2014

jrp2014 commented Aug 8, 2024

@Blaizzy
Owner Author

Blaizzy commented Aug 8, 2024

Hey @BoltzmannEntropy and @jrp2014,

Thanks for the suggestions!

I've added them to the backlog.

@jrp2014

jrp2014 commented Aug 27, 2024

MiniCPM-V v2.6


@s-smits

s-smits commented Sep 7, 2024

Do you have a link to Florence-2?

@ChristianWeyer

Is the list above the definitive, up-to-date list of supported models, @Blaizzy? Thanks for your hard work!

@Blaizzy
Owner Author

Blaizzy commented Sep 10, 2024

Hey @ChristianWeyer
It's mostly up-to-date, just missing Qwen2-VL.

@Blaizzy
Owner Author

Blaizzy commented Sep 10, 2024

@s-smits here you go:

https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py

@ChristianWeyer

[x] Phi-3-vision

Thanks!
I guess Phi-3-vision includes 3.5?

@Blaizzy
Owner Author

Blaizzy commented Sep 10, 2024

Yes, they share the same architecture, so no changes are needed :)

@pulkitjindal88

Hey @Blaizzy, thanks for this great framework. Is there any priority for InternVL? I can see it is on your list; I just wanted to know whether it is planned for the near term. I want to run the model on my MacBook, and mlx-vlm looks like the best way to do that.

@chigkim

chigkim commented Sep 21, 2024

Qwen2-VL-72B would be amazing!

@simonw

simonw commented Sep 29, 2024

This recipe seems to work for Qwen2-VL-2B-Instruct:

python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 100 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"

My results here: https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17

@chigkim

chigkim commented Sep 30, 2024

Yep, they just merged Qwen2-VL support this weekend.

@xSNYPSx

xSNYPSx commented Oct 2, 2024

Molmo please

@chigkim

chigkim commented Oct 2, 2024

NVIDIA just dropped the multimodal NVLM-D-72B. The benchmarks look pretty good.

https://huggingface.co/nvidia/NVLM-D-72B

@Blaizzy
Owner Author

Blaizzy commented Oct 2, 2024

Yap, that's a pretty awesome model!
It's on my radar because we can run it in 4-bit quantization.
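For a rough sense of why 4-bit matters for a 72B-parameter model, here is some back-of-envelope arithmetic (weights only; activations and KV cache are ignored, and 72e9 parameters is assumed):

```python
# Approximate weights-only memory for a 72B-parameter model
# at different precisions. Illustrative arithmetic only.
PARAMS = 72e9

def weight_gib(bytes_per_param: float, params: float = PARAMS) -> float:
    """Weights-only footprint in GiB; ignores activations and KV cache."""
    return params * bytes_per_param / 1024**3

for name, bpp in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: ~{weight_gib(bpp):.0f} GiB")
```

At fp16 the weights alone are around 134 GiB, while a 4-bit quant brings that down to roughly 34 GiB, which is what puts a model this size within reach of high-memory Apple Silicon machines.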

@chigkim

chigkim commented Oct 25, 2024

Pixtral-12B now has a base model.
https://huggingface.co/mistralai/Pixtral-12B-Base-2409
