MV-Adapter: Multi-view Consistent Image Generation Made Easy🚀

🏠 Project Page | Paper | Demo

MV-Adapter is a versatile plug-and-play adapter that adapts T2I models and their derivatives into multi-view generators.

Highlight Features: MV-Adapter generates multi-view images

at 768 resolution using SDXL
using personalized models (e.g. DreamShaper), distilled models (e.g. LCM), or extensions (e.g. ControlNet)
from text or image conditions
guided by geometry for texture generation

Updates

[2024-12] Released model weights, Gradio demo, inference scripts, and ComfyUI support for the text-to-multiview and image-to-multiview generation models.

Model Zoo & Demos

No need to download manually. Running the scripts will download model weights automatically.

Model	Base Model	HF Weights	Demo Link
Text-to-Multiview	SDXL	mvadapter_t2mv_sdxl.safetensors	General / Anime
Image-to-Multiview	SDXL	mvadapter_i2mv_sdxl.safetensors	Demo
Text-Geometry-to-Multiview	SDXL
Image-Geometry-to-Multiview	SDXL
Image-to-Arbitrary-Views	SDXL

Installation

Clone the repo first:

git clone https://github.com/huanngzh/MV-Adapter.git
cd MV-Adapter

(Optional) Create a fresh conda env:

conda create -n mvadapter python=3.10
conda activate mvadapter

Install the necessary packages (torch > 2):

# pytorch (select correct CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# other dependencies
pip install -r requirements.txt
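
After installing, it can help to confirm that the environment actually picked up a recent PyTorch. The following is a minimal sketch, not part of the repository; it assumes that "torch > 2" means any 2.x release or newer, and the helper names `major_version` and `torch_is_recent_enough` are hypothetical.

```python
import re
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading major number of a version string such as '2.1.0+cu118'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1))


def torch_is_recent_enough(version: str, minimum_major: int = 2) -> bool:
    # Assumption: "torch > 2" is read as "major version 2 or newer".
    return major_version(version) >= minimum_major


if __name__ == "__main__":
    try:
        # Reads the installed package metadata without importing torch itself.
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        status = "OK" if torch_is_recent_enough(installed) else "too old"
        print(f"torch {installed}: {status}")
```

Run it inside the activated environment; if it reports an old or missing torch, repeat the pip step above with the CUDA index that matches your system.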

Launch Demo

Text to Multiview Generation

With SDXL:

python -m scripts.gradio_demo_t2mv --base_model "stabilityai/stable-diffusion-xl-base-1.0"

Reminder: When switching the demo to another base model, delete the gradio_cached_examples directory first; otherwise the cached example results from the previous model will show up in the next demo.
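
The cleanup step in the reminder can be scripted. This is a small sketch under one assumption: that the gradio_cached_examples directory sits in the directory the demo was launched from (the repository root if you followed the commands here). The function name `clear_gradio_cache` is hypothetical.

```python
import shutil
from pathlib import Path


def clear_gradio_cache(root: str = ".", name: str = "gradio_cached_examples") -> bool:
    """Delete the cached-examples directory under `root`; return True if it existed."""
    cache_dir = Path(root) / name
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    removed = clear_gradio_cache()
    print("cache removed" if removed else "no cache directory found")
```

Running it before each launch with a different --base_model keeps stale examples out of the new demo.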

With anime-themed Animagine XL 3.1:

python -m scripts.gradio_demo_t2mv --base_model "cagliostrolab/animagine-xl-3.1"

With general Dreamshaper:

python -m scripts.gradio_demo_t2mv --base_model "Lykon/dreamshaper-xl-1-0" --scheduler ddpm

You can also specify another diffusers-format text-to-image diffusion model with --base_model. The value should be either a model name on Hugging Face, such as stabilityai/stable-diffusion-xl-base-1.0, or a local path to a text-to-image pipeline directory. If you specify latent-consistency/lcm-sdxl to use latent consistency models, also add --scheduler lcm to the command.
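
Because the scheduler flag depends on the model, a small wrapper can keep the pairing in one place. The sketch below only assembles the command line; the mapping reflects the model and scheduler pairs that appear in this README, and the names `SCHEDULER_BY_MODEL` and `demo_command` are hypothetical.

```python
import sys

# Scheduler choices copied from the commands in this README. Models that are
# not listed get no --scheduler flag, so the script's own default applies.
SCHEDULER_BY_MODEL = {
    "latent-consistency/lcm-sdxl": "lcm",
    "Lykon/dreamshaper-xl-1-0": "ddpm",
    "stablediffusionapi/real-dream-sdxl": "ddpm",
}


def demo_command(base_model: str) -> list[str]:
    """Build the argument list for launching the text-to-multiview demo."""
    cmd = [sys.executable, "-m", "scripts.gradio_demo_t2mv", "--base_model", base_model]
    scheduler = SCHEDULER_BY_MODEL.get(base_model)
    if scheduler is not None:
        cmd += ["--scheduler", scheduler]
    return cmd
```

Pass the returned list to `subprocess.run(..., check=True)` from the repository root to start the demo.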

Image to Multiview Generation

With SDXL:

python -m scripts.gradio_demo_i2mv

Inference Scripts

We recommend that experienced users check the files in the scripts directory and adjust the parameters to get the best "card drawing" results, i.e. the best picks from repeated sampling runs.

Text to Multiview Generation

You can specify a diffusers-format text-to-image diffusion model as the base model with --base_model xxx. The value should be either a model name on Hugging Face, such as stabilityai/stable-diffusion-xl-base-1.0, or a local path to a text-to-image pipeline directory.

With SDXL:

python -m scripts.inference_t2mv_sdxl --text "an astronaut riding a horse" \
--seed 42 \
--output output.png
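
Since the result depends on the seed, one way to try several draws is to run the same prompt with different seeds and distinct output files. The following sketch builds those commands using only the --text, --seed and --output flags shown above; the function name `seed_sweep_commands` and the output naming scheme are hypothetical.

```python
import sys


def seed_sweep_commands(text: str, seeds: list[int], prefix: str = "output") -> list[list[str]]:
    """Build one inference command per seed, each writing to its own file."""
    commands = []
    for seed in seeds:
        commands.append([
            sys.executable, "-m", "scripts.inference_t2mv_sdxl",
            "--text", text,
            "--seed", str(seed),
            # Hypothetical naming scheme so that runs do not overwrite each other.
            "--output", f"{prefix}_seed{seed}.png",
        ])
    return commands
```

Each command can be executed with `subprocess.run(cmd, check=True)` from the repository root. Every invocation is a separate process and reloads the model, so this favors simplicity over speed.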

With personalized models:

anime-themed Animagine XL 3.1

python -m scripts.inference_t2mv_sdxl --base_model "cagliostrolab/animagine-xl-3.1" \
--text "1girl, izayoi sakuya, touhou, solo, maid headdress, maid, apron, short sleeves, dress, closed mouth, white apron, serious face, upper body, masterpiece, best quality, very aesthetic, absurdres" \
--seed 0 \
--output output.png

general Dreamshaper

python -m scripts.inference_t2mv_sdxl --base_model "Lykon/dreamshaper-xl-1-0" \
--scheduler ddpm \
--text "the warrior Aragorn from Lord of the Rings, film grain, 8k hd" \
--seed 0 \
--output output.png

realistic real-dream-sdxl

python -m scripts.inference_t2mv_sdxl --base_model "stablediffusionapi/real-dream-sdxl" \
--scheduler ddpm \
--text "macro shot, parrot, colorful, dark shot, film grain, extremely detailed" \
--seed 42 \
--output output.png

With LCM:

python -m scripts.inference_t2mv_sdxl --unet_model "latent-consistency/lcm-sdxl" \
--scheduler lcm \
--text "Samurai koala bear" \
--num_inference_steps 8 \
--seed 42 \
--output output.png

With LoRA:

stylized lora 3d_render_style_xl

python -m scripts.inference_t2mv_sdxl --lora_model "goofyai/3d_render_style_xl/3d_render_style_xl.safetensors" \
--text "3d style, a fox with flowers around it" \
--seed 20 \
--lora_scale 1.0 \
--output output.png
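
The --lora_model value above looks like a Hugging Face repository id followed by a weight file name. The sketch below splits such a string at the last slash; this reading is an assumption based on the example value, not a confirmed description of how the script parses it, and `split_lora_spec` is a hypothetical helper.

```python
def split_lora_spec(spec: str) -> tuple[str, str]:
    """Split '<repo id>/<weight file>' at the last slash.

    Assumption: everything before the last '/' is the repository id and the
    remainder is the weight file name.
    """
    repo_id, sep, weight_name = spec.rpartition("/")
    if not sep or not repo_id or not weight_name:
        raise ValueError(f"expected '<repo id>/<weight file>', got {spec!r}")
    return repo_id, weight_name
```

This is mainly useful for sanity-checking a value before passing it to --lora_model.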

With ControlNet:

Scribble to Multiview with controlnet-scribble-sdxl-1.0

python -m scripts.inference_scribble2mv_sdxl --text "A 3D model of Finn the Human from the animated television series Adventure Time. He is wearing his iconic blue shirt and green backpack and has a neutral expression on his face. He is standing in a relaxed pose with his left foot slightly forward and his right foot back. His arms are at his sides and his head is turned slightly to the right. The model is made up of simple shapes and has a stylized, cartoon-like appearance. It is textured to resemble the character's appearance in the show." \
--seed 0 \
--output output.png \
--guidance_scale 5.0 \
--controlnet_images "assets/demo/scribble2mv/color_0000.webp" "assets/demo/scribble2mv/color_0001.webp" "assets/demo/scribble2mv/color_0002.webp" "assets/demo/scribble2mv/color_0003.webp" "assets/demo/scribble2mv/color_0004.webp" "assets/demo/scribble2mv/color_0005.webp" \
--controlnet_conditioning_scale 0.7
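
The six --controlnet_images paths follow a numbered pattern, so they can be generated rather than typed out. This is a sketch that reproduces the paths used in the command above; the helper names `scribble_image_paths` and `controlnet_args` are hypothetical, and the view count of six is taken from that command.

```python
def scribble_image_paths(directory: str = "assets/demo/scribble2mv", num_views: int = 6) -> list[str]:
    """Return color_0000.webp ... color_000N.webp under `directory`."""
    return [f"{directory}/color_{index:04d}.webp" for index in range(num_views)]


def controlnet_args(paths: list[str], conditioning_scale: float = 0.7) -> list[str]:
    """Build the ControlNet-related part of the command line."""
    return [
        "--controlnet_images", *paths,
        "--controlnet_conditioning_scale", str(conditioning_scale),
    ]
```

Append the result of `controlnet_args(scribble_image_paths())` to the rest of the scripts.inference_scribble2mv_sdxl arguments when launching the script from Python.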

Image to Multiview Generation

With SDXL:

python -m scripts.inference_i2mv_sdxl \
--image assets/demo/i2mv/A_decorative_figurine_of_a_young_anime-style_girl.png \
--text "A decorative figurine of a young anime-style girl" \
--seed 21 --output output.png --remove_bg

With LCM:

python -m scripts.inference_i2mv_sdxl \
--unet_model "latent-consistency/lcm-sdxl" \
--scheduler lcm \
--image assets/demo/i2mv/A_juvenile_emperor_penguin_chick.png \
--text "A juvenile emperor penguin chick" \
--num_inference_steps 8 \
--seed 0 --output output.png --remove_bg
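
In both image-to-multiview examples the --text value matches the image file name with underscores replaced by spaces. The sketch below uses that pattern to build a command for a demo asset; it assumes the naming convention holds only for the bundled demo images, and the names `prompt_from_asset` and `i2mv_command` are hypothetical.

```python
import sys
from pathlib import Path


def prompt_from_asset(image_path: str) -> str:
    """Derive a prompt from a demo asset name, e.g. 'A_b_c.png' -> 'A b c'."""
    return Path(image_path).stem.replace("_", " ")


def i2mv_command(image_path: str, seed: int, output: str = "output.png") -> list[str]:
    """Build the SDXL image-to-multiview command with background removal."""
    return [
        sys.executable, "-m", "scripts.inference_i2mv_sdxl",
        "--image", image_path,
        "--text", prompt_from_asset(image_path),
        "--seed", str(seed),
        "--output", output,
        "--remove_bg",
    ]
```

For your own images, write the --text description by hand rather than relying on the file name.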

ComfyUI

Please see the ComfyUI-MVAdapter repo for details.

Text to Multiview Generation

Image to Multiview Generation

Citation

@article{huang2024mvadapter,
  title={MV-Adapter: Multi-view Consistent Image Generation Made Easy},
  author={Huang, Zehuan and Guo, Yuanchen and Wang, Haoran and Yi, Ran and Ma, Lizhuang and Cao, Yan-Pei and Sheng, Lu},
  journal={arXiv preprint arXiv:2412.03632},
  year={2024}
}
