We propose Dream-in-4D, which features a novel two-stage approach for text-to-4D synthesis, leveraging (1) 3D and 2D diffusion guidance to effectively learn a high-quality static 3D asset in the first stage; (2) a deformable neural radiance field that explicitly disentangles the learned static asset from its deformation, preserving quality during motion learning; and (3) a multi-resolution feature grid for the deformation field with a displacement total variation loss to effectively learn motion with video diffusion guidance in the second stage. Thanks to its motion-disentangled representation, Dream-in-4D can also be easily adapted for controllable generation where appearance is defined by one or multiple images, without the need to modify the motion learning stage. Thus, our method offers a unified approach for text-to-4D, image-to-4D and personalized 4D generation tasks.
This repository is the official PyTorch implementation of Dream-in-4D introduced in the paper:
A Unified Approach for Text- and Image-guided 4D Scene Generation, Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Otmar Hilliges, Shalini De Mello, CVPR 2024.
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.
- This code is forked from threestudio, commit 2c20227
Installation is the same as for the original threestudio. Please check the original repo for detailed installation instructions.
python3 -m virtualenv venv
. venv/bin/activate
python3 -m pip install --upgrade pip
# Install pytorch
pip install torch torchvision
# (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions
pip install ninja
# Install dependencies
pip install -r requirements.txt
# Login to huggingface to use deepfloyd (for the image-to-4D task)
huggingface-cli login
The MVDream multi-view diffusion model is provided in a separate codebase. Install it with:
git clone https://github.com/bytedance/MVDream extern/MVDream
pip install -e extern/MVDream
# Only run this if you already have a threestudio environment and didn't re-install requirements.txt from our repo
pip install av
We modified the launch script to take multiple config files. The first config file holds the shared training configuration; the second (and, where applicable, the third) holds the prompt and the subject. Usually, you only need to modify the second (and third) config files.
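The layering described above can be pictured as a recursive dictionary merge. The sketch below is an illustration only: it assumes that later config files override earlier ones and that dotted command-line overrides such as `system.SD_view=180` are applied last. `deep_merge` and `apply_override` are hypothetical helpers, not functions from this repository, and every key other than `system.SD_view` is made up for the example.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` take precedence over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the later value simply replaces the earlier one.
            merged[key] = deepcopy(value)
    return merged


def apply_override(config: dict, dotted: str) -> dict:
    """Apply one `a.b.c=value` override (the value is kept as a string for simplicity)."""
    path, _, value = dotted.partition("=")
    result = deepcopy(config)
    node = result
    *parents, leaf = path.split(".")
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return result


# Made-up stand-ins for a shared training config, a prompt config and a subject config.
shared = {"system": {"SD_view": 145, "prompt_processor": {"prompt": "placeholder"}}}
prompt = {"system": {"prompt_processor": {"prompt": "a dog superhero"}}}
subject = {"system": {"guidance": {"lora_weights": "path/to/lora"}}}

config = {}
for layer in (shared, prompt, subject):
    config = deep_merge(config, layer)
config = apply_override(config, "system.SD_view=180")

print(config["system"]["prompt_processor"]["prompt"])
print(config["system"]["SD_view"])
```

Under these assumptions, the prompt file replaces only the prompt entry of the shared config and leaves the remaining training settings untouched, which is why editing the second (and third) file is usually enough.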
# Text-to-3D
# (Optional) system.SD_view can be used to disable SD on the back views.
# SD_view=180 --> all views ([-180, 180]) are used.
# SD_view=145 --> frontal and side views ([-145, 145]) are used.
python launch.py --config configs/stage_1/mvdream-sd21-sd.yaml configs/stage_1/text-to-3D/dog_superhero.yaml --train --gpu 0 system.SD_view=180
# Image-to-3D
# This is our own implementation using zero123 and deep-floyd-if guidance, which converges faster than threestudio's implementation.
python launch.py --config configs/stage_1/magic123-coarse-if-new.yaml configs/stage_1/image-to-3D/corgi.yaml --train --gpu 0
# Personalized-3D
# In configs/stage_1/personalized_3D/subjects/dog[#].yaml, we provide the lora attention processor weights for the personalized StableDiffusion models trained with Dreambooth.
# The dreambooth loras are trained with: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py
# See instructions here: https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora
python launch.py --config configs/stage_1/mvdream-sd21-sd.yaml configs/stage_1/personalized_3D/prompts/superhero_sks_dog_wearing_red_cape_is_flying_through_the_sky.yaml configs/stage_1/personalized_3D/subjects/dog8.yaml --train --gpu 0 system.SD_view=180
# 3D-to-4D
# In configs/stage_2/prompts/[config_name].yaml, modify the text prompt and set system.geometry_convert_from to the ckpt from the static stage
# (Optional) Setting system.guidance.num_hifa_steps=4 can lead to more stable motion, at the cost of longer training time. The default is system.guidance.num_hifa_steps=1.
python launch.py --config configs/stage_2/stage2_zeroscope_144x80.yaml configs/stage_2/prompts/fish.yaml --train --gpu 0
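Going by the `system.SD_view` comments in the stage-1 commands, the option can be read as an azimuth threshold: Stable Diffusion guidance is applied only to camera views whose azimuth lies within `[-SD_view, SD_view]` degrees. The sketch below illustrates that reading, assuming azimuth is measured in degrees with 0 at the frontal view. `wrap_degrees` and `uses_sd_guidance` are hypothetical names and do not reflect the repository's actual implementation.

```python
def wrap_degrees(angle: float) -> float:
    """Map any angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def uses_sd_guidance(azimuth_deg: float, sd_view: float = 180.0) -> bool:
    """Return True if the view lies within [-sd_view, sd_view] degrees of the front."""
    return abs(wrap_degrees(azimuth_deg)) <= sd_view


# With SD_view=145, frontal and side views qualify while back views do not.
for azimuth in (0, 90, 170):
    print(azimuth, uses_sd_guidance(azimuth, sd_view=145))
```

With `sd_view=180`, every azimuth passes the check, which matches the "all views" comment.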
If you want to resume from a checkpoint, do:
# Resume training from the last checkpoint; you may replace last.ckpt with any other checkpoint
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
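The resume command follows a fixed pattern relative to the trial directory, so it can be assembled programmatically when many runs need restarting. The helper below is a hypothetical convenience, not part of this repository; it assumes only the `configs/parsed.yaml` and `ckpts/` layout shown in the command above.

```python
import shlex
from pathlib import PurePosixPath


def build_resume_command(trial_dir: str, ckpt_name: str = "last.ckpt", gpu: int = 0) -> str:
    """Build the launch.py resume command for a given trial directory."""
    trial = PurePosixPath(trial_dir)
    args = [
        "python", "launch.py",
        "--config", str(trial / "configs" / "parsed.yaml"),
        "--train",
        "--gpu", str(gpu),
        f"resume={trial / 'ckpts' / ckpt_name}",
    ]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


print(build_resume_command("path/to/trial/dir"))
```

Passing a different `ckpt_name` selects another checkpoint from the same `ckpts/` folder.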
Also check the threestudio repo for a complete guide on various features.
This code is built on the threestudio-project and the MVDream-project. Thanks to the maintainers for their contribution to the community!
If you find Dream-in-4D helpful, please consider citing:
@InProceedings{zheng2024unified,
title = {A Unified Approach for Text- and Image-guided 4D Scene Generation},
author = {Yufeng Zheng and Xueting Li and Koki Nagano and Sifei Liu and Otmar Hilliges and Shalini De Mello},
booktitle = {CVPR},
year = {2024}
}