A pipeline for generating long videos from a text prompt
Xinchen Zhang
Tsinghua University
A spectacular waterfall | A car driving down the road
Astronauts traveling in space | A cat looking out the window
Before inference, use an LLM to split the prompt into segmented fragments and to obtain a detailed description of each fragment.
We provide a template in template.txt. Copy and paste the template into ChatGPT to get the generated prompts.
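If you prefer to assemble the text to paste into ChatGPT with a script, the following is a minimal sketch. It assumes that template.txt contains plain instructions and that your prompt is simply appended at the end. The actual layout of template.txt may differ, and the function names `build_llm_query` and `load_query` are hypothetical helpers that are not part of this repository.

```python
from pathlib import Path


def build_llm_query(template_text: str, prompt: str) -> str:
    """Combine the template text with the user's video prompt.

    Assumption: the prompt is appended after the template. Adjust this
    if template.txt expects the prompt in a different position.
    """
    return f"{template_text.rstrip()}\n\nPrompt: {prompt.strip()}\n"


def load_query(prompt: str, template_path: str = "template.txt") -> str:
    """Read the template from disk and build the text to paste into the LLM."""
    template_text = Path(template_path).read_text(encoding="utf-8")
    return build_llm_query(template_text, prompt)
```

Calling `print(load_query("A cat looking out the window"))` from the repository root prints the combined text, which can then be pasted into ChatGPT.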
We offer two ways to generate a long video. If you choose I2VGen-XL as the backbone, run:
python pipeline_i2vgenxl.py --seed 1234 --fps 16
If you choose SVD as the backbone, run:
python pipeline_svd.py --seed 1234 --fps 16
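To switch between the two backbones from a single entry point, a small wrapper along these lines may help. It is only a sketch: it uses the two script names and the `--seed` and `--fps` flags shown above, while the short backbone keys (`i2vgenxl`, `svd`) and the helper names `build_command` and `run` are hypothetical.

```python
import subprocess
import sys

# Map a short backbone name (hypothetical keys) to the script shown above.
SCRIPTS = {
    "i2vgenxl": "pipeline_i2vgenxl.py",
    "svd": "pipeline_svd.py",
}


def build_command(backbone: str, seed: int = 1234, fps: int = 16) -> list[str]:
    """Build the command line for the chosen backbone."""
    if backbone not in SCRIPTS:
        raise ValueError(f"unknown backbone {backbone!r}; expected one of {sorted(SCRIPTS)}")
    return [sys.executable, SCRIPTS[backbone], "--seed", str(seed), "--fps", str(fps)]


def run(backbone: str, seed: int = 1234, fps: int = 16) -> None:
    """Run the pipeline script and raise if it exits with an error."""
    subprocess.run(build_command(backbone, seed, fps), check=True)
```

For example, `run("svd", seed=1234, fps=16)` is equivalent to the SVD command above, executed with the current Python interpreter.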
After that, we use EMA-VFI to interpolate frames between the generated ones, which makes the final video smoother.
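To illustrate what frame interpolation does to the frame count, here is a toy sketch. It is not EMA-VFI: it uses plain linear blending as a stand-in, and it represents each frame as a flat list of pixel values. The function name `interpolate_linear` and the `steps` parameter are hypothetical, and the interpolation factor this pipeline actually uses is not stated here.

```python
def interpolate_linear(frames, steps=1):
    """Insert `steps` linearly blended frames between each adjacent pair.

    Stand-in for a learned interpolator such as EMA-VFI. A clip of n
    frames becomes (n - 1) * (steps + 1) + 1 frames.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(list(a))
        for k in range(1, steps + 1):
            t = k / (steps + 1)  # blend weight of the later frame
            out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    out.append(list(frames[-1]))
    return out
```

With `steps=1`, a clip of n frames grows to 2n - 1 frames, so playing the result at roughly twice the original frame rate keeps about the same duration while the motion looks smoother.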