[Opensora-PKU] v1.2.0 training and inference #647

Open: wants to merge 165 commits into base: master

Commits
77fc28b
init
wtomin Jul 29, 2024
b64dc96
fix sample_t2v
wtomin Jul 29, 2024
8401b59
revise causalvae
wtomin Jul 30, 2024
3d3c3be
vae reconstruction revision
wtomin Jul 30, 2024
c3d3b70
vae reconstruction updates
wtomin Jul 30, 2024
ad31b63
vae two more modules
wtomin Jul 30, 2024
9ccfe5c
vae inference
wtomin Jul 31, 2024
296d6e5
fix vae inference error
wtomin Jul 31, 2024
b51c69a
fix causal conv1d padding
wtomin Jul 31, 2024
1a1eba0
rope and pos_embed
wtomin Aug 5, 2024
c1b4deb
revise attention
wtomin Aug 5, 2024
7f7af40
update opensora_t2v
wtomin Aug 5, 2024
499e8a7
update samplet2v
wtomin Aug 6, 2024
b4e0fa2
convert opensora t2v ckpt
wtomin Aug 6, 2024
1eb3da3
fix errors
wtomin Aug 7, 2024
6b19bf6
load vae differently
wtomin Aug 7, 2024
dde36ac
load vae differently
wtomin Aug 7, 2024
401dd35
fix sample_t2v errors
wtomin Aug 7, 2024
93617d9
updates
wtomin Aug 7, 2024
69e23f9
test opensora
wtomin Aug 14, 2024
20b661e
remove redundant modules
wtomin Aug 15, 2024
882063b
put pixarttimesteps to fp32
wtomin Aug 16, 2024
30367a6
use diffusers attn_processor
wtomin Aug 16, 2024
9a1ec89
allow attention_mode
wtomin Aug 16, 2024
e229464
fix ms flash-attention
wtomin Aug 19, 2024
8bc9408
remove enable_flashattention
wtomin Aug 19, 2024
608a76a
test-rope
wtomin Aug 19, 2024
42f6385
fix rope position error
wtomin Aug 19, 2024
b8f1f15
fix enable_flash_attention
wtomin Aug 19, 2024
1525916
convert pytorch bin
wtomin Aug 19, 2024
c6334d7
fix graph error
wtomin Aug 19, 2024
f2a9f59
adapt to graph mode
wtomin Aug 20, 2024
562063f
remove text encoder and opensorat2v model conversion scripts
wtomin Aug 20, 2024
8f2c352
train init
wtomin Aug 20, 2024
47649fd
remove test scripts
wtomin Aug 20, 2024
5be23aa
correct inference error
wtomin Aug 20, 2024
e458559
support t1v and t2i sampling
wtomin Aug 20, 2024
44a50c7
update prompts
wtomin Aug 20, 2024
8e17248
compatible ckpt loading
wtomin Aug 20, 2024
8fc56b0
allow custom fp32 cells for text encoder
wtomin Aug 21, 2024
b4dec83
correct loading error and path error
wtomin Aug 21, 2024
ef7c660
update train script
wtomin Aug 21, 2024
8417563
update sample_text_embed
wtomin Aug 21, 2024
c150652
initial training updates
wtomin Aug 21, 2024
77a542e
alter filter_prefix name
wtomin Aug 21, 2024
f6c4fc7
updates
wtomin Aug 21, 2024
18b4687
handle bf16 torch weight
wtomin Aug 21, 2024
93120b9
adapt to graph mode
wtomin Aug 21, 2024
1fb1f2f
a torch-like dataloader
wtomin Aug 21, 2024
71e106b
collate_fn
wtomin Aug 21, 2024
0e4f148
update datasets
wtomin Aug 22, 2024
7a0d075
update train dataset in train.py
wtomin Aug 22, 2024
a10ccf4
test dataset
wtomin Aug 22, 2024
9a2fdf9
update test
wtomin Aug 22, 2024
c6ad29a
updates
wtomin Aug 22, 2024
ece842b
use batch sampler
wtomin Aug 22, 2024
fb40716
change transform func
wtomin Aug 22, 2024
29bc734
update text utils
wtomin Aug 23, 2024
2eda37a
update Collate fn
wtomin Aug 23, 2024
5cafed1
adapt to return text_embed
wtomin Aug 23, 2024
b51bec1
set dataloader len
wtomin Aug 23, 2024
053cb69
adapt text dataset
wtomin Aug 23, 2024
6c0c0ce
merge data
wtomin Aug 23, 2024
cfce529
remove redundant
wtomin Aug 23, 2024
9b71a43
sequence parallel sampling
wtomin Aug 23, 2024
e9b2f88
text encoder dtype to fp16
wtomin Aug 26, 2024
92f6bab
correct text dataset error
wtomin Aug 26, 2024
d6561ac
turn pos_embed as tensor not parameter
wtomin Aug 26, 2024
f3d3340
text encoder mindspore_dtype to fp16
wtomin Aug 26, 2024
404ef81
change t2i cfg to 4.5
wtomin Aug 26, 2024
c49b036
update training data
wtomin Aug 27, 2024
4b17530
update netwithloss args order
wtomin Aug 27, 2024
00ef555
allow text embed
wtomin Aug 27, 2024
4653557
jit_syntax_level
wtomin Aug 27, 2024
266bbda
test transform_al
wtomin Aug 27, 2024
865ca78
replace transform by albumentations
wtomin Aug 27, 2024
723e0e2
edit dtype and replace causalconv3d cat ops
wtomin Aug 28, 2024
ee3c3fc
refine printing
wtomin Aug 28, 2024
87a36c4
allow rope2d to compute inv_freq ahead
wtomin Aug 28, 2024
c0c38fb
replace mint.repeat
wtomin Aug 28, 2024
15a8db0
correct rope error
wtomin Aug 28, 2024
c06e9d2
update training script
wtomin Aug 28, 2024
c53924c
edit vae fp32 cell list
wtomin Aug 28, 2024
f3dc7a5
vae inference: remove gn from fp32 cells list by default
wtomin Aug 29, 2024
9808abf
use kbk for training
wtomin Aug 29, 2024
7cf00d5
compute inv_freq in init function using np.float64
wtomin Aug 29, 2024
b9040f5
amp: replace PixArtAlphaCombinedTimestepSizeEmbeddings by diffusers silu
wtomin Aug 29, 2024
c7bb3a6
update readme
wtomin Aug 29, 2024
6271ef3
update demo
wtomin Aug 29, 2024
83de3e1
update ddp scripts
wtomin Aug 30, 2024
1c4dff8
update args
wtomin Aug 30, 2024
283947a
revise model ckpt conversion logic
wtomin Aug 30, 2024
18b4ea0
update sampling
wtomin Aug 30, 2024
168ef12
support ema offloading
wtomin Aug 30, 2024
d443025
support ema offload
wtomin Aug 30, 2024
ba5befe
load from pretrained
wtomin Aug 30, 2024
aac8882
update training script
wtomin Aug 30, 2024
fd4dfac
updates
wtomin Aug 30, 2024
16e71c2
huiyao's edits on seq parallel
wtomin Aug 30, 2024
6430dd8
sp inference pipeline
wtomin Aug 30, 2024
ceeaaa6
sp inference script
wtomin Sep 2, 2024
5c2e519
clip_grad True by default
wtomin Sep 2, 2024
719a807
refine readme
wtomin Sep 2, 2024
385c8c7
update readme
wtomin Sep 2, 2024
260e7da
zero optimizer from zhaoting
wtomin Sep 2, 2024
7171810
adapt to zero1
wtomin Sep 2, 2024
75aecad
training script zero2
wtomin Sep 2, 2024
f4d3cb5
revise zero2 script
wtomin Sep 2, 2024
83136de
revert opensora pipeline changes
wtomin Sep 2, 2024
c9e5e39
allow mindcv optimizer as an option
wtomin Sep 2, 2024
d0b923d
update sampling scripts names
wtomin Sep 2, 2024
f82e8ac
update sample t2v with ms_checkpoint
wtomin Sep 3, 2024
32f0a53
edits from huiyao
wtomin Sep 3, 2024
04e0221
change dataset to mixkit
wtomin Sep 3, 2024
8d364c6
use lr=1e-5
wtomin Sep 3, 2024
1ec3959
fix name error of text embed path
wtomin Sep 3, 2024
3df7549
fix lr print
wtomin Sep 3, 2024
8b4e57a
remove video ddp and use zero2
wtomin Sep 3, 2024
0e37439
use StopAtStepCallback
wtomin Sep 3, 2024
59526d3
test dataset
wtomin Sep 3, 2024
db6c16a
remove infinite dataloader
wtomin Sep 3, 2024
ee7dba7
do not support drop_last
wtomin Sep 3, 2024
8269133
sample t2v default save as mp4 file
wtomin Sep 4, 2024
13c57ad
make ops.equal input the same size
wtomin Sep 4, 2024
9ad5f89
test ms text encoder
wtomin Sep 4, 2024
72bef80
remove torch dependency
wtomin Sep 5, 2024
42dd49e
sp_sampling
wtomin Sep 5, 2024
b865d7e
sp support
wtomin Sep 5, 2024
a99af90
update zero helper check
wtomin Sep 6, 2024
decaa17
support sp training
wtomin Sep 6, 2024
6eae031
revert rebase changes to mindone
wtomin Sep 11, 2024
379d4ec
update scripts
wtomin Sep 12, 2024
f90ae65
update performance table
wtomin Sep 12, 2024
aabc3ea
updates
wtomin Sep 12, 2024
42cf8b8
set syntax level to lax
wtomin Sep 12, 2024
79668cd
updates
wtomin Sep 12, 2024
960acf8
solve loss not printing
wtomin Sep 12, 2024
d879ee1
delete bak code
wtomin Sep 13, 2024
770a537
delete copyright and revise max_row_size
wtomin Sep 13, 2024
8930b34
no precision_mode by default
wtomin Sep 13, 2024
ea8ec20
change mindcv.adamw to adamw_re
wtomin Sep 13, 2024
2cc4571
remove incorrect info in readme
wtomin Sep 13, 2024
4d13c55
correct typo
wtomin Sep 16, 2024
34174d4
update 29x480p speed
wtomin Sep 16, 2024
d0a27cf
ema adaptation
wtomin Sep 16, 2024
8bde5f0
Revert "ema adaptation"
wtomin Sep 16, 2024
2c9129b
update 29x720p training script and speed
wtomin Sep 16, 2024
e72f7dc
update 93x720p
wtomin Sep 16, 2024
35110b5
revise learning rate to 5e-5
wtomin Sep 16, 2024
1bdb8f8
speed update
wtomin Sep 17, 2024
d31bcf5
use adamw_re from mindcv as the default optimizer
wtomin Sep 17, 2024
e00d3c8
use variant ema_decay
wtomin Sep 17, 2024
f8a7a1e
update printing message
wtomin Sep 17, 2024
c62bf02
update cann hyperlink
wtomin Sep 24, 2024
74ee0ff
boolean on multiple elements is ambiguous
wtomin Sep 25, 2024
65022a7
fix bug to use grad_accumulation with zero2
wtomin Oct 4, 2024
8e85bc5
correct table header typo
wtomin Oct 10, 2024
90ef9b9
fix typo
wtomin Oct 16, 2024
e91267b
update causalvae training
wtomin Oct 16, 2024
6946e70
val during training
wtomin Oct 9, 2024
ace2aa6
update metric fn
wtomin Oct 10, 2024
2813f53
val during training
wtomin Oct 14, 2024
8bd3e53
align LR to 1e-4
wtomin Oct 16, 2024
7b46343
support zero3 and graph mode
wtomin Oct 16, 2024
a61718b
correct typo
wtomin Oct 17, 2024
This file was deleted.

420 changes: 190 additions & 230 deletions examples/opensora_pku/README.md

Large diffs are not rendered by default.

77 changes: 0 additions & 77 deletions examples/opensora_pku/docs/structure.md

This file was deleted.

1 change: 0 additions & 1 deletion examples/opensora_pku/docs/training_args.md
@@ -27,7 +27,6 @@ This document includes the training arguments of [`opensora/train/train_t2v.py`]
- `multi_scale` (type: bool, default: False): whether to support multi-scale training. Multi-scale training is not supported yet (work in progress).

## Model Acceleration
- `enable_flash_attention` (type: bool, default: False): whether to apply Flash-Attention in LatteT2V model. If True, it will save memory.
- `enable_tiling` (type: bool, default: False): whether to use VAE tiling to save memory. If True, VAE inference runs with less memory at a slower speed.
- `tile_overlap_factor` (type: float, default: 0.25, range: (0, 1)): the overlap factor of VAE tiling.
- `use_recompute` (type: bool, default: False): whether to use recompute (gradient checkpointing) to save memory. If True, LatteT2V training runs with less memory at a slower speed.
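The memory/speed trade-off behind `enable_tiling` and `tile_overlap_factor` can be illustrated with a small pure-Python sketch. This is not the repository's VAE code. It assumes the convention used by diffusers' `AutoencoderKL` tiling, where neighbouring tiles advance by `tile_size * (1 - tile_overlap_factor)` and the shared region is cross-faded linearly. For brevity it works on a 1-D list rather than on video latents. `tile_windows`, `blend`, and `tiled_apply` are hypothetical helper names.

```python
from typing import Callable, List, Tuple


def tile_windows(length: int, tile_size: int, overlap_factor: float) -> List[Tuple[int, int]]:
    """Split [0, length) into overlapping (start, end) windows.

    Assumption (diffusers-style): tiles advance by tile_size * (1 - overlap_factor),
    so neighbouring tiles share about tile_size * overlap_factor samples.
    """
    if not 0.0 < overlap_factor < 1.0:
        raise ValueError("overlap_factor must be in (0, 1)")
    stride = max(1, int(tile_size * (1 - overlap_factor)))
    windows = []
    for start in range(0, length, stride):
        end = min(start + tile_size, length)
        windows.append((start, end))
        if end == length:  # the last tile reached the end of the signal
            break
    return windows


def blend(prev_tail: List[float], cur_head: List[float]) -> List[float]:
    """Linearly cross-fade from the previous tile's tail to the current tile's head."""
    n = len(prev_tail)
    return [p * (1 - i / n) + c * (i / n) for i, (p, c) in enumerate(zip(prev_tail, cur_head))]


def tiled_apply(signal: List[float], fn: Callable[[List[float]], List[float]],
                tile_size: int, overlap_factor: float) -> List[float]:
    """Run a length-preserving `fn` tile by tile and stitch the outputs, blending overlaps."""
    out: List[float] = []
    prev_end = 0
    for start, end in tile_windows(len(signal), tile_size, overlap_factor):
        tile = fn(signal[start:end])
        overlap = prev_end - start  # samples shared with the previous tile
        if overlap > 0:
            out[start:prev_end] = blend(out[start:prev_end], tile[:overlap])
        out.extend(tile[max(overlap, 0):])
        prev_end = end
    return out
```

Each call to `fn` sees at most `tile_size` samples, so peak memory scales with the tile size rather than with the full input. A larger overlap factor hides seams between tiles better but recomputes more samples, which is one reason tiled inference is slower.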
41 changes: 18 additions & 23 deletions examples/opensora_pku/examples/prompt_list_0.txt
@@ -1,25 +1,20 @@
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.
A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.
A time-lapse of a storm forming over the ocean, dark clouds gathering and lightning flashing. The storm's energy creates spirals of light that dance across the sky.
A majestic eagle perches on a high cliff, its keen eyes scanning the valley below. With a powerful flap, it takes off, leaving a trail of sparkling feathers.
A single butterfly with wings that resemble stained glass flutters through a field of flowers. The shot captures the light as it passes through the delicate wings, creating a vibrant, colorful display. HD.
A solitary mermaid swims through an underwater cave filled with glowing crystals. The shot follows her graceful movements, capturing the play of light on her scales and the ethereal beauty of the cave.
Close-up of a dragon's eye as it slowly opens, revealing a fiery iris that reflects the burning landscape around it, while smoke wisps off its scaly eyelid.
A cat with the enigmatic smile of the Mona Lisa, lounging regally on a velvet cushion, her eyes following a fluttering butterfly that mirrors the mysterious allure of her expression. 4K.
A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.
This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird’s head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird’s striking appearance.
a cat wearing sunglasses and working as a lifeguard at a pool.
A young man in his 20s is sitting on a piece of cloud in the sky, reading a book.
An extreme close-up of a gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt, he wears a brown beret and glasses and has a very professorial appearance, and at the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.
A lone figure stands on the deck of a spaceship, looking out at a nebula filled with vibrant colors. The shot tracks their gaze, capturing the breathtaking beauty of the cosmic landscape and the sense of infinite possibility.
A large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. Its tentacles are spread out around its body, and its eyes are closed. The octopus is unaware of a king crab that is crawling towards it from behind a rock, its claws raised and ready to attack. The crab is brown and spiny, with long legs and antennae. The scene is captured from a wide angle, showing the vastness and depth of the ocean. The water is clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred, creating a depth of field effect.
a dynamic interaction between the ocean and a large rock. The rock, with its rough texture and jagged edges, is partially submerged in the water, suggesting it is a natural feature of the coastline. The water around the rock is in motion, with white foam and waves crashing against the rock, indicating the force of the ocean's movement. The background is a vast expanse of the ocean, with small ripples and waves, suggesting a moderate sea state. The overall style of the scene is a realistic depiction of a natural landscape, with a focus on the interplay between the rock and the water.
A close-up of a woman’s face, illuminated by the soft light of dawn, her expression serene and content as she wakes up in a cozy bedroom.
An intense close-up of a detective’s face, lit by a single desk lamp, his eyes scanning a wall covered in photos and notes, deep in thought.
Audience members in a theater are captured in a series of medium shots, with a young man and woman in formal attire centrally positioned and illuminated by a spotlight effect.
A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the surrounding calm sea. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.
A vibrant scene of a snowy mountain landscape. The sky is filled with a multitude of colorful hot air balloons, each floating at different heights, creating a dynamic and lively atmosphere. The balloons are scattered across the sky, some closer to the viewer, others further away, adding depth to the scene. Below, the mountainous terrain is blanketed in a thick layer of snow, with a few patches of bare earth visible here and there. The snow-covered mountains provide a stark contrast to the colorful balloons, enhancing the visual appeal of the scene.
A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene.
A snowy forest landscape with a dirt road running through it. The road is flanked by trees covered in snow, and the ground is also covered in snow. The sun is shining, creating a bright and serene atmosphere. The road appears to be empty, and there are no people or animals visible in the video. The style of the video is a natural landscape shot, with a focus on the beauty of the snowy forest and the peacefulness of the road.
The dynamic movement of tall, wispy grasses swaying in the wind. The sky above is filled with clouds, creating a dramatic backdrop. The sunlight pierces through the clouds, casting a warm glow on the scene. The grasses are a mix of green and brown, indicating a change in seasons. The overall style of the video is naturalistic, capturing the beauty of the landscape in a realistic manner. The focus is on the grasses and their movement, with the sky serving as a secondary element. The video does not contain any human or animal elements.
A close-up of a magician’s crystal ball that reveals a futuristic cityscape within. Skyscrapers of light stretch towards the heavens, and flying cars zip through the air, casting neon reflections across the ball’s surface. 8K.
A majestic horse gallops across a bridge made of rainbows, each hoof striking sparks of color that cascade into the sky, the clouds parting to reveal a sunlit path to a distant, magical realm.
A close-up of a robot dog as it interacts with a group of real puppies in a park, its mechanical eyes blinking with curiosity and tail wagging energetically. High Resolution.
An elderly woman with white hair and a lined face is seated inside an older model car, looking out through the side window with a contemplative or mildly sad expression.
a realistic 3d rendering of a female character with curly blonde hair and blue eyes. she is wearing a black tank top and has a neutral expression while facing the camera directly. the background is a plain blue sky, and the scene is devoid of any other objects or text. the character is detailed, with realistic textures and lighting, suitable for a video game or high-quality animation. there is no movement or additional action in the video. the focus is entirely on the character's appearance and realistic rendering.
A panda strumming a guitar under a bamboo grove, its paws gently plucking the strings as a group of mesmerized rabbits watch, the music blending with the rustle of bamboo leaves. HD.
A close-up of a woman with a vintage hairstyle and bright red lipstick, gazing seductively into the camera, the background blurred to keep the focus solely on her.
In the jungle, a hidden temple stands guarded by statues of lions, their eyes glowing with emerald light, protecting secrets untold for millennia. 8K.
A close-up of an old man’s weathered face, with deep wrinkles and a thick white mustache, looking out to sea, the wind gently blowing through his hair.
An intense close-up of a soldier’s face, covered in dirt and sweat, his eyes filled with determination as he surveys the battlefield.
A river that flows uphill, defying gravity as it returns lost treasures from the sea to the mountain top, each item telling a story of a voyage gone by. HD.
A close-up of a man’s face, lit only by the glow of his computer screen, his eyes wide and unblinking as he discovers something shocking online.
On a deserted island, palm trees sway to summon a rainstorm, their leaves conducting the wind like maestros, orchestrating a symphony of thunder and lightning. High Resolution.
An extreme close-up of a middle-aged man’s face, with a five o’clock shadow, staring pensively into the distance as rain softly taps against the window beside him, his thoughts deep and contemplative.
A close-up of a man’s face, his expression one of deep concentration as he works on a complex task.
Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green
shrubbery covers the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.
a close-up shot of a woman standing in a dimly lit room. she is wearing a traditional chinese outfit, which includes a red and gold dress with intricate designs and a matching headpiece. the woman has her hair styled in an updo, adorned with a gold accessory. her makeup is done in a way that accentuates her features, with red lipstick and dark eyeshadow. she is looking directly at the camera with a neutral expression. the room has a rustic feel, with wooden beams and a stone wall visible in the background. the lighting in the room is soft and warm, creating a contrast with the woman's vibrant attire. there are no texts or other objects in the video. the style of the video is a portrait, focusing on the woman and her attire.