[Opensora-PKU] v1.2.0 training and inference #647
Open
wtomin wants to merge 165 commits into mindspore-lab:master from wtomin:op-v1.2-diffusers
Changes from 137 commits
Commits
77fc28b
init
wtomin b64dc96
fix sample_t2v
wtomin 8401b59
revise causalvae
wtomin 3d3c3be
vae reconstruction revision
wtomin c3d3b70
vae reconstruction updates
wtomin ad31b63
vae two more modules
wtomin 9ccfe5c
vae inference
wtomin 296d6e5
fix vae inference error
wtomin b51c69a
fix causal conv1d padding
wtomin 1a1eba0
rope and pos_embed
wtomin c1b4deb
revise attention
wtomin 7f7af40
update opensora_t2v
wtomin 499e8a7
update samplet2v
wtomin b4e0fa2
convert opensora t2v ckpt
wtomin 1eb3da3
fix errors
wtomin 6b19bf6
load vae differently
wtomin dde36ac
load vae differently
wtomin 401dd35
fix sample_t2v errors
wtomin 93617d9
updates
wtomin 69e23f9
test opensora
wtomin 20b661e
remove redundant modules
wtomin 882063b
put pixarttimesteps to fp32
wtomin 30367a6
use diffusers attn_processor
wtomin 9a1ec89
allow attention_mode
wtomin e229464
fix ms flash-attention
wtomin 8bc9408
remove enable_flashattention
wtomin 608a76a
test-rope
wtomin 42f6385
fix rope position error
wtomin b8f1f15
fix enable_flash_attention
wtomin 1525916
conver pytorch bin
wtomin c6334d7
fix graph error
wtomin f2a9f59
adapt to graph mode
wtomin 562063f
remove text encoder and opensorat2v model conversion scripts
wtomin 8f2c352
train init
wtomin 47649fd
remove test scripts
wtomin 5be23aa
correct inference error
wtomin e458559
support t1v and t2i sampling
wtomin 44a50c7
update prompts
wtomin 8e17248
compatible ckpt loading
wtomin 8fc56b0
allow custom fp32 cells for text encoder
wtomin b4dec83
correct loading error and path error
wtomin ef7c660
update train script
wtomin 8417563
update sample_text_embed
wtomin c150652
initial training updates
wtomin 77a542e
alter filter_prefix name
wtomin f6c4fc7
updates
wtomin 18b4687
handle bf16 torch weight
wtomin 93120b9
adapt to graph mode
wtomin 1fb1f2f
a torch-like dataloader
wtomin 71e106b
collate_fn
wtomin 0e4f148
update datasets
wtomin 7a0d075
udpate train dataset in train.py
wtomin a10ccf4
test dataset
wtomin 9a2fdf9
update test
wtomin c6ad29a
updates
wtomin ece842b
use batch sampler
wtomin fb40716
change transform func
wtomin 29bc734
update text utils
wtomin 2eda37a
update Collate fn
wtomin 5cafed1
adapt to return text_embed
wtomin b51bec1
set dataloder len
wtomin 053cb69
adapt text dataset
wtomin 6c0c0ce
merge data
wtomin cfce529
remove redundant
wtomin 9b71a43
sequence parallel sampling
wtomin e9b2f88
text encoder dtype to fp16
wtomin 92f6bab
correct text dataset error
wtomin d6561ac
turn pos_embed as tensor not parameter
wtomin f3d3340
text encoder mindspore_dtype to fp16
wtomin 404ef81
change t2i cfg to 4.5
wtomin c49b036
update training data
wtomin 4b17530
update netwithloss args order
wtomin 00ef555
allow text embed
wtomin 4653557
jit_syntax_level
wtomin 266bbda
test transform_al
wtomin 865ca78
replace transform by albumentations
wtomin 723e0e2
edit dtype and replace causalconv3d cat ops
wtomin ee3c3fc
refine printing
wtomin 87a36c4
allow rope2d to compute inv_freq ahead
wtomin c0c38fb
replace mint.repeat
wtomin 15a8db0
correct rope error
wtomin c06e9d2
update training script
wtomin c53924c
edit vae fp32 cell list
wtomin f3dc7a5
vae inference: remove gn from fp32 cells list by default
wtomin 9808abf
use kbk for training
wtomin 7cf00d5
compute inv_freq in init function using np.float64
wtomin b9040f5
amp: replace PixArtAlphaCombinedTimestepSizeEmbeddings by diffusers silu
wtomin c7bb3a6
update readme
wtomin 6271ef3
update demo
wtomin 83de3e1
update ddp scripts
wtomin 1c4dff8
update args
wtomin 283947a
revise model ckpt conversion logic
wtomin 18b4ea0
update sampling
wtomin 168ef12
support ema offloading
wtomin d443025
support ema offload
wtomin ba5befe
load from pretrained
wtomin aac8882
update training script
wtomin fd4dfac
updates
wtomin 16e71c2
huiyao's edits on seq parallel
wtomin 6430dd8
sp inference pipeline
wtomin ceeaaa6
sp inference script
wtomin 5c2e519
clip_grad True by default
wtomin 719a807
refine readme
wtomin 385c8c7
update readm
wtomin 260e7da
zero optimizer from zhaoting
wtomin 7171810
adapt to zero1
wtomin 75aecad
training script zero2
wtomin f4d3cb5
revise zero2 script
wtomin 83136de
revert opensora pipeline changes
wtomin c9e5e39
allow mindcv optimizer as an option
wtomin d0b923d
update sampling scripts names
wtomin f82e8ac
update sample t2v with ms_checkpoint
wtomin 32f0a53
edits from huiyao
wtomin 04e0221
change dataset to mixkit
wtomin 8d364c6
use lr=1e-5
wtomin 1ec3959
fix name error of text embed path
wtomin 3df7549
fix lr print
wtomin 8b4e57a
remove video ddp and use zero2
wtomin 0e37439
use StopAtStepCallback
wtomin 59526d3
test dataset
wtomin db6c16a
remove infinite dataloader
wtomin ee7dba7
do not support drop_last
wtomin 8269133
sample t2v default save as mp4 file
wtomin 13c57ad
make ops.equal input the same size
wtomin 9ad5f89
test ms text encoder
wtomin 72bef80
remove torch dependency
wtomin 42dd49e
sp_sampling
wtomin b865d7e
sp support
wtomin a99af90
update zero helper check
wtomin decaa17
support sp training
wtomin 6eae031
revert rebase changes to mindone
wtomin 379d4ec
update scripts
wtomin f90ae65
update performance table
wtomin aabc3ea
updates
wtomin 42cf8b8
set syntax level to lax
wtomin 79668cd
updates
wtomin 960acf8
solve loss not printing
wtomin d879ee1
delete bak code
wtomin 770a537
delete copyright and revise max_row_size
wtomin 8930b34
no precision_mode by default
wtomin ea8ec20
change mindcv.adamw to adamw_re
wtomin 2cc4571
remove incorrect info in readme
wtomin 4d13c55
correct typo
wtomin 34174d4
update 29x480p speed
wtomin d0a27cf
ema adaptation
wtomin 8bde5f0
Revert "ema adaptation"
wtomin 2c9129b
update 29x720p training script and speed
wtomin e72f7dc
update 93x720p
wtomin 35110b5
revise learning rate to 5e-5
wtomin 1bdb8f8
speed update
wtomin d31bcf5
use adamw_re from mindcv as the default optimizer
wtomin e00d3c8
use variant ema_decay
wtomin f8a7a1e
update printing message
wtomin c62bf02
update cann hyperlink
wtomin 74ee0ff
boolean on multiple elements is ambiguous
wtomin 65022a7
fix bug to use grad_accumulation with zero2
wtomin 8e85bc5
correct table header typo
wtomin 90ef9b9
fix typo
wtomin e91267b
update causalvae training
wtomin 6946e70
val during training
wtomin ace2aa6
update metric fn
wtomin 2813f53
val during training
wtomin 8bd3e53
align LR to 1e-4
wtomin 7b46343
support zero3 and graph mode
wtomin a61718b
correct typo
wtomin
38 changes: 0 additions & 38 deletions
examples/opensora_pku/LanguageBind/Open-Sora-Plan-v1.1.0/17x512x512/config.json
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,20 @@ | ||
Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field. | ||
A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors. | ||
Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image. | ||
A time-lapse of a storm forming over the ocean, dark clouds gathering and lightning flashing. The storm's energy creates spirals of light that dance across the sky. | ||
A majestic eagle perches on a high cliff, its keen eyes scanning the valley below. With a powerful flap, it takes off, leaving a trail of sparkling feathers. | ||
A single butterfly with wings that resemble stained glass flutters through a field of flowers. The shot captures the light as it passes through the delicate wings, creating a vibrant, colorful display. HD. | ||
A solitary mermaid swims through an underwater cave filled with glowing crystals. The shot follows her graceful movements, capturing the play of light on her scales and the ethereal beauty of the cave. | ||
Close-up of a dragon's eye as it slowly opens, revealing a fiery iris that reflects the burning landscape around it, while smoke wisps off its scaly eyelid. | ||
A cat with the enigmatic smile of the Mona Lisa, lounging regally on a velvet cushion, her eyes following a fluttering butterfly that mirrors the mysterious allure of her expression. 4K. | ||
A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures. | ||
This close-up shot of a Victoria crowned pigeon showcases its striking blue plumage and red chest. Its crest is made of delicate, lacy feathers, while its eye is a striking red color. The bird’s head is tilted slightly to the side, giving the impression of it looking regal and majestic. The background is blurred, drawing attention to the bird’s striking appearance. | ||
a cat wearing sunglasses and working as a lifeguard at pool. | ||
A young man at his 20s is sitting on a piece of cloud in the sky, reading a book. | ||
An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt, he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film. | ||
A lone figure stands on the deck of a spaceship, looking out at a nebula filled with vibrant colors. The shot tracks their gaze, capturing the breathtaking beauty of the cosmic landscape and the sense of infinite possibility. | ||
A large orange octopus is seen resting on the bottom of the ocean floor, blending in with the sandy and rocky terrain. lts tentacles are spread out around its body, and its eyes are closed. The octopus is unaware of a king crab that is crawling towards it from behind a rock,its claws raised and ready to attack. The crab is brown and spiny,with long legs and antennae. The scene is captured from a wide angle,showing the vastness and depth of the ocean. The wateris clear and blue, with rays of sunlight filtering through. The shot is sharp and crisp, with a high dynamic range. The octopus and the crab are in focus, while the background is slightly blurred,creating a depth of field effect. | ||
a dynamic interaction between the ocean and a large rock. The rock, with its rough texture and jagged edges, is partially submerged in the water, suggesting it is a natural feature of the coastline. The water around the rock is in motion, with white foam and waves crashing against the rock, indicating the force of the ocean's movement. The background is a vast expanse of the ocean, with small ripples and waves, suggesting a moderate sea state. The overall style of the scene is a realistic depiction of a natural landscape, with a focus on the interplay between the rock and the water. | ||
A close-up of a woman’s face, illuminated by the soft light of dawn, her expression serene and content as she wakes up in a cozy bedroom. | ||
An intense close-up of a detective’s face, lit by a single desk lamp, his eyes scanning a wall covered in photos and notes, deep in thought. | ||
Audience members in a theater are captured in a series of medium shots, with a young man and woman in formal attire centrally positioned and illuminated by a spotlight effect. | ||
A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the surrounding calm sea. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures. | ||
A vibrant scene of a snowy mountain landscape. The sky is filled with a multitude of colorful hot air balloons, each floating at different heights, creating a dynamic and lively atmosphere. The balloons are scattered across the sky, some closer to the viewer, others further away, adding depth to the scene. Below, the mountainous terrain is blanketed in a thick layer of snow, with a few patches of bare earth visible here and there. The snow-covered mountains provide a stark contrast to the colorful balloons, enhancing the visual appeal of the scene. | ||
A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. | ||
A snowy forest landscape with a dirt road running through it. The road is flanked by trees covered in snow, and the ground is also covered in snow. The sun is shining, creating a bright and serene atmosphere. The road appears to be empty, and there are no people or animals visible in the video. The style of the video is a natural landscape shot, with a focus on the beauty of the snowy forest and the peacefulness of the road. | ||
The dynamic movement of tall, wispy grasses swaying in the wind. The sky above is filled with clouds, creating a dramatic backdrop. The sunlight pierces through the clouds, casting a warm glow on the scene. The grasses are a mix of green and brown, indicating a change in seasons. The overall style of the video is naturalistic, capturing the beauty of the landscape in a realistic manner. The focus is on the grasses and their movement, with the sky serving as a secondary element. The video does not contain any human or animal elements. | ||
A close-up of a magician’s crystal ball that reveals a futuristic cityscape within. Skyscrapers of light stretch towards the heavens, and flying cars zip through the air, casting neon reflections across the ball’s surface. 8K. | ||
A majestic horse gallops across a bridge made of rainbows, each hoof striking sparks of color that cascade into the sky, the clouds parting to reveal a sunlit path to a distant, magical realm. | ||
A close-up of a robot dog as it interacts with a group of real puppies in a park, its mechanical eyes blinking with curiosity and tail wagging energetically. High Resolution. | ||
An elderly woman with white hair and a lined face is seated inside an older model car, looking out through the side window with a contemplative or mildly sad expression. | ||
a realistic 3d rendering of a female character with curly blonde hair and blue eyes. she is wearing a black tank top and has a neutral expression while facing the camera directly. the background is a plain blue sky, and the scene is devoid of any other objects or text. the character is detailed, with realistic textures and lighting, suitable for a video game or high-quality animation. there is no movement or additional action in the video. the focus is entirely on the character's appearance and realistic rendering. | ||
A panda strumming a guitar under a bamboo grove, its paws gently plucking the strings as a group of mesmerized rabbits watch, the music blending with the rustle of bamboo leaves. HD. | ||
A close-up of a woman with a vintage hairstyle and bright red lipstick, gazing seductively into the camera, the background blurred to keep the focus solely on her. | ||
In the jungle, a hidden temple stands guarded by statues of lions, their eyes glowing with emerald light, protecting secrets untold for millennia. 8K. | ||
A close-up of an old man’s weathered face, with deep wrinkles and a thick white mustache, looking out to sea, the wind gently blowing through his hair. | ||
An intense close-up of a soldier’s face, covered in dirt and sweat, his eyes filled with determination as he surveys the battlefield. | ||
A river that flows uphill, defying gravity as it returns lost treasures from the sea to the mountain top, each item telling a story of a voyage gone by. HD. | ||
A close-up of a man’s face, lit only by the glow of his computer screen, his eyes wide and unblinking as he discovers something shocking online. | ||
On a deserted island, palm trees sway to summon a rainstorm, their leaves conducting the wind like maestros, orchestrating a symphony of thunder and lightning. High Resolution. | ||
An extreme close-up of a middle-aged man’s face, with a five o’clock shadow, staring pensively into the distance as rain softly taps against the window beside him, his thoughts deep and contemplative. | ||
A close-up of a man’s face, his expression one of deep concentration as he works on a complex task. | ||
Drone view of waves crashing against the rugged cliffs along Big Sur's garay point beach.The crashing blue waters create white-tipped waves,while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green | ||
shrubbery covers the cliffs edge. The steep drop from the road down to the beach is adramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway. | ||
a close-up shot of a woman standing in a dimly lit room. she is wearing a traditional chinese outfit, which includes a red and gold dress with intricate designs and a matching headpiece. the woman has her hair styled in an updo, adorned with a gold accessory. her makeup is done in a way that accentuates her features, with red lipstick and dark eyeshadow. she is looking directly at the camera with a neutral expression. the room has a rustic feel, with wooden beams and a stone wall visible in the background. the lighting in the room is soft and warm, creating a contrast with the woman's vibrant attire. there are no texts or other objects in the video. the style of the video is a portrait, focusing on the woman and her attire. |
update?
Removed. Torch repo has removed this file too.