feat: hpcai opensora 1.2 - VAE 3D training #621

Merged: 29 commits merged on Sep 27, 2024

SamitHuang
Collaborator

@SamitHuang SamitHuang commented Aug 1, 2024

What does this PR do?

Fixes # (issue)

Adds # (feature)
VAE 3D training for hpcai opensora 1.2 including:

  • 3-stage training with a different loss config per stage
  • mixed video and image training

Evaluation is done on the UCF-101 dataset, resulting in a PSNR of 29.

TODOs:

  • Train with a random number of frames. Dynamic shape support will be added in the next PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@@ -51,7 +51,7 @@ def encode_with_moments_output(self, x):
         """For latent caching usage"""
         h = self.encoder(x)
         moments = self.quant_conv(h)
-        mean, logvar = self.split(moments, moments.shape[1] // 2, 1)
+        mean, logvar = mint.split(moments, moments.shape[1] // 2, 1)
Collaborator

Why must this be `mint`?

Collaborator Author

fixed
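
For readers following the thread, the line under discussion splits the `quant_conv` output into the mean and log-variance halves along the channel axis. Below is a minimal sketch of that step, using NumPy as a stand-in for the MindSpore tensor ops. The helper name `split_moments` and the tensor shape are assumptions for illustration, not code from this PR.

```python
import numpy as np


def split_moments(moments: np.ndarray):
    """Split (B, 2*C, ...) moments into mean and logvar halves along axis 1."""
    if moments.shape[1] % 2 != 0:
        raise ValueError("channel dimension must be even")
    # np.split takes the NUMBER of sections. The call in the diff passes
    # moments.shape[1] // 2, which (assuming torch-like split semantics)
    # is the chunk SIZE; both produce two equal halves.
    mean, logvar = np.split(moments, 2, axis=1)
    return mean, logvar


# Hypothetical 5D video latent: batch=1, 2*4 channels, 5 frames, 8x8 spatial.
moments = np.arange(1 * 8 * 5 * 8 * 8, dtype=np.float32).reshape(1, 8, 5, 8, 8)
mean, logvar = split_moments(moments)
print(mean.shape, logvar.shape)
```

Whichever split op is used, concatenating `mean` and `logvar` back along axis 1 should reproduce `moments`, which is a cheap sanity check when swapping one op for another.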

Collaborator

What's the difference from `tools/convert_vae_3d.py`?

Collaborator Author

rm redundant file

scheduler: "constant"
use_ema: False

output_path: "outputs/vae_stage2"
Collaborator

Suggested change
- output_path: "outputs/vae_stage2"
+ output_path: "outputs/vae_stage3"


### Data Preprocess
If you want to train your own VAE, you need to prepare the data as a csv file following the [data processing](#data-processing) pipeline, then run the following commands.
Note that you need to adjust the number of training epochs (`epochs`) in the config file according to the size of your own csv data.
Collaborator

Does this mean the epoch size is set during data processing? Could the data instead be repeated at actual training time?

Collaborator Author

fixed
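
As a rough illustration of adjusting `epochs` to the csv size, the sketch below derives an epoch count from a target number of training steps. The helper `epochs_for_target_steps`, the drop-remainder assumption, and all numbers are hypothetical and are not taken from this PR's configs.

```python
import math


def epochs_for_target_steps(num_samples: int, global_batch_size: int, target_steps: int) -> int:
    """Return the number of epochs needed to reach at least `target_steps`."""
    # Assumes incomplete batches are dropped, so each epoch contributes
    # floor(num_samples / global_batch_size) steps (at least one).
    steps_per_epoch = max(1, num_samples // global_batch_size)
    return math.ceil(target_steps / steps_per_epoch)


# Hypothetical example: 10,000 csv rows, batch size 1 on each of 8 devices.
epochs = epochs_for_target_steps(num_samples=10_000, global_batch_size=8, target_steps=100_000)
print(epochs)
```

A smaller csv means fewer steps per epoch, so `epochs` has to grow to keep the total number of optimizer steps comparable.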

Comment on lines 825 to 827
| Model | Context | jit_level | Precision | BS | NPUs | Resolution(framesxHxW) | Train T. (s/step) | PSNR | SSIM |
|:------------|:-------------|:--------|:---------:|:--:|:----:|:----------------------:|:-----------------:|:-----------------:|:-----------------:|
| STDiT2-XL/2 | D910\*-[CANN C18(0705)](https://repo.mindspore.cn/ascend/ascend910/20240705/)-[MS2.3](https://www.mindspore.cn/install) | O1 | BF16 | 1 | 8 | 17x256x256 | 0.97 | 29.29 | 0.88 |
Collaborator

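1. If performance numbers for all 3 stages are available, please add them as well.
2. If a parallel strategy or data sinking is used, I suggest adding that too.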
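
For reference on the PSNR column, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The PR does not state the pixel value range or how scores are averaged over frames and videos, so `max_value=1.0` and the whole-array MSE below are assumptions, and `psnr` is a hypothetical helper rather than the evaluation script used here.

```python
import numpy as np


def psnr(reference, reconstruction, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    if reference.shape != reconstruction.shape:
        raise ValueError("inputs must have the same shape")
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy clip (frames, H, W, C) with a constant reconstruction error of 0.1.
clean = np.zeros((17, 8, 8, 3))
noisy = clean + 0.1
print(psnr(clean, noisy))
```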

| STDiT2-XL/2 | D910\*-[CANN C18(0705)](https://repo.mindspore.cn/ascend/ascend910/20240705/)-[MS2.3](https://www.mindspore.cn/install) | O1 | BF16 | 1 | 8 | 17x256x256 | 0.97 | 29.29 | 0.88 |
> Context: {G:GPU, D:Ascend}{chip type}-{CANN version}-{mindspore version}.

Note that we train with a mixed video and image strategy, i.e. `--mixed_strategy=mixed_video_image`, for stage 3 instead of a random number of frames (`mixed_video_random`). Random frame training will be supported in the future.
Collaborator

The `--mixed_strategy` parameter feels a bit unclear; it does not convey that it is a video/image sampling strategy.

Collaborator Author

Indeed. For now it is aligned with the parameter name in the torch implementation.
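
To make the two option values concrete, here is a hedged sketch of what such a sampling strategy could look like. The semantics are inferred from the option names (occasionally reduce a clip to a single image, versus truncate to a random frame count); the function `apply_mixed_strategy`, the `image_ratio` parameter, and the NumPy tensors are hypothetical and may differ from the actual implementation in this PR.

```python
import random

import numpy as np


def apply_mixed_strategy(video: np.ndarray, strategy: str, image_ratio: float = 0.2, rng=None):
    """Select frames from a (B, C, T, H, W) batch according to `strategy`."""
    rng = rng or random.Random()
    if strategy == "mixed_video_image":
        # With probability `image_ratio`, keep only the first frame (an image);
        # otherwise keep the full clip, so only two shapes ever occur.
        if rng.random() < image_ratio:
            return video[:, :, :1]
        return video
    if strategy == "mixed_video_random":
        # Keep a random-length prefix of the clip; T varies from step to step,
        # which is why this mode depends on dynamic shape support.
        length = rng.randint(1, video.shape[2])
        return video[:, :, :length]
    raise ValueError(f"unknown strategy: {strategy}")


batch = np.zeros((1, 3, 17, 32, 32), dtype=np.float32)
print(apply_mixed_strategy(batch, "mixed_video_image", rng=random.Random(0)).shape)
```

Under this reading, `mixed_video_image` only ever yields two input shapes (1 frame or the full clip), while `mixed_video_random` can yield as many shapes as there are frames.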

Comment on lines +17 to +18
csv_path: "../videocomposer/datasets/webvid5_copy.csv"
video_folder: "../videocomposer/datasets/webvid5"
Collaborator

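Referencing the parent-level `../videocomposer` (vc) directory feels a bit odd. Could the data be copied into the current directory?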

Collaborator Author

The video files are fairly large; this avoids increasing the repo size.

@@ -18,3 +18,4 @@ tokenizers
 sentencepiece
 transformers
 pyav
+mindcv
Collaborator

Should a specific version be pinned?

Collaborator Author

fixed

Comment on lines 3 to 4
# dynamic shape acceleration
export MS_DEV_ENABLE_KERNEL_PACKET=on
Collaborator

What does this parameter do? Does it need to be enabled whenever dynamic shape is used?

Collaborator Author

yes

Comment on lines 9 to 14
# --ckpt_path models/OpenSora-VAE-v1.2/model.ckpt \
# --ckpt_path outputs/vae_stage2.ckpt \
# --device_target GPU \
# --crop_size 256 \
# --ckpt_path /home/mindocr/yx/mindone/examples/opensora_hpcai/models/v1.2/vae.ckpt \
# --ckpt_path /home/mindocr/yx/mindone/examples/opensora_hpcai/models/sd-vae-ft-ema.ckpt \
Collaborator

I suggest tidying up the indentation.

Collaborator Author

rm

@SamitHuang SamitHuang added this pull request to the merge queue Sep 27, 2024
Merged via the queue into mindspore-lab:master with commit c40d733 Sep 27, 2024
3 checks passed
5 participants