[Update] diffusers v0.29.2 Update #650

townwish4git · 2024-08-30T06:58:47Z

What does this PR do?

Description

This pull request is a preliminary submission for upgrading the diffusers library to v0.29.2. It is intentionally marked as a work-in-progress (WIP) and should not be merged into the main branch until the criteria below are met. Opening it early allows code review to start and testing to proceed in parallel.

Merge Criteria:

- Legacy Module Non-Degradation: Run comprehensive tests to verify that existing modules show no performance degradation after the update.
  - Models Unittest
  - Pipelines Outputs Validation by @The-truthh
- New Module Validation: Validate every new component introduced in this update against its PyTorch counterpart, confirming functionality and performance.
  - Models Unittest
  - New Pipelines Outputs: Update inner validation report by @townwish4git
- Transformer Dependency Update: Wait for @Cui-yshoho to integrate the transformers BERT model into the repository. This dependency is required for compatibility and feature completeness.

Action Items:

Developers and reviewers are kindly requested to focus on reviewing the changes without merging until the above conditions are satisfied.
feat(transformers/models): add Bert #645: completes the transformers BERT model integration as per the roadmap.

Once these milestones are achieved, this PR will be ready for final review and formal integration, setting a solid foundation for the upcoming v0.29.2 release.

Please note, this PR is part of the preparatory phase and requires subsequent validation steps to ensure quality and stability before final acceptance.

Features

New models/pipelines

1. Marigold

Proposed in Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Marigold introduces a diffusion model and associated fine-tuning protocol for monocular depth estimation. It can also be extended to perform surface normal estimation.

2. PixArt-Sigma

PixArt Sigma is the successor to PixArt Alpha. PixArt Sigma is capable of directly generating images at 4K resolution. It can also produce images of markedly higher fidelity and improved alignment with text prompts. It supports a maximum sequence length of 300 tokens (for reference, PixArt Alpha has a maximum sequence length of 120).

3. AnimateDiff SDXL

a-r-r-o-w contributed the Stable Diffusion XL (SDXL) version of AnimateDiff. However, note that this is currently an experimental feature, as only a beta release of the motion adapter checkpoint is available.

4. Hunyuan DiT

Hunyuan DiT is a transformer-based diffusion pipeline, introduced in the paper Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding by Tencent Hunyuan.

5. StableDiffusion3

This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in Scaling Rectified Flow Transformers for High-Resolution Image Synthesis by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.

ControlNets

1. ControlNetXS

ControlNet-XS was introduced in ControlNet-XS by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the original ControlNet can be made much smaller and still produce good results. ControlNet-XS generates images comparable to a regular ControlNet, but it is 20-25% faster (see benchmark with StableDiffusion-XL) and uses ~45% less memory.

ControlNet-XS is supported for both Stable Diffusion and Stable Diffusion XL.

2. SD3 ControlNet

More

1. Massive Refactor of from_single_file

We have further refactored from_single_file to align its logic more closely to the from_pretrained method. The biggest benefit of doing this is that it allows us to expand single file loading support beyond Stable Diffusion-like pipelines and models. It also makes it easier to load models that are saved and shared in their original format.

2. Using Long Prompts with the T5 Text Encoder

We increased the default sequence length for the T5 Text Encoder from 77 to 256 tokens. It can be adjusted to accept fewer or more tokens by setting max_sequence_length, up to a maximum of 512. Keep in mind that longer sequences require additional resources and will result in longer generation times. This effect is particularly noticeable during batch inference.
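
As a rough illustration of the limits described above, the sketch below validates a requested length and forwards it to a pipeline call. The helper names `build_prompt_kwargs` and `generate` are hypothetical and not part of diffusers; the sketch only assumes that the loaded pipeline object accepts `prompt` and `max_sequence_length` as keyword arguments.

```python
from typing import Any, Callable, Dict

# Values taken from the description above.
T5_DEFAULT_MAX_SEQUENCE_LENGTH = 256
T5_MAX_SEQUENCE_LENGTH_LIMIT = 512


def build_prompt_kwargs(
    prompt: str,
    max_sequence_length: int = T5_DEFAULT_MAX_SEQUENCE_LENGTH,
) -> Dict[str, Any]:
    """Hypothetical helper: check the length, then build call kwargs."""
    if not 1 <= max_sequence_length <= T5_MAX_SEQUENCE_LENGTH_LIMIT:
        raise ValueError(
            f"max_sequence_length must be in [1, {T5_MAX_SEQUENCE_LENGTH_LIMIT}], "
            f"got {max_sequence_length}"
        )
    return {"prompt": prompt, "max_sequence_length": max_sequence_length}


def generate(
    pipe: Callable[..., Any],
    prompt: str,
    max_sequence_length: int = T5_DEFAULT_MAX_SEQUENCE_LENGTH,
) -> Any:
    """Call an already loaded pipeline (assumed to accept these kwargs)."""
    return pipe(**build_prompt_kwargs(prompt, max_sequence_length))
```

With a real pipeline, `generate(pipe, long_prompt, max_sequence_length=512)` would request the longest supported prompt; smaller values trade prompt coverage for lower memory use and faster generation.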

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines
Did you build and run the code without any errors?
Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@xxx

geniuspatrick · 2024-09-23T08:15:02Z

@vigo999 will take over this version upgrade.

townwish4git requested a review from geniuspatrick as a code owner August 30, 2024 06:58

townwish4git added 5 commits September 2, 2024 17:11

feat(diffusers/utils): update utils to v0.29

58c1c6a

feat(diffusers/loaders): update loaders to v0.29

b14ef56

feat(diffusers/models): update models to v0.29

46a4ed6

feat(diffusers/pipelines): update pipelines to v0.29

68e0b70

Updata(diffusers): v0.29.2

0c87422

townwish4git force-pushed the diffusersv0.29.2 branch from 4fa41fc to 0c87422 Compare September 2, 2024 11:37

townwish4git mentioned this pull request Sep 12, 2024

diffusers 0.30 dev base #663 (Open, 25 tasks)

CaitinZhao approved these changes Sep 13, 2024

townwish4git changed the title ~~[WIP] Prepare for diffusers v0.29.2 Update - Prerequisite Integration~~ [Updata] diffusers v0.29.2 Update Sep 14, 2024

townwish4git changed the title ~~[Updata] diffusers v0.29.2 Update~~ [Update] diffusers v0.29.2 Update Sep 14, 2024

townwish4git and others added 5 commits September 19, 2024 21:35

fix(diffusers/pipeline): fix shap-e mesh decoder

39b8403

fix(diffusers/models): fix dtype mismatch in controlnet&adapter

d5ca3f2

fix(diffusers/pipelines): marigold

de30906

fix(diffusers/loaders): fix single_file loading

ae8b5d7

fix(diffusers): temp fix for lora loading with bfloat16 data

0bfaf1c

townwish4git added 6 commits September 23, 2024 16:58

fix(diffusers): fix enable_forward_chunking()

d880307

fix(diffusers): fix encode_image defined in & copied from StableDiffuionPipeline

3842694

fix(diffusers): make UNetMotionModel compatible with ip-adapter

bbfe141

fix(diffusers/models): fix recompute

a4eda03

feat(diffusers/schedulers): update schedulers to v0.29

7fd665d

fix(diffusers/pipelines): fix rescale_noise_cfg defined in & copied from StableDiffusionPipeline

03187e5

townwish4git force-pushed the diffusersv0.29.2 branch from 74c6f90 to 03187e5 Compare September 26, 2024 04:00

vigo999 self-requested a review September 26, 2024 13:24

vigo999 approved these changes Sep 26, 2024

geniuspatrick approved these changes Sep 27, 2024

vigo999 added this pull request to the merge queue Sep 27, 2024

Merged via the queue into mindspore-lab:master with commit 2ea7619 Sep 27, 2024
3 checks passed

The-truthh mentioned this pull request Sep 30, 2024

docs(diffusers): supplement the docs of diffusers on github page #679 (Open, 20 tasks)
