MMDiT implementation and text-to-image training with rectified flows #155
Conversation
Left a few comments about structure. Basically, I'd like to see more of the transformer logic confined to the transformer, with the ComposerModel getting as close to "create the 3 models and call them" as possible. Overall, it's awesome, and I conditionally approve since it's non-breaking and has successful test runs.
This PR contains an implementation of the MMDiT model from the SD3 paper, along with a model class for using it to train text-to-image models. To support this, a generic model inference class is also included.
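As context for the MMDiT design: its defining feature is that image and text tokens keep separate projection weights per modality, but attention runs over the concatenated joint sequence. The sketch below is not the PR's actual code; it is a minimal single-head numpy illustration, and the function name and argument layout are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(img, txt, img_proj, txt_proj):
    """One single-head MMDiT-style joint attention step (illustrative).

    Each modality has its own Q/K/V projections (img_proj and txt_proj
    are (Wq, Wk, Wv) tuples), but attention runs over the concatenated
    image+text sequence so the two streams exchange information.
    """
    qi, ki, vi = (img @ w for w in img_proj)
    qt, kt, vt = (txt @ w for w in txt_proj)
    q = np.concatenate([qi, qt])
    k = np.concatenate([ki, kt])
    v = np.concatenate([vi, vt])
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product attention
    out = softmax(scores) @ v
    # Split the joint sequence back into per-modality streams.
    return out[: len(img)], out[len(img):]
```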
Major additions:
- `diffusion/inference/inference_model.py` has a `ModelInference` class for inference with arbitrary models from `models.py`
- `diffusion/models/models.py` includes a `text_to_image_transformer` model for SD3-style MMDiT
- `diffusion/models/t2i_transformer.py` has the `ComposerModel` class for the MMDiT text-to-image model
- `diffusion/models/transformer.py` has the layers/blocks for the MMDiT model
- `diffusion/train.py` includes a new function to configure the optimizer for the new text-to-image model
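For readers unfamiliar with the rectified-flow objective the PR title refers to: training interpolates linearly between clean latents and noise, and the model regresses the constant velocity along that straight-line path. This is a hedged numpy sketch of one common convention (as in the SD3 paper), not the PR's actual implementation; the function name and sign convention are assumptions.

```python
import numpy as np

def rectified_flow_targets(x0, noise, t):
    """Build rectified-flow training inputs and targets (illustrative).

    x0: clean latents, noise: Gaussian sample, t: per-example times in [0, 1].
    Returns the noised input x_t on the straight-line path and the
    constant velocity target the model is trained to predict.
    """
    # Reshape t so it broadcasts over all non-batch dimensions.
    t = np.reshape(t, (-1,) + (1,) * (x0.ndim - 1))
    x_t = (1.0 - t) * x0 + t * noise  # linear interpolation, no SDE schedule
    target = noise - x0               # d(x_t)/dt is constant along the path
    return x_t, target
```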