inspired by: https://twitter.com/mzhangio/status/1549054790868430850
instead of finetuning all of the diffusion weights, just throw in some adapters.
target the representations of the conditioning tokens and the LDM latent. Pretty sure this translates to an adapter on either end of the LDM unet.