fp8 implementation of Flux which gets ~3.5 it/s at 1024x1024 on a 4090 (Ada/Hopper & 16GB+ VRAM only) #1363
-
This looks promising and super interesting. Unfortunately, I do not have an Ada/40XX device for webui development right now, so I have no idea how this feels and looks. I have some 40xx devices, but those are for labs and are not part of my personal dev setup for webui. To play with this in Forge, you will need to wait until I somehow get an Ada/40XX device for my personal dev setup. But feel free to post some images here so that I can take a look at the level of quality degradation. I am especially interested in the influence of aredden's range scaling methods. In fact, I also have some ideas to port native 8-bit bnb operations to the compute layers, but that highly depends on whether I have free time later.
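For reference, here is a minimal sketch of what per-tensor absmax range scaling to fp8 generally looks like in PyTorch. This is my own illustration, not aredden's actual code, and it assumes a PyTorch build with `torch.float8_e4m3fn` support (2.1+):

```python
import torch

def quantize_fp8_per_tensor(w: torch.Tensor):
    """Per-tensor absmax range scaling to float8_e4m3fn (illustrative only)."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # ~448 for e4m3fn
    scale = w.abs().max().float().clamp(min=1e-12) / fp8_max
    w_fp8 = (w.float() / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor, dtype=torch.bfloat16):
    """Recover an approximate weight in a compute dtype."""
    return w_fp8.to(dtype) * scale.to(dtype)

if __name__ == "__main__":
    w = torch.randn(4096, 4096, dtype=torch.bfloat16)
    w_fp8, scale = quantize_fp8_per_tensor(w)
    err = (dequantize_fp8(w_fp8, scale) - w).abs().mean()
    print(f"scale={scale.item():.4e}  mean abs error={err.item():.4e}")
```

The absmax-based scale maps each tensor's largest magnitude onto e4m3's limited dynamic range; more elaborate range-scaling schemes clip outliers before computing the scale, which is where most of the quality difference between fp8 variants tends to come from.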
-
Could this be implemented?
https://github.com/aredden/flux-fp8-api?tab=readme-ov-file#installation
https://www.reddit.com/r/StableDiffusion/comments/1ex64jj/i_made_an_fp8_implementation_of_flux_which_gets/
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
Credits to aredden
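To make the two halves of that description concrete, here is a rough sketch. The class name `FP8StoredLinear` and its structure are my own illustration, not code from flux-fp8-api: it stores Linear weights as fp8 with a per-tensor scale and enables PyTorch's reduced-precision fp16 accumulation toggle. It only emulates the memory side; the actual project dispatches true fp8 matmuls on Ada/Hopper tensor cores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The "faster half precision accumulate" part: a real PyTorch switch that lets
# fp16 CUDA matmuls accumulate in reduced precision.  Whether it actually
# speeds things up depends on the GPU and the PyTorch build.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True

class FP8StoredLinear(nn.Module):
    """Hypothetical wrapper (not a class from flux-fp8-api): weight kept in
    float8_e4m3fn with a per-tensor scale, dequantized to the activation dtype
    at call time.  This captures the memory savings only; the real project
    runs genuine fp8 tensor-core matmuls on Ada/Hopper."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        fp8_max = torch.finfo(torch.float8_e4m3fn).max
        w = linear.weight.detach().float()
        scale = w.abs().max().clamp(min=1e-12) / fp8_max
        self.register_buffer(
            "weight_fp8",
            (w / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn),
        )
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize the stored fp8 weight to the activation dtype, then run a
        # normal matmul; the fused fp8 path in the real project avoids this cast.
        w = self.weight_fp8.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)

if __name__ == "__main__":
    q = FP8StoredLinear(nn.Linear(3072, 3072))
    x = torch.randn(2, 3072)
    print(q(x).shape)  # torch.Size([2, 3072])
```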