
ardfork/ComfyUI-flash-attention-triton

ComfyUI Flash Attention Triton

A ComfyUI node that allows you to select Flash Attention Triton implementation as sampling attention.
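For context, the operation being swapped out is scaled dot-product attention. The sketch below (illustrative only, not the actual Triton kernel) contrasts the naive form, which materializes the full O(n²) score matrix, with the tiled online-softmax form that flash-attention-style kernels use to keep memory usage down:

```python
import numpy as np

def naive_attention(q, k, v):
    # Materializes the full (n, n) score matrix: O(n^2) memory.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def tiled_attention(q, k, v, block=2):
    # Online softmax over key/value tiles: only a (n, block) slice of the
    # score matrix exists at any time, never the full (n, n) matrix.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, d))
    m = np.full(n, -np.inf)  # running row-wise max
    l = np.zeros(n)          # running softmax normalizer
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                     # scores for this tile only
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        correction = np.exp(m - m_new)           # rescale earlier tiles
        l = l * correction + p.sum(axis=-1)
        out = out * correction[:, None] + p @ vb
        m = m_new
    return out / l[:, None]
```

Both functions compute the same result; the tiled version is the memory-saving trick, which is why the Triton kernel trades some speed for a smaller VRAM footprint.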

This implementation is approximately 20% slower than sub-quadratic attention on the tested hardware, but uses less VRAM.

Performance Comparison

Tested on an RX 6700 XT, generating a 1024x1024 image with FLUX.1-dev q4_K-S:

Attention Method         VRAM Usage   Speed (s/it)
Flash Attention Triton   8.2 GB       28.60
Sub-Quadratic            9.4 GB       23.23
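As a quick back-of-the-envelope check on the tradeoff, using the table's figures:

```python
# Figures from the benchmark above (RX 6700 XT, FLUX.1-dev q4_K-S, 1024x1024).
flash_s_it, subquad_s_it = 28.60, 23.23
flash_vram, subquad_vram = 8.2, 9.4

slowdown_pct = (flash_s_it / subquad_s_it - 1) * 100    # time penalty
vram_saved_pct = (1 - flash_vram / subquad_vram) * 100  # memory savings
print(f"{slowdown_pct:.1f}% slower, {vram_saved_pct:.1f}% less VRAM")
# → 23.1% slower, 12.8% less VRAM
```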

Notes

  • Currently implemented only for sampling, not for the VAE.
  • Only compatible with FLUX models at the moment.


License

Dual-licensed under LGPL-3.0 (see COPYING.LESSER) and GPL-3.0 (see COPYING).
