
Add xformers support #20

Closed
xuzhao9 opened this issue Oct 28, 2024 · 12 comments

@xuzhao9
Contributor

xuzhao9 commented Oct 28, 2024

Add xformers, built from source (similar to fbgemm): https://github.com/facebookresearch/xformers

Make sure fa3 is available.

@antferdom

A simple op-availability assertion:

from xformers.ops.fmha import flash3

op = flash3.FwOp
HAS_FLASH3 = op.is_available()  # True when the FA3 binding was built and can be imported
print(f"xformers_ops_fmha_flash3 supported: {HAS_FLASH3}")

Reference output (e.g. as printed by python -m xformers.info):

memory_efficient_attention.fa3F@0.0.0:             available
memory_efficient_attention.fa3B@0.0.0:             available

@xuzhao9
Contributor Author

xuzhao9 commented Oct 29, 2024

#23 should fix this

@antferdom

antferdom commented Oct 29, 2024

Looks good to me, but building xformers from source with FA3 support might trigger recompilation in the existing environment and overlap with a previous Flash Attention v3 installation.

A colleague (@ohwi) and I found a point of conflict between the xformers FA3 Torch custom-op wrapper logic and flashattn_hopper_cuda, which led to errors such as:

TypeError: fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: Optional[torch.Tensor], arg4: float, arg5: Optional[torch.Tensor], arg6: Optional[torch.Tensor], arg7: Optional[torch.Tensor], arg8: bool, arg9: int, arg10: int) -> list[torch.Tensor]

Our understanding of the conflict:

  • The current version of the fwd function in flashattn_hopper_cuda requires the non-optional arguments window_size_left and window_size_right, but the custom mha_fwd op registered by xformers does not include this update.

There is also a code block in xformers that imports flashattn_hopper_cuda as a fallback, which makes only one of xformers or flash-attn usable at a time.
See: https://github.com/Dao-AILab/flash-attention/blob/main/hopper/flash_api.cpp#L463-L475
and
https://github.com/facebookresearch/xformers/blob/68b7fd14df5eb1d2558c52842b4206a14d2d20e9/xformers/ops/fmha/flash3.py#L48-L82
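
A quick way to confirm which signature the locally installed binding actually exposes is to print its pybind11-generated docstring (a minimal diagnostic sketch; it assumes flashattn_hopper_cuda is importable in the current environment):

# Diagnostic sketch: pybind11 embeds the accepted argument types in the
# docstring, which is the same list the TypeError above reproduces.
import flashattn_hopper_cuda

print(flashattn_hopper_cuda.fwd.__doc__)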

Therefore, although xformers reports FLASH3 as an available operator, we still need to assert that it actually executes. I made it work with:
flashattn-hopper==3.0.0b1
torch==2.4.1+cu124
xformers==0.0.29
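
A minimal execution check along those lines (a rough sketch, not from any repo; the shapes are arbitrary and it assumes an H100-class GPU with fp16 inputs):

# Sketch: go beyond is_available() by actually running a tiny forward pass
# pinned to the FA3 forward op.
import torch
from xformers.ops import memory_efficient_attention_forward
from xformers.ops.fmha import flash3

# layout is (batch, seqlen, heads, head_dim)
q, k, v = (torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = memory_efficient_attention_forward(q, k, v, op=flash3.FwOp)
print("FA3 forward executed, output shape:", tuple(out.shape))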

This might be worth filing as a proper issue in the xformers repo. What do you think, @xuzhao9?

@xuzhao9
Contributor Author

xuzhao9 commented Oct 29, 2024

Yes, I think it is a valid issue to post to the xformers repo, @antferdom.

@xuzhao9
Contributor Author

xuzhao9 commented Nov 20, 2024

I took a further look at the xformers code. Since we are compiling it from source, it should use the xformers._C_flashattention3 plugin (https://github.com/facebookresearch/xformers/blob/6e10bd21ac6fc878657b24684723ccd05e41d385/setup.py#L321C19-L321C46) and therefore not fall back to the pre-installed flashattn_hopper_cuda package. We should treat them separately.
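
A quick way to see which bindings a given environment actually provides (a small sketch; both module names are the ones discussed in this thread):

# Sketch: check which FA3 bindings are importable in the current environment.
# xformers._C_flashattention3 is the extension xformers builds itself;
# flashattn_hopper_cuda comes from a separately installed FA3 (Hopper) build.
import importlib.util

for mod in ("xformers._C_flashattention3", "flashattn_hopper_cuda"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'present' if found else 'missing'}")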

@xuzhao9
Contributor Author

xuzhao9 commented Nov 21, 2024

Actually, the xformers FA3 build is broken; submitted facebookresearch/xformers#1157.

After this patch, we can install xformers+FA3 and FA3 as separate kernel implementations for flash_attention, and they won't conflict.

@ohwi

ohwi commented Nov 21, 2024

Hi. I'm Hwigeon, working with @antferdom.

If my memory serves me correctly (apologies, it's been a month since I last tried this), I moved to a newer version of FA with fallback because I also couldn't compile FA with xFormers, which led to the problem mentioned above.

@xuzhao9
Contributor Author

xuzhao9 commented Nov 21, 2024

@ohwi Can you please try with facebookresearch/xformers#1157 to see if FA works with xformers?

@ohwi

ohwi commented Nov 21, 2024

Sure, I will try your solution within a day.

@antferdom

> Actually, the xformers FA3 build is broken; submitted facebookresearch/xformers#1157.
> After this patch, we can install xformers+FA3 and FA3 as separate kernel implementations for flash_attention, and they won't conflict.

Are we going to use the libraries=["cuda"] patch workaround until upstream FA3 is fixed and CUTLASS is upgraded? Should we then forward this issue to the FlashAttention repo? @lw explained it in great detail, showing that the underlying problem goes beyond what we thought. I will try it with @ohwi.

@xuzhao9
Contributor Author

xuzhao9 commented Nov 21, 2024

@antferdom Yes, we are going to patch xformers until it updates FA3 and CUTLASS; see #61.

Linking to the CUDA driver is not a problem for us because we have fine-grained control over the CI infra. At compile time, we first install the NVIDIA driver package, build xformers/FA3 with the libcuda link, and then purge the driver files. The driver files are mapped onto the H100 CI runners at test/benchmark time.

Compile time: https://github.com/pytorch-labs/tritonbench/blob/main/docker/tritonbench-nightly.dockerfile#L47
Run time: https://github.com/pytorch-labs/tritonbench/blob/main/docker/infra/values.yaml#L227
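
For context, the libraries=["cuda"] workaround mentioned above boils down to adding the driver library to the FA3 extension's link line. A hypothetical sketch of what that looks like in a setup.py (illustrative names and source paths, not the actual xformers build code):

# Hypothetical sketch of the libraries=["cuda"] workaround (illustrative only;
# the real change lives in the FA3 extension definition in xformers' setup.py).
from torch.utils.cpp_extension import CUDAExtension

flash3_ext = CUDAExtension(
    name="xformers._C_flashattention3",
    sources=["third_party/flash-attention/hopper/flash_api.cpp"],  # placeholder path
    libraries=["cuda"],  # link against libcuda (the CUDA driver API) at build time
)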

@antferdom

@xuzhao9 Alright, thanks for the clarification. It makes sense for us as well, since we have complete control over the machines. I have used the Tritonbench Dockerfile as the main reference so that our environment is almost identical to yours, and I can confirm that with your patch, xformers works:

$ python -c "import xformers._C_flashattention3"
