
FA3 Tracking #11

Open · wants to merge 1 commit into main
Conversation

TJ-Solergibert
Collaborator

In this branch, we will track the evolution of FA3. The current state is:

  • No MQA/GQA
  • No BF16
  • Requires contiguous inputs

From the flash-attention repo, coming soon in the next couple of days / next week:

  • BF16
  • Variable length (FP16, BF16)
  • FP8 forward

Installation

Refer to the official FA repo.
This will install the package flashattn-hopper, so you can still keep flash_attn installed for the LayerNorm and RoPE embeddings.
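
Below is a minimal sketch of how the two packages can coexist after installation. It assumes the FA3 beta exposes flash_attn_func through a flash_attn_interface module (installed by the flashattn-hopper package, as in the hopper/ directory of the flash-attention repo); the exact import names and return values are assumptions, not something confirmed in this branch.

```python
# Sketch: FA2 (flash_attn) and FA3 (flashattn-hopper) installed side by side.
# Assumed: the FA3 beta exposes `flash_attn_interface.flash_attn_func`.
import torch
from flash_attn import flash_attn_func as flash_attn_2            # FA2 kernel
from flash_attn.layers.rotary import RotaryEmbedding              # RoPE still comes from flash_attn
from flash_attn_interface import flash_attn_func as flash_attn_3  # FA3 kernel (assumed module name)

# FA3 beta constraints tracked here: fp16 only, no MQA/GQA, contiguous inputs.
q, k, v = (
    torch.randn(1, 2048, 32, 128, dtype=torch.float16, device="cuda").contiguous()
    for _ in range(3)
)

out_fa2 = flash_attn_2(q, k, v, causal=True)
out_fa3 = flash_attn_3(q, k, v, causal=True)
out_fa3 = out_fa3[0] if isinstance(out_fa3, tuple) else out_fa3  # the beta may also return the softmax LSE
rope = RotaryEmbedding(dim=128)  # RoPE module provided by flash_attn
```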

Run experiments

In this branch, I have added a configuration flag to choose between FA2 & FA3 (model.model_config.use_fa3) to make it easier to compare the performance of both. You can use the example configuration examples/config_llama3_fa3.yaml, which builds a Llama3-8B model with fewer decoder layers so that the 8192 sequence length fits on a single GPU.

On GH200 nodes, the extra VRAM allows num_hidden_layers = 11; on systems with H100s, use num_hidden_layers = 8. Don't forget to edit the dataset_folder and tokenizer_name_or_path fields if necessary.
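
For reference, here is a hypothetical sketch of what the use_fa3 switch could look like inside the attention forward pass. This is not nanotron's actual implementation; the flash_attn_interface import is an assumption based on the FA3 beta.

```python
# Hypothetical dispatch on the `use_fa3` config flag (illustrative only; not nanotron's code).
from flash_attn import flash_attn_func as fa2_attention

try:
    from flash_attn_interface import flash_attn_func as fa3_attention  # from flashattn-hopper (assumed)
except ImportError:
    fa3_attention = None


def attention(q, k, v, use_fa3: bool, causal: bool = True):
    if use_fa3:
        assert fa3_attention is not None, "flashattn-hopper is not installed"
        # FA3 beta limitations tracked in this PR: fp16 only, no MQA/GQA, contiguous inputs.
        q, k, v = (t.half().contiguous() for t in (q, k, v))
        out = fa3_attention(q, k, v, causal=causal)
        return out[0] if isinstance(out, tuple) else out  # beta may return (out, softmax_lse)
    return fa2_attention(q, k, v, causal=causal)
```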

Performance

I will keep updating this table as new features are incorporated; as they mentioned, FA3 is currently a beta release. The MFU reported is the one computed by nanotron (a rough sketch of the calculation follows the table).

| Date | FA | Precision | num_attention_heads | num_key_value_heads | num_hidden_layers | Batch Size | Sequence Length | MFU (TFLOPs) | VRAM (GB) |
|------|----|-----------|---------------------|---------------------|-------------------|------------|-----------------|--------------|-----------|
| 12/7 | 2  | bf16      | 32 | 32 | 11 | 1 | 8192 | 348 | 94 |
| 12/7 | 2  | fp16      | 32 | 32 | 11 | 1 | 8192 | 381 | 94 |
| 12/7 | 3  | fp16      | 32 | 32 | 11 | 1 | 8192 | 446 | 94 |
| 12/7 | 2  | bf16      | 32 | 32 | 8  | 1 | 8192 | 342 | 77 |
| 12/7 | 2  | fp16      | 32 | 32 | 8  | 1 | 8192 | 370 | 77 |
| 12/7 | 3  | fp16      | 32 | 32 | 8  | 1 | 8192 | 433 | 77 |
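
For context, a rough sketch of how an achieved-TFLOPs / MFU number like the one in the table can be derived. This is a back-of-the-envelope approximation, not nanotron's exact formula; the 6·N FLOPs-per-token rule and the peak value are assumptions.

```python
# Back-of-the-envelope throughput / MFU estimate (not nanotron's exact formula).
def achieved_tflops(num_params: float, tokens_per_iter: int, iter_time_s: float, n_gpus: int = 1) -> float:
    # ~6 FLOPs per parameter per token for forward + backward (ignores attention FLOPs).
    return 6 * num_params * tokens_per_iter / iter_time_s / n_gpus / 1e12

def mfu(tflops: float, peak_tflops: float = 989.0) -> float:
    # 989 TFLOPs is the assumed dense fp16/bf16 peak of an H100 SXM.
    return tflops / peak_tflops
```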
