Ifu20240625 group gemm yewang12 #59
Open: wangye805 wants to merge 25 commits into dev from ifu20240625_group_gemm_yewang12
+38,958
−36,083
* add attention docs Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* first draft Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak to first draft Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up pictures Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* first draft for review Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add logging info/debug Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fix of an SWA message Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* use subprocess instaed of os.sys Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up benchmark script Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add example script and update notebook Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweaks Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix Jax/Paddle related comments Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* rerun H100 benchmark Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* restrict fp8 tests to sm90+ Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* move get_cudnn_version from common to pytorch utils Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Initial config test Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* remove linters, fix clang-format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix clang-format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix clang-format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Remove lint Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Adjust config Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* use config file Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* adjust pylintrc Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* pre-format fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Python only Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add FA module Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update CI configs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* CRLF -> LF Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* revert accidental formatting changes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* try with sudo Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* cpp formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix pylint error properly Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* some review comments Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* lint fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* add fp8 attn include in the correct file Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* autofix PRs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* A hot fix to disable CE deadlock check Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* subclass DPA with BaseModule and test with test_gpt_checkpointing Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* test DPA only Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* test save and load Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove debug info Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweaks Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add hook in case core_attention._extra_state is missing Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* check named buffers in BaseModule; remove FP8 scratchpad override function; test FP8 for sm90+ Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fixes: test size, interval in recipe, named_buffer loop Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* move BaseModule from FusedAttention to DPA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…se_fused (#931)
* rm tensor check if the workspace is empty Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
* add trust_remote=true for load_dataset() in the mnist test Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
…937) replaced plain C asserts with NVTE_CHECK to avoid unused-variable warnings Signed-off-by: Alp Dener <adener@nvidia.com>
* Add the option to use SM for P2P comm in TP overlap Signed-off-by: Sangkug Lym <slym@nvidia.com>
* cleanup Signed-off-by: Sangkug Lym <slym@nvidia.com>
* Python formatting with black Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Format C++ with clang-format Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update transformer_engine/pytorch/csrc/comm_gemm_overlap.h Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
---------
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Remove optional UB build leftovers Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* rm unused import Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
fix tp_initialized error Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* simplify offset tensors Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes; tests pass Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix C lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace with_offset with with_padding Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace with_padding with padded Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fixes after merge Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fix for fused attn fwd/bwd calls Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix Jax Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* adjust spacing in docstring Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix pytorch tests; fix paddle api Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix attn_biases Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix AttnFuncWithCP backward Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix jax Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix attn with CP Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix paddle Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Release GIL in PyTorch pybind11 functions Signed-off-by: Tim Moon <tmoon@nvidia.com>
* adding option to select only .cpp files in a dir in the build tool
* change cmake build path
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
* GroupedGEMM via multi-stream cublas
* fix A/B is nullptr while D is not nullptr
* add fp8 grouped gemm
* register with TorchScript
* add the GroupedLinear layer
---------
Signed-off-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Co-authored-by: Jiang Shao <jiangs@nvidia.com>
Co-authored-by: Qi Zhang <qizhang@nvidia.com>
Co-authored-by: Phuong Nguyen <phuonguyen@nvidia.com>
… Ignore MVTE_FLASH_ATTN env till FA is enabled for ROCm
Fix typo when selecting tuned RMSNorm kernels Signed-off-by: Tim Moon <tmoon@nvidia.com>
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: