[Issue]: compilation of flash-attention@howiejay/navi_support fails with missing symbol (ROCm 6.2) #3515

Open
2 tasks done
kingoftanoa opened this issue Oct 24, 2024 · 3 comments
Labels
platform Platform specific problem

Comments

kingoftanoa commented Oct 24, 2024

Issue Description

When starting up the current build, SD.Next tries to fetch, build, and install https://github.com/ROCm/flash-attention@howiejay/navi_support, but compilation fails. Nominally this is not a regression, since flash attention has always failed on ROCm, but this branch seems more strongly intended to work. The GPU in question is a 7900 XTX (aka gfx1100):

[   31.070603] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   31.101461] amdgpu 0000:13:00.0: enabling device (0006 -> 0007)
[   31.101521] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x164E 0x1043:0x8877 0xCB).
[   31.101549] [drm] register mmio base: 0xF6A00000
[   31.101549] [drm] register mmio size: 524288
[   31.103265] [drm] add ip block number 0 <nv_common>
[   31.103267] [drm] add ip block number 1 <gmc_v10_0>
[   31.103268] [drm] add ip block number 2 <navi10_ih>
[   31.103269] [drm] add ip block number 3 <psp>
[   31.103269] [drm] add ip block number 4 <smu>
[   31.103270] [drm] add ip block number 5 <dm>
[   31.103271] [drm] add ip block number 6 <gfx_v10_0>
[   31.103272] [drm] add ip block number 7 <sdma_v5_2>
[   31.103273] [drm] add ip block number 8 <vcn_v3_0>
[   31.103273] [drm] add ip block number 9 <jpeg_v3_0>
[   31.103289] amdgpu 0000:13:00.0: amdgpu: Fetched VBIOS from VFCT
[   31.103291] amdgpu: ATOM BIOS: 102-RAPHAEL-008

I am unsure whether a -dev package or something similar is missing.

I've searched for the error message and found this:

ROCm/composable_kernel#775

I tried the second patch from ROCm/composable_kernel#775 (comment), but it did not help; I have since reverted the change to ck.hpp.
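
For anyone comparing their own build output against this report, here is a minimal sketch that condenses a pip/ninja log into the two facts that matter for searching: which identifiers the compiler reported as undeclared, and which offload target each error summary names. The function name `summarize_build_log` and the sample text are illustrative assumptions; the regular expressions are modelled only on the clang messages quoted in the log further down, so other message formats would not be matched.

```python
import re
from collections import Counter

# Matches clang's per-target summary line, e.g.
# "2 errors generated when compiling for gfx1036."
SUMMARY_RE = re.compile(r"(\d+) errors? generated when compiling for (gfx\w+)")

# Matches the undeclared-identifier error. pip wraps long lines, so the
# quoted identifier may sit on the following line; \s+ spans that break.
UNDECLARED_RE = re.compile(r"error: use of undeclared identifier\s+'(\w+)'")


def summarize_build_log(log_text: str) -> dict:
    """Return error counts per offload target and the undeclared identifiers."""
    errors_per_arch = Counter()
    for count, arch in SUMMARY_RE.findall(log_text):
        errors_per_arch[arch] += int(count)
    identifiers = sorted(set(UNDECLARED_RE.findall(log_text)))
    return {"errors_per_arch": dict(errors_per_arch), "undeclared": identifiers}


# Shortened sample in the same shape as the log in this report.
sample = """\
amd_buffer_addressing.hpp:32:48: error: use of undeclared identifier
                         'CK_BUFFER_RESOURCE_3RD_DWORD'
2 errors generated when compiling for gfx1036.
"""

print(summarize_build_log(sample))
```

Run against the full log, this kind of summary makes it easy to see whether the failing target is the card the build was meant for or another device on the system.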

Version Platform Description

Debian Trixie, Python 3.11, ROCm 6.2

$ ./webui.sh --config ../config/config.json --ui-config ../config/ui-config.json --models-dir ../models --debug --listen --insecure --use-rocm                                                          
Activate python venv: /mnt/automatic/venv                                                                                                                                                               
Launch: venv/bin/python3                                                                                                                                                                                
16:46:06-597049 INFO     Starting SD.Next                                                                                                                                                               
16:46:06-598450 INFO     Logger: file="/home/jeffk/sdnext/automatic/sdnext.log" level=DEBUG size=64 mode=create                                                                                           
16:46:06-599051 INFO     Python: version=3.11.10 platform=Linux bin="/home/jeffk/sdnext/automatic/venv/bin/python3" venv="/home/jeffk/sdnext/automatic/venv"                                                
16:46:06-607014 INFO     Version: app=sd.next updated=2024-10-23 hash=0d332ca7 branch=master url=https://github.com/vladmandic/automatic/tree/master ui=main                                            
16:46:06-869483 INFO     Platform: arch=x86_64 cpu= system=Linux release=6.11.2-amd64 python=3.11.10
16:46:06-870355 DEBUG    Setting environment tuning                                                 
16:46:06-870868 DEBUG    Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512" 
16:46:06-875367 DEBUG    Torch overrides: cuda=False rocm=True ipex=False directml=False openvino=False zluda=False                                                                                     
16:46:06-875934 INFO     Python: version=3.11.10 platform=Linux bin="/home/jeffk/sdnext/automatic/venv/bin/python3" venv="/home/jeffk/sdnext/automatic/venv"                                                
16:46:06-876373 INFO     ROCm: AMD toolkit detected                                                 
16:46:06-893372 INFO     ROCm: agents=['gfx1100', 'gfx1036']                                        
16:46:06-894002 INFO     ROCm: version=None, using agent gfx1100                  
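
One observation on the agent list above, offered tentatively: two agents are detected, the hipcc command lines in the build log pass `--offload-arch=native`, and the compiler's error summary names gfx1036 rather than gfx1100. The sketch below splits an agent list by whether each name falls in the gfx110x family that appears in the branch's kernel file names. The helper name `split_agents` and the prefix rule are assumptions made for illustration, not something taken from the flash-attention or SD.Next sources.

```python
def split_agents(agents, family_prefix="gfx110"):
    """Split detected agents into those matching family_prefix and the rest.

    The default prefix is an assumption derived from kernel file names such as
    "..._gfx110x_hip.hip" in the build log, not from project documentation.
    """
    matching = [a for a in agents if a.startswith(family_prefix)]
    others = [a for a in agents if not a.startswith(family_prefix)]
    return matching, others


# Agent list copied from the "ROCm: agents=" line of the startup log.
agents = ["gfx1100", "gfx1036"]
matching, others = split_agents(agents)
print("gfx110x-family agents:", matching)
print("other agents:", others)
```

If the second list is non-empty, it may be worth checking whether the build is also being attempted for those devices, which is what the "when compiling for gfx1036" message in this report suggests.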

Relevant log output

16:46:06-895432 DEBUG    Running: pip="install --upgrade git+https://github.com/ROCm/flash-attention@howiejay/navi_support"
16:46:40-747566 ERROR    Install: pip: install --upgrade git+https://github.com/ROCm/flash-attention@howiejay/navi_support
16:46:40-748297 DEBUG    Install: pip output Collecting git+https://github.com/ROCm/flash-attention@howiejay/navi_support
                           Cloning https://github.com/ROCm/flash-attention (to revision howiejay/navi_support) to /tmp/pip-req-build-7yqyibia
                           Resolved https://github.com/ROCm/flash-attention to commit b28f18350af92a68bec057875fd486f728c9f084
                           Preparing metadata (setup.py): started
                           Preparing metadata (setup.py): finished with status 'done'
                         Requirement already satisfied: torch in ./venv/lib/python3.11/site-packages (from flash_attn==2.0.4) (2.4.1+rocm6.1)
                         Requirement already satisfied: einops in ./venv/lib/python3.11/site-packages (from flash_attn==2.0.4) (0.4.1)
                         Requirement already satisfied: packaging in ./venv/lib/python3.11/site-packages (from flash_attn==2.0.4) (24.1)
                         Collecting ninja (from flash_attn==2.0.4)
                           Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
                         Requirement already satisfied: filelock in ./venv/lib/python3.11/site-packages (from torch->flash_attn==2.0.4) (3.13.1)
                         Requirement already satisfied: typing-extensions>=4.8.0 in ./venv/lib/python3.11/site-packages (from torch->flash_attn==2.0.4) (4.11.0)
                         Requirement already satisfied: sympy in ./venv/lib/python3.11/site-packages (from torch->flash_attn==2.0.4) (1.12)
                         Requirement already satisfied: networkx in ./venv/lib/python3.11/site-packages (from torch->flash_attn==2.0.4) (3.2.1)
                         Requirement already satisfied: jinja2 in ./venv/lib/python3.11/site-packages (from torch->flash_attn==2.0.4) (3.1.3)
                         Requirement already satisfied: fsspec in ./venv/lib/python3.11/site-packages (from torch->flash_attn==2.0.4) (2024.2.0)
                         Requirement already satisfied: pytorch-triton-rocm==3.0.0 in ./venv/lib/python3.11/site-packages (from torch->flash_attn==2.0.4) (3.0.0)
                         Requirement already satisfied: MarkupSafe>=2.0 in ./venv/lib/python3.11/site-packages (from jinja2->torch->flash_attn==2.0.4) (2.1.5)
                         Requirement already satisfied: mpmath>=0.19 in ./venv/lib/python3.11/site-packages (from sympy->torch->flash_attn==2.0.4) (1.3.0)
                         Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
                            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 8.1 MB/s eta 0:00:00
                         Building wheels for collected packages: flash_attn
                           Building wheel for flash_attn (setup.py): started
                           Building wheel for flash_attn (setup.py): finished with status 'error'
                           Running setup.py clean for flash_attn
                         Failed to build flash_attn

                           Running command git clone --filter=blob:none --quiet https://github.com/ROCm/flash-attention /tmp/pip-req-build-7yqyibia
                           Running command git checkout -b howiejay/navi_support --track origin/howiejay/navi_support
                           Switched to a new branch 'howiejay/navi_support'
                           branch 'howiejay/navi_support' set up to track 'origin/howiejay/navi_support'.
                           Running command git submodule update --init --recursive -q
                           error: subprocess-exited-with-error

                           × python setup.py bdist_wheel did not run successfully.
                           │ exit code: 1
                           ╰─> [589 lines of output]


                               torch.__version__  = 2.4.1+rocm6.1


                               RTZ IS USED
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/ck.hpp ->
                         /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/ck.hpp [skipped, no changes]
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/host_utility/device_prop.hpp ->
[lots more similar lines]
                               Successfully preprocessed all matching files.
                               Total number of unsupported CUDA function calls: 0


                               Total number of replaced kernel launches: 10
                               running bdist_wheel
                               running build
                               running build_py
                               creating build
                               creating build/lib.linux-x86_64-cpython-311
                               creating build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-311/flash_attn
                               creating build/lib.linux-x86_64-cpython-311/flash_attn/layers
                               copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
                               copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
                               copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
                               creating build/lib.linux-x86_64-cpython-311/flash_attn/losses
                               copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/losses
                               copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-311/flash_attn/losses
                               creating build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/falcon.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
                               creating build/lib.linux-x86_64-cpython-311/flash_attn/modules
                               copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
                               copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
                               copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
                               copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
                               copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
                               creating build/lib.linux-x86_64-cpython-311/flash_attn/ops
                               copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
                               copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
                               copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
                               copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
                               copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
                               creating build/lib.linux-x86_64-cpython-311/flash_attn/utils
                               copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
                               copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
                               copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
                               copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
                               copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
                               running build_ext
                               building 'flash_attn_2_cuda' extension
                               creating /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311
                               creating /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc
                               creating /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm
                               creating /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src
                               Emitting ninja build file /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/build.ninja...
                               Compiling objects...
                               Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
                               [1/58] /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim128_bf16_noncausal_gfx9x_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim128_bf16_noncausal_gfx9x_hip.o -fPIC
                         -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG
                         -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"'
                         '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               [2/58] /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim128_bf16_causal_gfx9x_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim128_bf16_causal_gfx9x_hip.o -fPIC
                         -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG
                         -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"'
                         '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
[more compilation steps, all without errors]
                               [50/58] /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/flash_api_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/flash_api_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1
                         -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native
                         -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"'
                         -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               FAILED: /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/flash_api_hip.o
                               /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/flash_api_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/flash_api_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1
                         -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native
                         -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"'
                         -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/flash_api_hip.hip:14:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_runner_hip.hpp:30:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/fwd_device_gemm_invoker_hip.hpp:27:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/fwd_device_gemm_template_hip.hpp:27:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/device_gemm_trait_hip.hpp:45:
                               In file included from
                         /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_query_attention_forward_wmma_hip.hpp:17:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/tensor_description/tensor_descriptor_hip.hpp:7:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/common_header_hip.hpp:37:
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:32:48: error: use of undeclared identifier
                         'CK_BUFFER_RESOURCE_3RD_DWORD'
                                  32 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
                                     |                                                ^
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:47:48: error: use of undeclared identifier
                         'CK_BUFFER_RESOURCE_3RD_DWORD'
                                  47 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
                                     |                                                ^
                               2 errors generated when compiling for gfx1036.
                               failed to execute:/opt/rocm-6.2.0/lib/llvm/bin/clang++  --offload-arch=native  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c -x hip /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/flash_api_hip.hip -o
                         "/tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/flash_api_hip.o" -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2
                         -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -D__WMMA__
                         -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\"
                         -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               [51/58] /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1
                         -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__
                         -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"'
                         '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               FAILED: /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.o
                               /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1
                         -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__
                         -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"'
                         '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.hip:25:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_runner_hip.hpp:30:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/fwd_device_gemm_invoker_hip.hpp:27:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/fwd_device_gemm_template_hip.hpp:27:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/device_gemm_trait_hip.hpp:45:
                               In file included from
                         /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_query_attention_forward_wmma_hip.hpp:17:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/tensor_description/tensor_descriptor_hip.hpp:7:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/common_header_hip.hpp:37:
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:32:48: error: use of undeclared identifier
                         'CK_BUFFER_RESOURCE_3RD_DWORD'
                                  32 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
                                     |                                                ^
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:47:48: error: use of undeclared identifier
                         'CK_BUFFER_RESOURCE_3RD_DWORD'
                                  47 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
                                     |                                                ^
                               2 errors generated when compiling for gfx1036.
                               failed to execute:/opt/rocm-6.2.0/lib/llvm/bin/clang++  --offload-arch=native  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c -x hip /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.hip -o
                         "/tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_bf16_casual_gfx110x_hip.o" -fPIC
                         -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG
                         -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\"
                         -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               [52/58] /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1
                         -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__
                         -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"'
                         '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               FAILED: /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.o
                               /opt/rocm/bin/hipcc  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.hip -o
                         /tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1
                         -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__
                         -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"'
                         '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.hip:25:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_runner_hip.hpp:30:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/fwd_device_gemm_invoker_hip.hpp:27:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/fwd_device_gemm_template_hip.hpp:27:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/device_gemm_trait_hip.hpp:45:
                               In file included from
                         /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_grouped_query_attention_forward_wmma_hip.hpp:17:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/tensor_description/tensor_descriptor_hip.hpp:7:
                               In file included from /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/common_header_hip.hpp:37:
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:32:48: error: use of undeclared identifier
                         'CK_BUFFER_RESOURCE_3RD_DWORD'
                                  32 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
                                     |                                                ^
                               /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include/ck/utility/amd_buffer_addressing.hpp:47:48: error: use of undeclared identifier
                         'CK_BUFFER_RESOURCE_3RD_DWORD'
                                  47 |     wave_buffer_resource.config(Number<3>{}) = CK_BUFFER_RESOURCE_3RD_DWORD;
                                     |                                                ^
                               2 errors generated when compiling for gfx1036.
                               failed to execute:/opt/rocm-6.2.0/lib/llvm/bin/clang++  --offload-arch=native  -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/include
                         -I/tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/composable_kernel/library/include -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/TH -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THC
                         -I/home/jeffk/sdnext/automatic/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/jeffk/sdnext/automatic/venv/include
                         -I/opt/python3.11.10/include/python3.11 -c -c -x hip /tmp/pip-req-build-7yqyibia/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.hip -o
                         "/tmp/pip-req-build-7yqyibia/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_rocm/src/flash_fwd_runner_batched_gqa_fp16_casual_gfx110x_hip.o" -fPIC
                         -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG
                         -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -D__WMMA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\"
                         -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc

[more errors, all about the missing CK_BUFFER_RESOURCE_3RD_DWORD]
[ninja and Python backtrace, etc.]

                         ERROR: Could not build wheels for flash_attn, which is required to install pyproject.toml-based projects

[normal startup continues]

Backend

Diffusers

UI

Standard

Branch

Master

Model

StableDiffusion 1.5

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic vladmandic added the platform Platform specific problem label Oct 24, 2024
@Disty0
Collaborator

Disty0 commented Oct 24, 2024

16:46:06-893372 INFO     ROCm: agents=['gfx1100', 'gfx1036']  

Try disabling your iGPU gfx1036.
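
The hint comes from the build log itself: hipcc is invoked with `--offload-arch=native`, and the errors are reported "when compiling for gfx1036", which suggests the build targets every detected agent and not only the gfx1100 card. A minimal sketch for pulling the failing targets out of a saved build log follows. The helper name `failing_archs` and the temporary log file are hypothetical, and the sample lines are copied from the log above.

```shell
# Hypothetical helper: list the GPU targets that a hipcc/clang build log
# reports as failing, based on the "... when compiling for <arch>." lines.
failing_archs() {
  grep -o 'generated when compiling for gfx[0-9a-z]*' "$1" \
    | awk '{print $NF}' \
    | sort -u
}

# Demo on a small sample taken from the log in this issue.
log="$(mktemp)"
cat > "$log" <<'EOF'
2 errors generated when compiling for gfx1036.
[52/58] /opt/rocm/bin/hipcc ... --offload-arch=native ...
2 errors generated when compiling for gfx1036.
EOF

failing_archs "$log"   # prints: gfx1036
rm -f "$log"
```

If only the iGPU target shows up in that list, the discrete GPU target is probably not the one breaking the build.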

@kingoftanoa
Author

> 16:46:06-893372 INFO     ROCm: agents=['gfx1100', 'gfx1036']
>
> Try disabling your iGPU gfx1036.

I disabled it in BIOS and that did the trick. Is there also a way to make SDNEXT ignore it (or that type)? I figure I'll never use it for SDNEXT anyway, even if enabled in BIOS. That way, I'd be able to keep the iGPU enabled for when I don't need the 7900XTX for display purposes.

@Disty0
Collaborator

Disty0 commented Oct 24, 2024

Try exporting these environment variables:

export HIP_VISIBLE_DEVICES=0
export ROCR_VISIBLE_DEVICES=0

That said, flash attention should work fine with the iGPU enabled once the compilation step has been completed with the iGPU disabled.
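
As a rough sketch, the device index can also be derived from the agent order in the SD.Next log line (`agents=['gfx1100', 'gfx1036']`) instead of hardcoding `0`. The helper name `pick_agents` is hypothetical, and it is an assumption that the indices used by `HIP_VISIBLE_DEVICES` and `ROCR_VISIBLE_DEVICES` follow the same order as that log line on a given system.

```shell
# Hypothetical helper: print the comma-separated indices of the agents
# whose name starts with the given prefix. Arguments: prefix, then agent names
# in the order they are reported.
pick_agents() {
  prefix="$1"; shift
  idx=0
  out=""
  for name in "$@"; do
    case "$name" in
      "$prefix"*) out="${out:+$out,}$idx" ;;
    esac
    idx=$((idx + 1))
  done
  printf '%s\n' "$out"
}

# Agent order copied from the log line in this issue; keep only gfx110x.
visible="$(pick_agents gfx110 gfx1100 gfx1036)"
export HIP_VISIBLE_DEVICES="$visible"
export ROCR_VISIBLE_DEVICES="$visible"
echo "HIP_VISIBLE_DEVICES=$HIP_VISIBLE_DEVICES"   # prints: HIP_VISIBLE_DEVICES=0
```

The exports need to be in the environment of the shell that starts SD.Next so that the flash-attention build inherits them. Whether `--offload-arch=native` honors these variables has not been confirmed in this thread.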
