
[BUG] crosscompile failed using custom cc_toolchain with platform_cpu. #261

Closed

ZhenshengLee opened this issue Aug 7, 2024 · 6 comments

@ZhenshengLee

brief

NOTE: with the default platform, which uses the x86_64 (k8) toolchain, the compile works.
Is this a bug, or just a misconfiguration in my toolchain_config?

environment

bazel: version 7.0.2
cc toolchain: //bazel/toolchains/v5l (a custom cc toolchain for cross-compiling to aarch64, similar to https://github.com/f0rmiga/gcc-toolchain/blob/main/toolchain/cc_toolchain_config.bzl)

├── toolchains
│   └── v5l
│       ├── BUILD
│       ├── v5l.BUILD
│       └── v5l_cc_toolchain_config.bzl

repro steps

Simply compile the basic example with cuda_library; it reports the following errors.
NOTE: with the default platform, which uses the x86_64 (k8) toolchain, the compile works.
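
For reference, a minimal sketch of the kind of target being built (names taken from the error log below; the real example may differ):

load("@rules_cuda//cuda:defs.bzl", "cuda_library")

cuda_library(
    name = "module_cuda",
    srcs = ["src/module_demo/module_cuda.cu"],
)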

(09:38:08) ERROR: /gw_demo/modules/team_demo/module_demo/BUILD:46:13: Compiling modules/team_demo/module_demo/src/module_demo/module_cuda.cu failed: (Exit 1): nvcc failed: error executing CudaCompile command (from target //modules/team_demo/module_demo:module_cuda) 
  (cd /home/zs/.cache/bazel/_bazel_zs/2c098eac6c684e1fabebb74f5f4483bd/execroot/gaos && \
  exec env - \
    PATH=bazel/toolchains/v5l \
  /usr/local/cuda/bin/nvcc -x cu -gencode 'arch=compute_86,code=compute_86' -gencode 'arch=compute_86,code=sm_86' -Xcompiler -fPIC -ccbin bazel/toolchains/v5l/gcc -I . -I bazel-out/aarch64-dbg/bin -I external/local_cuda -I bazel-out/aarch64-dbg/bin/external/local_cuda -isystem modules/team_demo/module_demo/include -isystem bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/include -isystem external/local_cuda/cuda/include -isystem bazel-out/aarch64-dbg/bin/external/local_cuda/cuda/include -Xcompiler '--cbin /hhh' -O0 -g -c modules/team_demo/module_demo/src/module_demo/module_cuda.cu -o bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/_objs/module_cuda/module_cuda.pic.o --expt-relaxed-constexpr --extended-lambda)
# Configuration: 8d031e595766517cd77a614587e21bb1501021be933a2e40e13d75f49ba5eaf0
# Execution platform: @@local_config_platform//:host
bazel/toolchains/v5l/gcc: No such file or directory
nvcc fatal   : Failed to preprocess host compiler properties.
(09:38:08) INFO: Elapsed time: 0.480s, Critical Path: 0.03s
(09:38:08) INFO: 5 processes: 5 internal.
(09:38:08) ERROR: Build did NOT complete successfully

considerations

The -ccbin parameter comes from this feature:

host_compiler_feature = feature(
    name = "host_compiler_path",
    enabled = True,
    flag_sets = [
        flag_set(
            actions = [
                ACTION_NAMES.cuda_compile,
                ACTION_NAMES.device_link,
            ],
            flag_groups = [flag_group(flags = ["-ccbin", "%{host_compiler}"])],
        ),
    ],
)

whose %{host_compiler} variable is populated from host_compiler:

    cc_toolchain: A `CcToolchainInfo`. Can be obtained with `find_cpp_toolchain(ctx)`.
    srcs: A list of `File`s to be compiled.
    common: A cuda common object. Can be obtained with `cuda_helper.create_common(ctx)`
    pic: Whether the `srcs` are compiled for position independent code.
    rdc: Whether the `srcs` are compiled for relocatable device code.
    _prefix: DON'T USE IT! Prefix of the output dir. Exposed for device link to redirect the output.

    Returns:
        A compiled object `File`.
    """
    actions = ctx.actions
    host_compiler = cc_toolchain.compiler_executable
    cuda_compiler = cuda_toolchain.compiler_executable

and cc_toolchain is originally obtained in:

def _cuda_library_impl(ctx):
    """cuda_library is a rule that performs device link.

    cuda_library produces self-contained object files. It produces object files
    or a static library that is consumable by cc_* rules."""
    attr = ctx.attr
    cuda_helper.check_srcs_extensions(ctx, ALLOW_CUDA_SRCS + ALLOW_CUDA_HDRS, "cuda_library")
    cc_toolchain = find_cpp_toolchain(ctx)

related info

There is an issue bazelbuild/bazel#22561, which describes the *_executable variables being empty, but I don't think it is related here: in my case compiler_executable resolves to the wrong value bazel/toolchains/v5l/gcc, so gcc and some other tools point at the wrong directory. The actual gcc is at /drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/aarch64-linux-gcc.

Issue bazelbuild/bazel#7105 describes the usage of cc_toolchain.compiler_executable().

https://bazel.build/rules/lib/providers/CcToolchainInfo#compiler_executable and https://bazel.build/configure/integrate-cpp are the official Bazel docs on this topic.

The official example for this variable is https://github.com/bazelbuild/rules_cc/blob/main/examples/write_cc_toolchain_cpu/write_cc_toolchain_cpu.bzl.
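
Based on that example, a small sketch of a rule to dump what compiler_executable resolves to for the selected toolchain (print_compiler_executable is a made-up name for illustration):

load("@bazel_tools//tools/cpp:toolchain_utils.bzl", "find_cpp_toolchain")

def _print_compiler_executable_impl(ctx):
    cc_toolchain = find_cpp_toolchain(ctx)
    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    # Write the resolved compiler path so it can be inspected under bazel-bin.
    ctx.actions.write(out, cc_toolchain.compiler_executable)
    return [DefaultInfo(files = depset([out]))]

print_compiler_executable = rule(
    implementation = _print_compiler_executable_impl,
    attrs = {
        "_cc_toolchain": attr.label(default = Label("@bazel_tools//tools/cpp:current_cc_toolchain")),
    },
    toolchains = ["@bazel_tools//tools/cpp:toolchain_type"],
    fragments = ["cpp"],
)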

So, is this a bug, or just a misconfiguration in my toolchain_config?

workaround (failed)

I tried creating symlinks to the cross-toolchain binaries, but that failed with more compile errors.

ln -s /drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/aarch64-linux-gcc /gw_demo/bazel/toolchains/v5l/gcc
ln -s /drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/aarch64-linux-gcc.br_real /gw_demo/bazel/toolchains/bin/gcc.br_real
@cloudhan
Collaborator

cloudhan commented Aug 7, 2024

Seems to be a hermeticity issue at first glance (the gcc executable is not correctly brought into the build sandbox). The -ccbin value is just taken from the cc toolchain in use and passed through; since cc_toolchain is an implicit dependency, all files claimed by the cc toolchain config should be brought into the sandbox automatically. You might need to inspect bazel-<your_repo>/external/local_config_cc for the cc toolchain configuration.

Before diving deep, it is worth first trying --spawn_strategy=local with bazel build to see how it goes.
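
For example (using the failing target from the log above):

bazel build --spawn_strategy=local //modules/team_demo/module_demo:module_cuda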

@ZhenshengLee
Author

ZhenshengLee commented Aug 8, 2024

Thanks for your quick reply!

Before diving deep, it is worth first trying --spawn_strategy=local with bazel build to see how it goes.

About the bazelrc: I already have --spawn_strategy=local configured. The content of my bazelrc follows:

# +------------------------------------------------------------+
# | Startup Options                                            |
# +------------------------------------------------------------+
startup --batch_cpu_scheduling
startup --host_jvm_args=-XX:-UseParallelGC
# offline options
common --repository_cache="./.cache"
# Don't use bzlmod yet.
common --enable_bzlmod=false
# common --registry="path/to/local/bcr/registry"
# common --enable_platform_specific_config
# --platform_suffix=platsuffix
common --default_visibility=public

# +------------------------------------------------------------+
# | Build Configurations                                       |
# +------------------------------------------------------------+
# dist folder
build --distdir="./bazel/dist"
build --python_path=/usr/bin/python3
build --skip_incompatible_explicit_targets
# Enable colorful output of GCC
build --cxxopt="-fdiagnostics-color=always"
build --cxxopt='-std=c++17'
build --cxxopt='-D_GLIBCXX_USE_CXX11_ABI=1'
build --output_filter="^//"
build --show_timestamps
build --force_pic
# Work around the sandbox issue.
build --spawn_strategy=local
# Specify protobuf cc toolchain
build --proto_toolchain_for_cc="@com_google_protobuf//:cc_toolchain"

# cuda related configs
# Convenient flag shortcuts.
build --flag_alias=enable_cuda=@rules_cuda//cuda:enable
build --flag_alias=cuda_archs=@rules_cuda//cuda:archs
build --flag_alias=cuda_compiler=@rules_cuda//cuda:compiler
build --flag_alias=cuda_copts=@rules_cuda//cuda:copts
build --flag_alias=cuda_runtime=@rules_cuda//cuda:runtime
build --flag_alias=cuda_host_copts=@rules_cuda//cuda:host_copts

build --enable_cuda=True
build --cuda_archs=compute_86:compute_86,sm_86

# special configs
build:ubuntu_host_linux --python_path=/usr/bin/python3
build:drive_sdk_6081_linux --python_path=/usr/bin/python3
# build with profiling
build:cpu_prof --linkopt=-lprofiler

# build debug
# build --sandbox_debug
build --verbose_failures
# build --explain=./bazel.log

The -ccbin value is just taken from the cc toolchain in use and passed through; since cc_toolchain is an implicit dependency, all files claimed by the cc toolchain config should be brought into the sandbox automatically.

Yes, but it's not clear which field of the cc toolchain config info drives the compiler_executable value in the cc toolchain info. The content of my toolchain config follows:

    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        toolchain_identifier = ctx.attr.toolchain_identifier,
        host_system_name = ctx.attr.host_system_name,
        target_system_name = "aarch64-buildroot-linux-gnu",
        target_cpu = "aarch64-buildroot-linux-gnu",
        target_libc = "gcc",
        compiler = ctx.attr.gcc_repo,
        cxx_builtin_include_directories = cxx_builtin_include_directories,
        # builtin_sysroot = DRIVE_SDK_V5L_SYS_ROOT,
        abi_version = "gcc-9.3",
        abi_libc_version = ctx.attr.gcc_version,
        cc_target_os = "linux",
        action_configs = action_configs,
        features = [
            toolchain_compiler_flags,
            toolchain_linker_flags,
            custom_linkopts,
            dbg_feature,
            opt_feature,
        ],
    )

You might need to inspect bazel-<your_repo>/external/local_config_cc for the cc toolchain configuration.

I will look into it soon.

EDIT: After examining the files in bazel-<your_repo>/external/local_config_cc, I believe the variable compiler_executable is derived from tool_paths.

In bazel-gw_demo/external/local_config_cc/BUILD:

tool_paths = {
    "ar": "/usr/bin/ar",
    "ld": "/usr/bin/ld",
    "llvm-cov": "None",
    "llvm-profdata": "None",
    "cpp": "/usr/bin/cpp",
    "gcc": "/usr/bin/gcc",
    "dwp": "/usr/bin/dwp",
    "gcov": "/usr/bin/gcov",
    "nm": "/usr/bin/nm",
    "objcopy": "/usr/bin/objcopy",
    "objdump": "/usr/bin/objdump",
    "strip": "/usr/bin/strip",
},

But in the auto-generated default cross toolchain, bazel-gw_demo/external/local_config_cc/armeabi_cc_toolchain_config.bzl, the tool_paths variable is just a placeholder (every tool points at /bin/false):

tool_paths = [
        tool_path(name = "ar", path = "/bin/false"),
        tool_path(name = "cpp", path = "/bin/false"),
        tool_path(name = "dwp", path = "/bin/false"),
        tool_path(name = "gcc", path = "/bin/false"),
        tool_path(name = "gcov", path = "/bin/false"),
        tool_path(name = "ld", path = "/bin/false"),
        tool_path(name = "llvm-profdata", path = "/bin/false"),
        tool_path(name = "nm", path = "/bin/false"),
        tool_path(name = "objcopy", path = "/bin/false"),
        tool_path(name = "objdump", path = "/bin/false"),
        tool_path(name = "strip", path = "/bin/false"),
    ]

@ZhenshengLee
Author

BTW, is there any plan for the project to provide an example of using a custom/cross-compile cc_toolchain?

@ZhenshengLee
Author

After adding tool_paths to the cc_toolchain_config_info, the variable becomes available:

tool_paths = [
        tool_path(
            name = "ld",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-ld.gold",
        ),
        tool_path(
            name = "cpp",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-cpp",
        ),
        tool_path(
            name = "dwp",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-dwp",
        ),
        tool_path(
            name = "gcov",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-gcov",
        ),
        tool_path(
            name = "nm",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-nm",
        ),
        tool_path(
            name = "objcopy",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-objcopy",
        ),
        tool_path(
            name = "objdump",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-objdump",
        ),
        tool_path(
            name = "strip",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-strip",
        ),
        tool_path(
            name = "gcc",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-gcc",
        ),
        tool_path(
            name = "ar",
            path = DRIVE_SDK_V5L_TOOLCHAIN_ROOT + "/bin/aarch64-buildroot-linux-gnu-ar",
        ),
    ]
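
For completeness, a sketch of how this list plugs into the config from my earlier comment (same fields as before; the only change is the added tool_paths argument):

    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        toolchain_identifier = ctx.attr.toolchain_identifier,
        host_system_name = ctx.attr.host_system_name,
        target_system_name = "aarch64-buildroot-linux-gnu",
        target_cpu = "aarch64-buildroot-linux-gnu",
        target_libc = "gcc",
        compiler = ctx.attr.gcc_repo,
        cxx_builtin_include_directories = cxx_builtin_include_directories,
        abi_version = "gcc-9.3",
        abi_libc_version = ctx.attr.gcc_version,
        cc_target_os = "linux",
        action_configs = action_configs,
        features = [
            toolchain_compiler_flags,
            toolchain_linker_flags,
            custom_linkopts,
            dbg_feature,
            opt_feature,
        ],
        # This is what compiler_executable is derived from; without it the
        # path fell back to the bogus package-relative bazel/toolchains/v5l/gcc.
        tool_paths = tool_paths,
    )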

@ZhenshengLee ZhenshengLee changed the title [BUG] compiling failed with nvcc+gcc using custom cc_toolchain with platform, with default cc_toolchain success. [BUG] crosscompile failed using custom cc_toolchain with platform_cpu. Aug 8, 2024
@hypdeb
Contributor

hypdeb commented Sep 8, 2024

In case it helps anyone who finds this issue in the future: I'm facing a similar issue using https://github.com/bazelbuild/rules_cc to declare my cc_toolchain. It seems that it does not correctly set whatever is returned by cc_toolchain.compiler_executable.

@hypdeb
Contributor

hypdeb commented Sep 8, 2024

Digging a bit deeper, I found that cc_toolchain.compiler_executable is on its way out. This means the issue should be fixed here by using get_tool_for_action instead. I'll try to create a pull request.
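
Roughly, a sketch of what the replacement could look like (assuming the rule declares the usual _cc_toolchain attribute and the cpp fragment; _host_compiler_path is a hypothetical helper name):

load("@bazel_tools//tools/build_defs/cc:action_names.bzl", "ACTION_NAMES")

def _host_compiler_path(ctx, cc_toolchain):
    # Build a feature configuration for this rule's context.
    feature_configuration = cc_common.configure_features(
        ctx = ctx,
        cc_toolchain = cc_toolchain,
        requested_features = ctx.features,
        unsupported_features = ctx.disabled_features,
    )
    # Resolve the C++ compiler the way cc_* rules do, instead of reading
    # the deprecated cc_toolchain.compiler_executable field.
    return cc_common.get_tool_for_action(
        feature_configuration = feature_configuration,
        action_name = ACTION_NAMES.cpp_compile,
    )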
