NEON intrinsics are broken on big-endian #1484

Amanieu · 2023-10-19T15:24:21Z

These are currently broken because the order of elements inside vectors is reversed on big-endian systems: the ARM ABI requires that element 0 is located at the highest address of the vector type. However LLVM intrinsics expect element 0 to be located at the lowest address.

See https://llvm.org/docs/BigEndianNEON.html and arm_neon.h in Clang for more details.

The text was updated successfully, but these errors were encountered:

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

RalfJung · 2024-02-13T06:49:34Z

the ARM ABI requires that element 0 is located at the highest address of the vector type. However LLVM intrinsics expect element 0 to be located at the lowest address.

What exactly does this mean? Is there a bug in LLVM? If so, where is it tracked?

Or is the problem that Rust stdarch wants to expose the intrinsics the way they work on hardware, but LLVM doesn't provide those semantics? If so, could that be fixed by doing appropriate translation of indices before calling the intrinsics?

Amanieu · 2024-02-14T01:09:28Z

The short answer is that, on big-endian, LLVM portable vectors have a different element ordering than the one in the vector types used by the NEON intrinsics.

The C intrinsics work around this by reversing the element ordering in vectors before & after each intrinsic. We need to do the same in stdarch.

RalfJung · 2024-02-14T06:41:06Z

Oh I see, so this is a mismatch about the simd_x intrinsics vs vendor-specific intrinsics? Okay makes sense.

OTOH this is good news for portable-simd, seems like there we'll be getting consistent behavior across platforms without extra work then.

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

As noted in rust-lang/stdarch#1484, the NEON intrinsics are broken on big-endian aarch64. This is part of fixing rust to build for & on big-endian aarch64, following up rust-lang/rust#129819.

Neon / SIMD is known to be problematical in rust, ref. rust-lang/stdarch#1484, even though the CPU itself supports it.

This is done by avoiding attempts at using neon / SIMD in big-endian mode by patching some of the vendored crates. Neon / SIMD is known to be problematical in rust, ref. rust-lang/stdarch#1484, even though the CPU itself supports it. I've also tried reporting the memchr fixes upstream, ref. BurntSushi/memchr#162 So far not yet adopted. Zerocopy has also received a pull request: google/zerocopy#1795

Do this by avoiding trying to use neon / SIMD on big-endian aarch64. Neon intrinsics are problematical on big-endian targets, ref. rust-lang/stdarch#1484

workingjubilee · 2024-10-07T20:17:25Z

@he32

I currently appear unable to find the actual connecting tissue between library/stdarch/crates/core_arch/src/aarch64/neon/mod.rs and LLVM

The actual place that intrinsics themselves are handled is in two places: if it's an architecture-specific intrinsic, it uses link_llvm_intrinsics, which effectively specifies a lowering directly to LLVM textual IR. Otherwise, if it's one of rustc's "portable" intrinsics (simd_add and the like), the primary definition is in rustc_codegen_llvm: https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_llvm/src/intrinsic.rs

he32 · 2024-11-10T06:53:19Z

As stated elsewhere, I reported the issue rust-lang/rust#129819 earlier, and now have workarounds in place so that I'm able to produce a working rust compiler on big-endian NetBSD/aarch64 by avoiding attempts to use the NEON extensions in that mode.

However, since those extensions are available in the CPU, a better solution would be to fix this issue and then probably to revert the workarounds.

Since I'm a relative rust newbie, I have been thinking about what it would take to get some forward motion on that underlying issue. I have so far come to the conclusion that it would be helpful to have a test program (in rust, of course), which excercises / validates all the NEON SIMD extensions, runnable on 64-bit little-endian aarch64 system. Perhaps such a program already exists, and it's just a matter of pointing to it? My newbie status would make it difficult for me to come up with such a program, and I am hoping that it would be helpful in exploring a fix to this underlying issue. Since I do this in my copious spare time (as the expression goes), I can't make any firm commitments, but I think this will be a useful starting point for anyone wanting to tackle this issue properly.

The second worry I have is whether adding swizzling / byte-swapping of arguments and results before/after using NEON intrinsics will tend to negate the gains otherwise achieved by the NEON extensions compared to little-endian mode. I don't have a good intuition for that -- anyone have a better suggestion for that? Ideally, the test program could also do some measurement / validation of that? (Or would that be asking too much? It would not be required to act as the initial stepping stone, at least.)

RalfJung · 2024-11-10T09:38:41Z

The second worry I have is whether adding swizzling / byte-swapping of arguments and results before/after using NEON intrinsics will tend to negate the gains otherwise achieved by the NEON extensions compared to little-endian mode.

I assume LLVM has to insert its own byte swapping when lowering the portable SIMD operations on big-endian NEON... so hopefully the codegen backend is good enough to realize that the two swaps cancel each other out, and remove both of them? If not, that seems worth reporting as an LLVM bug.

have a test program (in rust, of course), which excercises / validates all the NEON SIMD extensions, runnable on 64-bit little-endian aarch64 system

The stdarch test suite should be able to serve that purpose. The tricky part probably is that we don't have a way to run it on CI. Miri can run some of it (when it only needs generic SIMD intrinsics), not sure if that is good enough to gain confidence for re-landing the intrinsics.

RalfJung · 2024-11-10T09:39:40Z

I also found this gem in the LLVM docs mentioned above:

Make sure appropriate bitconverts are created so that vector values get passed over call boundaries as 1-element vectors (which is the same as if they were loaded with LDR).

Is that something the frontend has to do? That would mean we need a special case in our ABI handling code to use PassMode::Cast with a 1-element vector for all by-val vector passing on these targets. @workingjubilee is going to love this. ;)

workingjubilee · 2024-11-10T10:51:34Z

...whaaa? so a <16 x i8> becomes <1 x i128>?

RalfJung · 2024-11-10T11:02:14Z

That's how I understand this, yes.

he32 · 2024-11-10T13:21:49Z

have a test program (in rust, of course), which excercises / validates all the NEON SIMD extensions, runnable on 64-bit little-endian aarch64 system

The stdarch test suite should be able to serve that purpose.

Hmm, then I need to go look there (pointer to directory?), and see if the NEON stuff is easily identifiable / isolateable.

The tricky part probably is that we don't have a way to run it on CI. Miri can run some of it (when it only needs generic SIMD intrinsics), not sure if that is good enough to gain confidence for re-landing the intrinsics.

I am "old school", so was thinking foremost of doing the development without any CI support. Getting CI in place for this would then be a separate issue to be tackled separately.

RalfJung · 2024-11-10T13:25:03Z

We shouldn't land anything without CI support, but ofc you can develop in whatever order suits you best. :)

he32 · 2024-11-10T13:30:13Z

We shouldn't land anything without CI support, but ofc you can develop in whatever order suits you best. :)

I completely understand, and probably agree, and recall having seen hints which might help in that direction. We'll see. First things first.

Amanieu · 2024-11-10T13:37:25Z

The easiest way to support big-endian would be to migrate all NEON intrinsics to use the stdarch-gen code generation framework and then have that automatically insert the needed swizzles on big-endian. However this is a huge amount of work and not a trivial undertaking.

Another approach would be to adapt the new code generator used for SVE intrinsics in #1509 to also generate NEON intriniscs, which may be easier to work with than the current code generator. cc @JamieCunliffe

he32 · 2024-11-10T14:34:03Z

The second worry I have is whether adding swizzling / byte-swapping of arguments and results before/after using NEON intrinsics will tend to negate the gains otherwise achieved by the NEON extensions compared to little-endian mode.

I assume LLVM has to insert its own byte swapping when lowering the portable SIMD operations on big-endian NEON... so hopefully the codegen backend is good enough to realize that the two swaps cancel each other out, and remove both of them? If not, that seems worth reporting as an LLVM bug.

Then I do not understand. As I understand it, doing swaps both pre- and post-SIMD operations are a necessary part of making the SIMD operations work as intended. They are therefore not "cancellable".

have a test program (in rust, of course), which excercises / validates all the NEON SIMD extensions, runnable on 64-bit little-endian aarch64 system

The stdarch test suite should be able to serve that purpose.

Sadly, this fails the "here is a concrete set of operations to test & validate" test. It's like saying to me "it is in there, somewhere, in the rust compiler sources -- you go figure out where and what by yourself".

RalfJung · 2024-11-10T15:24:46Z

Yeah the test suite is complicated I am afraid -- I was just giving you some pointers that hopefully lead into the right direction. That's all I can offer, sorry. I don't know how the stdarch test suite is set up. I'm afraid I don't think there is such a thing as a simple test program; stdarch offers many thousand operations and has some fancy setup to test them all.

Amanieu · 2024-11-10T17:57:04Z

You can look at the intrinsic-test crate in this repo which checks that the intrinsics match the behavior of those same intrinsics in Clang. See our CI script for an example of how to run it.

he32 · 2024-11-10T19:34:54Z

Yeah the test suite is complicated I am afraid -- I was just giving you some pointers that hopefully lead into the right direction. That's all I can offer, sorry. I don't know how the stdarch test suite is set up. I'm afraid I don't think there is such a thing as a simple test program; stdarch offers many thousand operations and has some fancy setup to test them all.

OK, I understand. I have found at least some of what needs to be looked at, and I'm trying to follow the various suggestions here. Next will need to be some experimentation etc. We'll see how that goes. Thanks anyway!

jswrenn · 2024-11-12T16:40:41Z

Did something recently happen in this space? Zerocopy's nightly CI started failing this past weekend due to SIMD intrinsic name resolution errors on aarch64_be-unknown-linux-gnu while building memchr. It doesn't look like memchr has changed; was rustc changed?

taiki-e · 2024-11-12T16:49:44Z

@jswrenn That is because rust-lang/rust#132714 bumped memchr without considering aarch64_be is not supported in the latest memchr (BurntSushi/memchr#162).

(According to the author of that rust-lang/rust PR, the memchr bump might be able to be reverted: taiki-e/atomic-maybe-uninit@6ea62cc#commitcomment-148891566)

Amanieu added bug A-arm A-aarch64 labels Oct 19, 2023

This was referenced Oct 19, 2023

Disable NEON intrinsics on big-endian ARM #1485

Merged

HashMap executes UB on aarch64_be-unknown-linux-gnu_ilp32 rust-lang/rust#116880

Closed

Update hashbrown to 0.14.2 rust-lang/rust#116956

Merged

Amanieu added a commit to Amanieu/rust that referenced this issue Jan 30, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

9de5974

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

Amanieu added a commit to Amanieu/rust that referenced this issue Jan 30, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

ab8b661

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Feb 17, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

7e65a05

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Feb 17, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

56f5476

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Feb 17, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

3f27aa8

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Feb 17, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

9aec60e

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Apr 9, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

ee3dda8

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

calebzulawski pushed a commit to rust-lang/portable-simd that referenced this issue Apr 9, 2024

Disable conversions between portable_simd and stdarch on big-endian ARM

d5b56c3

stdarch no longer provide SIMD on big-endian ARM due to rust-lang/stdarch#1484

bjorn3 mentioned this issue Aug 31, 2024

rust 1.80.0 fails to build for aarch64_be-unknown-netbsd rust-lang/rust#129819

Closed

he32 mentioned this issue Sep 29, 2024

Make this crate build on big-endian aarch64. BurntSushi/memchr#162

Open

he32 added a commit to he32/zerocopy that referenced this issue Oct 2, 2024

src/impls.rs: don't attempt use of neon / SIMD on big-endian aarch64.

493b45a

Neon / SIMD is known to be problematical in rust, ref. rust-lang/stdarch#1484, even though the CPU itself supports it.

he32 mentioned this issue Oct 2, 2024

src/impls.rs: don't attempt use of neon / SIMD on big-endian aarch64. google/zerocopy#1795

Closed

This was referenced Oct 2, 2024

aarch64 fix for big endian types google/zerocopy#1786

Merged

for big endian types use fallback starkat99/half-rs#117

Open

he32 added a commit to he32/bytecount that referenced this issue Oct 2, 2024

src/lib.rs: make this build on big-endian aarch64.

0430800

Do this by avoiding trying to use neon / SIMD on big-endian aarch64. Neon intrinsics are problematical on big-endian targets, ref. rust-lang/stdarch#1484

he32 mentioned this issue Oct 2, 2024

src/lib.rs: make this build on big-endian aarch64. llogiq/bytecount#95

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NEON intrinsics are broken on big-endian #1484

NEON intrinsics are broken on big-endian #1484

Amanieu commented Oct 19, 2023

RalfJung commented Feb 13, 2024

Amanieu commented Feb 14, 2024

RalfJung commented Feb 14, 2024

workingjubilee commented Oct 7, 2024

he32 commented Nov 10, 2024

RalfJung commented Nov 10, 2024

RalfJung commented Nov 10, 2024 •

edited

Loading

workingjubilee commented Nov 10, 2024

RalfJung commented Nov 10, 2024

he32 commented Nov 10, 2024

RalfJung commented Nov 10, 2024

he32 commented Nov 10, 2024

Amanieu commented Nov 10, 2024

he32 commented Nov 10, 2024

RalfJung commented Nov 10, 2024

Amanieu commented Nov 10, 2024

he32 commented Nov 10, 2024

jswrenn commented Nov 12, 2024

taiki-e commented Nov 12, 2024

NEON intrinsics are broken on big-endian #1484

NEON intrinsics are broken on big-endian #1484

Comments

Amanieu commented Oct 19, 2023

RalfJung commented Feb 13, 2024

Amanieu commented Feb 14, 2024

RalfJung commented Feb 14, 2024

workingjubilee commented Oct 7, 2024

he32 commented Nov 10, 2024

RalfJung commented Nov 10, 2024

RalfJung commented Nov 10, 2024 • edited Loading

workingjubilee commented Nov 10, 2024

RalfJung commented Nov 10, 2024

he32 commented Nov 10, 2024

RalfJung commented Nov 10, 2024

he32 commented Nov 10, 2024

Amanieu commented Nov 10, 2024

he32 commented Nov 10, 2024

RalfJung commented Nov 10, 2024

Amanieu commented Nov 10, 2024

he32 commented Nov 10, 2024

jswrenn commented Nov 12, 2024

taiki-e commented Nov 12, 2024

RalfJung commented Nov 10, 2024 •

edited

Loading