ARM64 Windows Support #31

hanseuljun · 2021-02-05T20:01:36Z

At gf256.h line 60-66, immintrin.h gets included if the build is based on MSVC and is AVX2 is supported. The issue here starts from immintrin.h not supporting build targets other than x86 or x64.

It is very understandable assuming Windows is for x86 and x64 machines, but unfortunately, there is already an ARM64 machine: Hololens 2... The fix I can think of using GF256_TARGET_MOBILE as a guard to not including immintrin.h, so including immintrin.h can be avoided by manually setting GF256_TARGET_MOBILE.

I will soon submit a pull request based on the above fix and let me know if you think there is a better solution.

catid · 2021-02-12T04:48:21Z

Nice fix thank you for contributing!

igorauad · 2021-02-15T17:53:11Z

Hi @hanseuljun , I'm currently looking at the intrinsics headers included at gf256.h. Just trying to understand, why is <immintrin.h> included if the build is based on MSVC? Shouldn't the condition be something like the one below:

#if defined(__AVX2__) && (!defined (_MSC_VER) || _MSC_VER >= 1900)

That is, if AVX2 is available with a compiler other than MSVC, include immintrin.h. Also, if the compiler is MSVC and the version is at least 1900, include immintrin.h.

If I understand correctly, the compiler would no set __AVX2__ for an ARM build target anyway, so we wouldn't need to check GF256_TARGET_MOBILE, as done in #32 . It seems that the || (defined (_MSC_VER) && _MSC_VER >= 1900) part on the condition was the real problem.

However, I'm only starting to look at all of this. So please let me know if I'm missing something.

hanseuljun · 2021-02-15T22:02:23Z

@igorauad
What I found different in my build environment (MSVC), AVX2 was set even when the build target is ARM. I know this is weird given AVX2 is only for x86 CPUs but I'm just reporting what I have seen.

Let me know if this is not the typical behavior of MSVC or something different from your experience.

igorauad · 2021-02-15T22:22:01Z

@igorauad
What I found different in my build environment (MSVC), AVX2 was set even when the build target is ARM. I know this is weird given AVX2 is only for x86 CPUs but I'm just reporting what I have seen.

Let me know if this is not the typical behavior of MSVC or something different from your experience.

Thanks, @hanseuljun That's interesting. I was only speculating. However, I've been testing with gcc only, not MSVC.

hanseuljun · 2021-02-15T22:22:58Z

@igorauad

Sorry I was taking time editing the above comment after actually looking at the code again. You are right! Sorry about spreading wrong information and causing confusion. Seems like the issue was assuming that all MSVC builds will target the x86 architecture in gf256.h, not MSVC incorrectly setting AVX2. Thanks for the correction!

igorauad · 2021-02-16T23:42:52Z

Nice, @hanseuljun Thanks for confirming the problem.

Btw, I'm experimenting with this modified version of gf256.h here: igorauad/gf256.h. What I'm trying to do is have LINUX_ARM and GF256_TARGET_MOBILE set automatically, instead of based on flags provided by the user during compilation. In the same direction, I changed the SSE3 stuff so that it is included automatically based on the __SSE3__ macro (just like how the AVX2 header is included if __AVX2__ is set). Lastly, it seems to me that SSE2 is strictly required when the build target is not a "mobile platform" (e.g., ARM), so I added a preprocessor verification for SSE2 in gf256.h. However, this is all subject to further investigation, as I need to understand the codebase better.

I'm assuming SSE2 is strictly required because many SSE2 functions are called in gf256.cpp under the condition !defined(GF256_TARGET_MOBILE). In contrast, the SSE3 and AVX2 functions are called specifically if the CpuHasSSSE3 and CpuHasAVX2 flags are set in runtime. So my understanding is that, regardless of how gf256 was compiled, if in runtime SSE3 and AVX2 are not found, gf256 will still work (although slower) as long as SSE2 is available.

@catid could you confirm this is the right direction? Also, how could I verify everything is working correctly? Anything specific to look at with the unit test application?

- Define the LINUX_ARM macro automatically based on ARM flags set by the compiler. - Detect ARM Neon based on compiler flags. - Fix the condition for inclusion of AVX2, as discussed in Issue catid#31. - Reorganize some macro definitions for better readability. For example, use an isolated if-elif-else conditional to define GF256_M128. - Throw error if SSE2 is not available when building for a non-mobile target. - Remove the unused GF256_ALIGNED_ACCESSES macro.

- Define the LINUX_ARM macro automatically based on ARM flags set by the compiler. - Detect ARM Neon based on compiler flags. Take both __ARM_NEON and __ARM_NEON__ into account. - Fix the condition for inclusion of AVX2, as discussed in Issue catid#31. - Reorganize some macro definitions for better readability. For example, use an isolated if-elif-else conditional to define GF256_M128. - Throw error if SSE2 is not available when building for a non-mobile target. - Remove the unused GF256_ALIGNED_ACCESSES macro.

- Define the LINUX_ARM macro automatically based on ARM flags set by the compiler. - Detect ARM Neon based on compiler flags. Take both __ARM_NEON and __ARM_NEON__ into account. - Fix the condition for inclusion of AVX2, as discussed in Issue catid#31. - Reorganize some macro definitions for better readability. For example, use an isolated if-elif-else conditional to define GF256_M128. - Remove the unused GF256_ALIGNED_ACCESSES macro.

catid · 2021-04-19T18:49:57Z

Yeah auto-detect should be how it works for sure. Totally agreed. I think for this sort of thing, if it builds it should work because you can trust the intrinsics on each platform. There is a simple sanity check during the initialization so it should catch any obvious problems in this stuff.

yll139 · 2022-06-27T15:49:20Z

Hi, when I run Benchmark() on the Ubuntu OS platform, I find that the performance is much lower than that of the windows OS platform (one order of magnitude smaller). Could you tell me why this happened? Are there some optimization options that need to be manually turned on?

catid · 2022-08-17T23:26:50Z

Hi, when I run Benchmark() on the Ubuntu OS platform, I find that the performance is much lower than that of the windows OS platform (one order of magnitude smaller). Could you tell me why this happened? Are there some optimization options that need to be manually turned on?

Perhaps cmake -DCMAKE_BUILD_TYPE=Release .. will fix it?

hanseuljun changed the title A ARM64 on Windows Support Feb 5, 2021

hanseuljun mentioned this issue Feb 5, 2021

Avoid including immintrin.h when GF256_TARGET_MOBILE is set. #32

Merged

hanseuljun changed the title ~~ARM64 on Windows Support~~ ARM64 Windows Support Feb 6, 2021

igorauad mentioned this issue Apr 15, 2021

Review intrinsics includes and macros on gf256.h #36

Merged

xqms mentioned this issue Sep 19, 2022

Build issues on Jetson Nano AIS-Bonn/nimbro_network#20

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARM64 Windows Support #31

ARM64 Windows Support #31

hanseuljun commented Feb 5, 2021 •

edited

Loading

catid commented Feb 12, 2021

igorauad commented Feb 15, 2021

hanseuljun commented Feb 15, 2021

igorauad commented Feb 15, 2021

hanseuljun commented Feb 15, 2021

igorauad commented Feb 16, 2021 •

edited

Loading

catid commented Apr 19, 2021 •

edited

Loading

yll139 commented Jun 27, 2022

catid commented Aug 17, 2022

ARM64 Windows Support #31

ARM64 Windows Support #31

Comments

hanseuljun commented Feb 5, 2021 • edited Loading

catid commented Feb 12, 2021

igorauad commented Feb 15, 2021

hanseuljun commented Feb 15, 2021

igorauad commented Feb 15, 2021

hanseuljun commented Feb 15, 2021

igorauad commented Feb 16, 2021 • edited Loading

catid commented Apr 19, 2021 • edited Loading

yll139 commented Jun 27, 2022

catid commented Aug 17, 2022

hanseuljun commented Feb 5, 2021 •

edited

Loading

igorauad commented Feb 16, 2021 •

edited

Loading

catid commented Apr 19, 2021 •

edited

Loading