Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 Windows Support #31

Open
hanseuljun opened this issue Feb 5, 2021 · 9 comments
Open

ARM64 Windows Support #31

hanseuljun opened this issue Feb 5, 2021 · 9 comments

Comments

@hanseuljun
Copy link
Contributor

hanseuljun commented Feb 5, 2021

At gf256.h line 60-66, immintrin.h gets included if the build is based on MSVC and is AVX2 is supported. The issue here starts from immintrin.h not supporting build targets other than x86 or x64.

image

It is very understandable assuming Windows is for x86 and x64 machines, but unfortunately, there is already an ARM64 machine: Hololens 2... The fix I can think of using GF256_TARGET_MOBILE as a guard to not including immintrin.h, so including immintrin.h can be avoided by manually setting GF256_TARGET_MOBILE.

I will soon submit a pull request based on the above fix and let me know if you think there is a better solution.

@hanseuljun hanseuljun changed the title A ARM64 on Windows Support Feb 5, 2021
@hanseuljun hanseuljun changed the title ARM64 on Windows Support ARM64 Windows Support Feb 6, 2021
@catid
Copy link
Owner

catid commented Feb 12, 2021

Nice fix thank you for contributing!

@igorauad
Copy link
Contributor

Hi @hanseuljun , I'm currently looking at the intrinsics headers included at gf256.h. Just trying to understand, why is <immintrin.h> included if the build is based on MSVC? Shouldn't the condition be something like the one below:

#if defined(__AVX2__) && (!defined (_MSC_VER) || _MSC_VER >= 1900)

That is, if AVX2 is available with a compiler other than MSVC, include immintrin.h. Also, if the compiler is MSVC and the version is at least 1900, include immintrin.h.

If I understand correctly, the compiler would no set __AVX2__ for an ARM build target anyway, so we wouldn't need to check GF256_TARGET_MOBILE, as done in #32 . It seems that the || (defined (_MSC_VER) && _MSC_VER >= 1900) part on the condition was the real problem.

However, I'm only starting to look at all of this. So please let me know if I'm missing something.

@hanseuljun
Copy link
Contributor Author

@igorauad
What I found different in my build environment (MSVC), AVX2 was set even when the build target is ARM. I know this is weird given AVX2 is only for x86 CPUs but I'm just reporting what I have seen.

Let me know if this is not the typical behavior of MSVC or something different from your experience.

@igorauad
Copy link
Contributor

@igorauad
What I found different in my build environment (MSVC), AVX2 was set even when the build target is ARM. I know this is weird given AVX2 is only for x86 CPUs but I'm just reporting what I have seen.

Let me know if this is not the typical behavior of MSVC or something different from your experience.

Thanks, @hanseuljun That's interesting. I was only speculating. However, I've been testing with gcc only, not MSVC.

@hanseuljun
Copy link
Contributor Author

@igorauad

Sorry I was taking time editing the above comment after actually looking at the code again. You are right! Sorry about spreading wrong information and causing confusion. Seems like the issue was assuming that all MSVC builds will target the x86 architecture in gf256.h, not MSVC incorrectly setting AVX2. Thanks for the correction!

@igorauad
Copy link
Contributor

igorauad commented Feb 16, 2021

Nice, @hanseuljun Thanks for confirming the problem.

Btw, I'm experimenting with this modified version of gf256.h here: igorauad/gf256.h. What I'm trying to do is have LINUX_ARM and GF256_TARGET_MOBILE set automatically, instead of based on flags provided by the user during compilation. In the same direction, I changed the SSE3 stuff so that it is included automatically based on the __SSE3__ macro (just like how the AVX2 header is included if __AVX2__ is set). Lastly, it seems to me that SSE2 is strictly required when the build target is not a "mobile platform" (e.g., ARM), so I added a preprocessor verification for SSE2 in gf256.h. However, this is all subject to further investigation, as I need to understand the codebase better.

I'm assuming SSE2 is strictly required because many SSE2 functions are called in gf256.cpp under the condition !defined(GF256_TARGET_MOBILE). In contrast, the SSE3 and AVX2 functions are called specifically if the CpuHasSSSE3 and CpuHasAVX2 flags are set in runtime. So my understanding is that, regardless of how gf256 was compiled, if in runtime SSE3 and AVX2 are not found, gf256 will still work (although slower) as long as SSE2 is available.

@catid could you confirm this is the right direction? Also, how could I verify everything is working correctly? Anything specific to look at with the unit test application?

igorauad added a commit to igorauad/wirehair that referenced this issue Apr 15, 2021
- Define the LINUX_ARM macro automatically based on ARM flags set by the
  compiler.
- Detect ARM Neon based on compiler flags.
- Fix the condition for inclusion of AVX2, as discussed in Issue catid#31.
- Reorganize some macro definitions for better readability. For example,
  use an isolated if-elif-else conditional to define GF256_M128.
- Throw error if SSE2 is not available when building for a non-mobile
  target.
- Remove the unused GF256_ALIGNED_ACCESSES macro.
igorauad added a commit to igorauad/wirehair that referenced this issue Apr 15, 2021
- Define the LINUX_ARM macro automatically based on ARM flags set by the
  compiler.
- Detect ARM Neon based on compiler flags. Take both __ARM_NEON and
  __ARM_NEON__ into account.
- Fix the condition for inclusion of AVX2, as discussed in Issue catid#31.
- Reorganize some macro definitions for better readability. For example,
  use an isolated if-elif-else conditional to define GF256_M128.
- Throw error if SSE2 is not available when building for a non-mobile
  target.
- Remove the unused GF256_ALIGNED_ACCESSES macro.
igorauad added a commit to igorauad/wirehair that referenced this issue Apr 16, 2021
- Define the LINUX_ARM macro automatically based on ARM flags set by the
  compiler.
- Detect ARM Neon based on compiler flags. Take both __ARM_NEON and
  __ARM_NEON__ into account.
- Fix the condition for inclusion of AVX2, as discussed in Issue catid#31.
- Reorganize some macro definitions for better readability. For example,
  use an isolated if-elif-else conditional to define GF256_M128.
- Remove the unused GF256_ALIGNED_ACCESSES macro.
igorauad added a commit to igorauad/wirehair that referenced this issue Apr 16, 2021
- Define the LINUX_ARM macro automatically based on ARM flags set by the
  compiler.
- Detect ARM Neon based on compiler flags. Take both __ARM_NEON and
  __ARM_NEON__ into account.
- Fix the condition for inclusion of AVX2, as discussed in Issue catid#31.
- Reorganize some macro definitions for better readability. For example,
  use an isolated if-elif-else conditional to define GF256_M128.
- Remove the unused GF256_ALIGNED_ACCESSES macro.
@catid
Copy link
Owner

catid commented Apr 19, 2021

Yeah auto-detect should be how it works for sure. Totally agreed. I think for this sort of thing, if it builds it should work because you can trust the intrinsics on each platform. There is a simple sanity check during the initialization so it should catch any obvious problems in this stuff.

@yll139
Copy link

yll139 commented Jun 27, 2022

Hi, when I run Benchmark() on the Ubuntu OS platform, I find that the performance is much lower than that of the windows OS platform (one order of magnitude smaller). Could you tell me why this happened? Are there some optimization options that need to be manually turned on?

@catid
Copy link
Owner

catid commented Aug 17, 2022

Hi, when I run Benchmark() on the Ubuntu OS platform, I find that the performance is much lower than that of the windows OS platform (one order of magnitude smaller). Could you tell me why this happened? Are there some optimization options that need to be manually turned on?

Perhaps cmake -DCMAKE_BUILD_TYPE=Release .. will fix it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants