ARM64 Windows Support #31
Comments
Nice fix, thank you for contributing!
Hi @hanseuljun, I'm currently looking at the intrinsics headers included at
That is, if AVX2 is available with a compiler other than MSVC, include If I understand correctly, the compiler would not set However, I'm only starting to look at all of this, so please let me know if I'm missing something.
@igorauad Let me know if this is not the typical behavior of MSVC or something different from your experience.
Thanks, @hanseuljun. That's interesting. I was only speculating: I've been testing with gcc only, not MSVC.
Sorry, I took a while editing the comment above after actually looking at the code again. You are right! Sorry for spreading wrong information and causing confusion. It seems the issue was the assumption in gf256.h that all MSVC builds target the x86 architecture, not MSVC incorrectly setting AVX2. Thanks for the correction!
Nice, @hanseuljun. Thanks for confirming the problem. Btw, I'm experimenting with this modified version of I'm assuming SSE2 is strictly required because many SSE2 functions are called in @catid could you confirm this is the right direction? Also, how could I verify that everything is working correctly? Is there anything specific to look at with the unit test application?
- Define the LINUX_ARM macro automatically based on ARM flags set by the compiler.
- Detect ARM Neon based on compiler flags.
- Fix the condition for inclusion of AVX2, as discussed in Issue catid#31.
- Reorganize some macro definitions for better readability. For example, use an isolated if-elif-else conditional to define GF256_M128.
- Throw error if SSE2 is not available when building for a non-mobile target.
- Remove the unused GF256_ALIGNED_ACCESSES macro.
- Define the LINUX_ARM macro automatically based on ARM flags set by the compiler.
- Detect ARM Neon based on compiler flags. Take both __ARM_NEON and __ARM_NEON__ into account.
- Fix the condition for inclusion of AVX2, as discussed in Issue catid#31.
- Reorganize some macro definitions for better readability. For example, use an isolated if-elif-else conditional to define GF256_M128.
- Throw error if SSE2 is not available when building for a non-mobile target.
- Remove the unused GF256_ALIGNED_ACCESSES macro.
- Define the LINUX_ARM macro automatically based on ARM flags set by the compiler.
- Detect ARM Neon based on compiler flags. Take both __ARM_NEON and __ARM_NEON__ into account.
- Fix the condition for inclusion of AVX2, as discussed in Issue catid#31.
- Reorganize some macro definitions for better readability. For example, use an isolated if-elif-else conditional to define GF256_M128.
- Remove the unused GF256_ALIGNED_ACCESSES macro.
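For readers who have not opened the commits, the "isolated if-elif-else" idea from the list above could look roughly like the sketch below. This is not the actual gf256.h code: every `sketch_`/`SKETCH_` name is hypothetical, the scalar fallback is an assumption added so the snippet compiles everywhere, and MSVC ARM targets are not covered because only the GCC/Clang-style `__ARM_NEON` and `__ARM_NEON__` flags are checked.

```cpp
#include <cassert>
#include <cstdint>

// One isolated conditional picks the 128-bit vector type.
// sketch_m128 and SKETCH_BACKEND are hypothetical names, not from gf256.h.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Either spelling may be set by the compiler when Neon is enabled.
    #include <arm_neon.h>
    #define SKETCH_BACKEND "neon"
    typedef uint8x16_t sketch_m128;
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    // GCC/Clang define __SSE2__; MSVC signals SSE2 via _M_X64 or _M_IX86_FP.
    #include <emmintrin.h>
    #define SKETCH_BACKEND "sse2"
    typedef __m128i sketch_m128;
#else
    // Assumed fallback so the sketch still builds without SIMD support.
    #define SKETCH_BACKEND "scalar"
    struct sketch_m128 { uint8_t bytes[16]; };
#endif

// Whatever branch was taken, the type must be 16 bytes wide.
static_assert(sizeof(sketch_m128) == 16, "vector type must be 128 bits wide");

// Reports which branch the preprocessor selected.
const char* sketch_backend() { return SKETCH_BACKEND; }
```

Keeping the type selection in a single chain means each target hits exactly one branch, which is the readability point the commit message makes.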
Yeah, auto-detect should be how it works for sure. Totally agreed. I think for this sort of thing, if it builds it should work, because you can trust the intrinsics on each platform. There is a simple sanity check during initialization, so it should catch any obvious problems here.
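To illustrate what an initialization-time sanity check of this kind can look like, here is a minimal sketch. It is not the library's actual check: the function names are hypothetical, and the reduction polynomial 0x11D is chosen only for illustration and may differ from the one the library uses. The idea is to compare an optimized routine against a slow reference and refuse to continue if they disagree.

```cpp
#include <cassert>
#include <cstdint>

// Reference GF(2^8) multiply: carry-less product, then reduce by 0x11D.
// The polynomial is an assumption for this sketch.
static uint8_t sketch_gf_mul_ref(uint8_t a, uint8_t b) {
    unsigned acc = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & (1u << i)) acc ^= static_cast<unsigned>(a) << i;
    }
    // The product has at most 15 bits; clear bits 14..8 from the top down.
    for (int bit = 14; bit >= 8; --bit) {
        if (acc & (1u << bit)) acc ^= 0x11Du << (bit - 8);
    }
    return static_cast<uint8_t>(acc);
}

// Stand-in for an optimized path: shift-and-add with reduction at each step.
static uint8_t sketch_gf_mul_fast(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    while (b != 0) {
        if (b & 1) result ^= a;
        // Multiply a by x, folding the overflow bit back in (low byte of 0x11D).
        a = static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1D : 0x00));
        b >>= 1;
    }
    return result;
}

// Returns true only if both implementations agree on every input pair.
bool sketch_self_test() {
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            const uint8_t x = static_cast<uint8_t>(a);
            const uint8_t y = static_cast<uint8_t>(b);
            if (sketch_gf_mul_ref(x, y) != sketch_gf_mul_fast(x, y)) return false;
        }
    }
    return true;
}
```

An init function would call `sketch_self_test()` once and report failure instead of proceeding, so a miscompiled or mis-selected code path shows up immediately rather than as corrupted data later.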
Hi, when I run Benchmark() on Ubuntu, the performance is much lower than on Windows (about an order of magnitude lower). Could you tell me why this happens? Are there optimization options that need to be turned on manually?
Perhaps
In gf256.h, lines 60-66, immintrin.h gets included if the build uses MSVC and AVX2 is supported. The issue here is that immintrin.h does not support build targets other than x86 or x64.
Assuming that Windows runs only on x86 and x64 machines is very understandable, but there is already an ARM64 Windows machine: HoloLens 2. The fix I can think of is to use GF256_TARGET_MOBILE as a guard around the immintrin.h include, so that the include can be avoided by manually defining GF256_TARGET_MOBILE.
I will soon submit a pull request based on the above fix. Let me know if you think there is a better solution.
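As a rough illustration of the guard described above (a sketch only, not the code at the cited lines of gf256.h), the include can be made conditional on both the manual GF256_TARGET_MOBILE opt-out and the target architecture. The `SKETCH_` macros and the helper function are hypothetical names; the architecture macros are the ones MSVC and GCC/Clang predefine.

```cpp
#include <cassert>
#include <cstring>

// Manual opt-out from the issue: defining GF256_TARGET_MOBILE skips x86 headers.
// SKETCH_ARCH_X86 and SKETCH_USE_AVX2 are hypothetical names for this sketch.
#if !defined(GF256_TARGET_MOBILE) && \
    (defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__))
    #define SKETCH_ARCH_X86 1
#else
    #define SKETCH_ARCH_X86 0
#endif

// Only pull in immintrin.h when targeting x86/x64 with AVX2 enabled.
#if SKETCH_ARCH_X86 && defined(__AVX2__)
    #include <immintrin.h>
    #define SKETCH_USE_AVX2 1
#else
    #define SKETCH_USE_AVX2 0
#endif

// Reports which branch the preprocessor selected.
const char* sketch_avx2_status() {
    return SKETCH_USE_AVX2 ? "avx2" : "no-avx2";
}
```

With this shape, an ARM64 build never reaches the immintrin.h include, whether or not GF256_TARGET_MOBILE is defined by hand.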