Add Neoverse V2 and Armv9 #79

AdhocMan · 2023-10-17T07:29:59Z

Add support for Armv9 and Neoverse v2, as required for the Nvidia Grace CPU.

alalazo · 2023-10-28T11:31:28Z

I'll follow with a PR referencing this on the archspec side and check whether tests passes. @fspiga Do you (or anybody from NVIDIA) want to double check this?

fspiga · 2023-10-30T11:04:12Z

I have staged changes for the similar type of enhancements in my local system and I have been testing things for few weeks.

For GCC/GNU, re use of -mtune and -march, see Compiler flags across architectures: -march, -mtune, and -mcpu. As long as someone is not cross compiling, we advise to just use -mcpu for 12.2 onward. It is possible to compile anything with a older GCC but we strongly advise to use a modern version of the compiler. On any Grace platform, a modern software stack equals higher chances of better generated code and performance.

Re NVHPC, IIRC neoverse-v2 supports in 23.3 was like a beta. We advise to go from 23.5 where we added a first wave of bug fixes and enhahncements. I will verify this.

Re CLANG, I am not sure the -msve-vector-bits=128 is necessary. The -mcpu=neoverse-v should already imply 128bits SIMD (it is a fixed uarch property). I will verify this.

It is worth to add Arm HPC Compiler in this mix. Support for V2 starts from version 23.04.1. We can have someone from Arm Ltd double-check this.

In my working copy, anything that is below the reccomended versions for Grace goes to neoverse_n1 or armv8.4a. This is because, at least in my mind, whoever gets a Grace system will also have modern Linux Kernel (6.2+) and a recent OS and take the step to use modern compilers. Internally we are not testing for performancer or optimal code generation on older GNU on Grace. We know that, thanks to Arm ecosystem, any generic armv8 binary with NEON enabled will run.

AdhocMan · 2023-10-30T13:46:46Z

For GCC/GNU, re use of -mtune and -march, see Compiler flags across architectures: -march, -mtune, and -mcpu. As long as someone is not cross compiling, we advise to just use -mcpu for 12.2 onward. It is possible to compile anything with a older GCC but we strongly advise to use a modern version of the compiler. On any Grace platform, a modern software stack equals higher chances of better generated code and performance.

As far as I know, GCC only added support for -mpcu=neoverse-v2 in version 13. Am I mistaken or are you suggesting to use a different cpu target for 12.2?

Re NVHPC, IIRC neoverse-v2 supports in 23.3 was like a beta. We advise to go from 23.5 where we added a first wave of bug fixes and enhahncements. I will verify this.

That would be great, thanks.

Re CLANG, I am not sure the -msve-vector-bits=128 is necessary. The -mcpu=neoverse-v should already imply 128bits SIMD (it is a fixed uarch property). I will verify this.

The documentation of both GCC and CLANG state that the default is 'scalable'. I could not very if this is included in the mcpu option, so I've kept it when possible. I'll remove it if you can verify that it is redundant.

It is worth to add Arm HPC Compiler in this mix. Support for V2 starts from version 23.04.1. We can have someone from Arm Ltd double-check this.

According to the release notes, it was added to 23.04.0: https://developer.arm.com/documentation/107578/2304/?lang=en
I'll add the flags starting from that release.

In my working copy, anything that is below the reccomended versions for Grace goes to neoverse_n1 or armv8.4a. This is because, at least in my mind, whoever gets a Grace system will also have modern Linux Kernel (6.2+) and a recent OS and take the step to use modern compilers. Internally we are not testing for performancer or optimal code generation on older GNU on Grace. We know that, thanks to Arm ecosystem, any generic armv8 binary with NEON enabled will run.

I'd also expect that for any production workload, a recent compiler can be expected on Grace systems. But most other architectures specified here also add some tuning flags for older GCC versions as well. So in order to stay consistent, I took the 'neoverse_n1' specification as a guideline.
For these older versions, I mainly relied on documentation only, so these flags are not necessarily the optimum, but represent an educated guess only.

alalazo

@fspiga Can you double check the few items that are on hold here? It would be good to have v2 support in the upcoming major release of Spack 🙂

dslarm · 2023-11-06T09:53:35Z

cpu/microarchitectures.json

+              },
+              {
+                  "versions": "8.0:8.99",
+                  "flags" : "-march=armv8.4-a+sve -mtune=cortex-a72 -msve-vector-bits=128"


The "-msve-vector-bits=128" should not be used in this entire PR. It is making vector-length specific (VLS) SVE which can only be guaranteed to run correctly if the platform has that SVE vector length. We need to produce binaries compiled on V2 that would not risk wrong behaviour on (eg) V1 where VL is 256 in SVE.

There is the warning in the GCC manual:

[..] At present ‘-msve-vector-bits=128’ also generates vector-length agnostic output for big-endian targets. All other values generate vector-length specific code. The behavior of these values may change in future releases and no value except ‘scalable’ should be relied on for producing code that is portable across different hardware SVE vector lengths.

Changes requested:

Remove any reference to -msve-vector-length - it's not needed.

For compilers with the V2 cost model (gcc 13, ACFL (arm) 23.04.0 and above) - replace whole line with "-mcpu=neoverse-v2". There is no need to specify "mtune" and "march" as "mcpu" sets both on aarch64 for Clang, ACfL, and GNU.

This PR for N1/V1 : #81 is the (simpler) pattern for compilers that do support a particular core.

I don't quite understand the argument for having comparability with V1. These architectures differ in their supported instruction sets, so the incompatibility goes beyond the sve vector width. But this flag is no longer included now due to the comments below.

* For compilers with the V2 cost model (gcc 13, ACFL (arm) 23.04.0 and above) - replace whole line with "-mcpu=neoverse-v2". There is no need to specify "mtune" and "march" as "mcpu" sets both on aarch64 for Clang, ACfL, and GNU.

The mcpu flag is used whenever possible for the neoverse V2 architecture with all compilers already. Only in older compiler versions like the one you singled out, the maximum supported arm architecture flag subset is used. So I'm not sure I understand your comment here, since this is very similar to the N1/V1 specifications you are pointing to.

willlovett-arm · 2023-11-06T14:29:59Z

Hi all,

(Simon - hi! I'm technology manager for the compiler teams at Arm).

The documentation of both GCC and CLANG state that the default is 'scalable'. I could not very if this is included in the mcpu option, so I've kept it when possible. I'll remove it if you can verify that it is redundant.

Correct - this example https://godbolt.org/z/a7a3En6TT shows that in practice.

Can I check: do we have a good reason for enabling width-specific flags here? If it's for a good reason (eg. we've explicitly written width-specific library code), then it's fine. If it's for performance reasons (eg. the compiler gets to assume another thing, so it should be faster, right?...), I'd be wary. We do almost all our optimization work on width-agnistic codegen. I'd go so far as to say: if you did have any examples where you saw a speed advantage with width-specific, please let me know, so we can put effort into fixing them.

I recognise that, in this particular case, this framework knows it's only ever building for neoverse-v2, so it's probably a moot point. But note that we (Arm) are keen to proliferate vector-length-agnostic code wherever possible, to reduce the impact of future vector length incompatibility.

Thanks!

Will.

alalazo · 2023-11-06T15:48:15Z

@willlovett-arm @dslarm Can you suggest changesets to this PR (I assume they amount to removing -msve-vector-bits=128)? I'll accept those and get this PR merged. Then we could discuss if any improvement is needed wrt flags, but I'd like to get neoverse-v2 support in the next Spack release which is happening early this week...

AdhocMan · 2023-11-06T15:57:54Z

Can I check: do we have a good reason for enabling width-specific flags here? If it's for a good reason (eg. we've explicitly written width-specific library code), then it's fine. If it's for performance reasons (eg. the compiler gets to assume another thing, so it should be faster, right?...), I'd be wary. We do almost all our optimization work on width-agnistic codegen. I'd go so far as to say: if you did have any examples where you saw a speed advantage with width-specific, please let me know, so we can put effort into fixing them.

I recognise that, in this particular case, this framework knows it's only ever building for neoverse-v2, so it's probably a moot point. But note that we (Arm) are keen to proliferate vector-length-agnostic code wherever possible, to reduce the impact of future vector length incompatibility.

Thanks!

Will.

Hi Will, thanks for providing feedback on this.

I think the fact that your compiler optimization work targets width-agnostic codegen is a convincing argument for removing the fixed sve vector length. The intention was to provide as much optimization opportunities as possible for the compiler. One of our applications could benefit from fixed vector width, but that should not translate into (potentially) worse optimization in general.

alalazo · 2023-11-06T16:18:52Z

Thanks @AdhocMan and everyone involved in the discussion!

AdhocMan added 2 commits October 16, 2023 19:47

add neoverse_v2

b6c6182

add target test file

4de8019

alalazo self-assigned this Oct 28, 2023

alalazo added the enhancement New feature or request label Oct 28, 2023

add arm compiler

0a84464

alalazo reviewed Nov 6, 2023

View reviewed changes

dslarm suggested changes Nov 6, 2023

View reviewed changes

remove -msve-vector-bits=128 flag

6a019df

alalazo approved these changes Nov 6, 2023

View reviewed changes

alalazo merged commit d844bb3 into archspec:master Nov 6, 2023
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Neoverse V2 and Armv9 #79

Add Neoverse V2 and Armv9 #79

AdhocMan commented Oct 17, 2023

alalazo commented Oct 28, 2023

fspiga commented Oct 30, 2023

AdhocMan commented Oct 30, 2023

alalazo left a comment

dslarm Nov 6, 2023

AdhocMan Nov 6, 2023

willlovett-arm commented Nov 6, 2023

alalazo commented Nov 6, 2023 •

edited

Loading

AdhocMan commented Nov 6, 2023

alalazo commented Nov 6, 2023

Add Neoverse V2 and Armv9 #79

Add Neoverse V2 and Armv9 #79

Conversation

AdhocMan commented Oct 17, 2023

alalazo commented Oct 28, 2023

fspiga commented Oct 30, 2023

AdhocMan commented Oct 30, 2023

alalazo left a comment

Choose a reason for hiding this comment

dslarm Nov 6, 2023

Choose a reason for hiding this comment

AdhocMan Nov 6, 2023

Choose a reason for hiding this comment

willlovett-arm commented Nov 6, 2023

alalazo commented Nov 6, 2023 • edited Loading

AdhocMan commented Nov 6, 2023

alalazo commented Nov 6, 2023

alalazo commented Nov 6, 2023 •

edited

Loading