Full SME(1) instruction support and STREAMING Groups #415

FinnWilkinson · 2024-06-12T10:50:19Z

This PR implements all available SME (version 1) instructions that are contained within LLVM 14.0.5. Specifically, this is Version 2021-06 of the Armv9-A A64 ISA.

No FP16 or BF16 instructions are supported because C++17 lacks native types for them. All Quad-Word instruction variants are emulated using 64-bit data types.
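As an illustration of what emulating a Quad-Word element with 64-bit types can look like, here is a minimal sketch. The names `Quad`, `loadQuad` and `storeQuad` are hypothetical and are not taken from the PR; the sketch only assumes that a 128-bit element can be moved as two `uint64_t` halves.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: a 128-bit element held as two 64-bit halves.
// On a little-endian host, q[0] is the low half and q[1] is the high half.
using Quad = std::array<uint64_t, 2>;

// Reads 16 bytes from memory into a Quad without needing a native 128-bit type.
Quad loadQuad(const uint8_t* src) {
  Quad q{};
  std::memcpy(q.data(), src, sizeof(Quad));
  return q;
}

// Writes the 16 bytes of a Quad back to memory.
void storeQuad(uint8_t* dst, const Quad& q) {
  std::memcpy(dst, q.data(), sizeof(Quad));
}
```

Because the element is only ever copied, never operated on arithmetically, two 64-bit halves are enough for load, store and move style instructions.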

In addition, new STREAMING_SVE and STREAMING_PREDICATE groups have been introduced, along with corresponding decode logic. They allow a different pipeline / latency configuration for these instructions when SVE Streaming Mode (the context mode in which SME instructions are executed) is enabled. This allows a co-processor style implementation of SME to be modelled within SimEng: additional latency / reduced throughput can be configured to mimic an offload penalty, and different execution or LD/STR hardware can be modelled for the co-processor compared to the main core.

Add STREAMING Group support
Add execution logic and regression tests for all missing SME instructions

FinnWilkinson · 2024-07-09T11:18:52Z

#rerun tests


ABenC377 · 2024-08-29T09:57:54Z

src/include/simeng/arch/aarch64/Architecture.hh

+  bool isStreamingModeEnabled() const;
+
+  /** Returns if the SME ZA Register is enabled. */
+  bool isZA_RegisterEnabled() const;


Is the underscore in ZA_Register needed/canonical? Looks a bit weird jumping between camel and snake cases

ABenC377 · 2024-08-29T10:06:12Z

src/lib/arch/aarch64/Architecture.cc

+bool Architecture::isStreamingModeEnabled() const { return SVCRval_ & 1; }
+
+// 1st bit of SVCR register determines if ZA register is enabled.
+bool Architecture::isZA_RegisterEnabled() const { return SVCRval_ & 2; }


Same comment about underscore
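For reference, the two helpers in the diff test the low two bits of SVCR: in the Arm architecture SVCR.SM is bit 0 and SVCR.ZA is bit 1. A minimal sketch with named masks and a consistently camel-cased accessor might look like the following. `SvcrState`, `kSvcrSmMask` and `kSvcrZaMask` are hypothetical names used for illustration only; the PR reads `SVCRval_` directly.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not part of the PR.
constexpr uint64_t kSvcrSmMask = 1ull << 0;  // SVCR.SM: streaming SVE mode
constexpr uint64_t kSvcrZaMask = 1ull << 1;  // SVCR.ZA: ZA storage enabled

struct SvcrState {
  uint64_t value = 0;

  // True when bit 0 (SM) of the stored SVCR value is set.
  bool isStreamingModeEnabled() const { return (value & kSvcrSmMask) != 0; }

  // True when bit 1 (ZA) of the stored SVCR value is set.
  bool isZaRegisterEnabled() const { return (value & kSvcrZaMask) != 0; }
};
```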

ABenC377 · 2024-08-29T10:15:13Z

src/include/simeng/arch/aarch64/InstructionGroups.hh

-const uint16_t STORE_SME = 85;
-const uint16_t ALL = 86;
-const uint16_t NONE = 87;
+const uint16_t STREAMING_SVE = 66;


When I looked at this file in isolation, I assumed that the ordering of these was arbitrary. However, it seems to be important for how instruction.cc works. So I'd be in favour of adding a comment here to explain that the ordering of these instructions is important and can't be changed without looking at instruction.cc.
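Beyond a comment, one option would be a compile-time check next to the constants, so that a reordering fails the build. The sketch below uses made-up group names and values (`FAMILY_A`, `FAMILY_B` and so on); they are placeholders and not SimEng's real numbering.

```cpp
#include <cstdint>

// Hypothetical group IDs for illustration only; they are not SimEng's values.
namespace sketch {
constexpr uint16_t FAMILY_A = 10;          // first group of family A
constexpr uint16_t FAMILY_A_SIMPLE = 11;
constexpr uint16_t FAMILY_A_COMPLEX = 12;  // last group of family A
constexpr uint16_t FAMILY_B = 13;          // first group of family B

// Code elsewhere is assumed to rely on each family being contiguous and
// ascending. These checks fail the build if the constants are reordered.
static_assert(FAMILY_A < FAMILY_A_SIMPLE && FAMILY_A_SIMPLE < FAMILY_A_COMPLEX,
              "family A groups must stay in ascending order");
static_assert(FAMILY_A_COMPLEX + 1 == FAMILY_B,
              "family B must directly follow family A");

// A range test of the kind that depends on the ordering above.
constexpr bool isFamilyA(uint16_t group) {
  return group >= FAMILY_A && group <= FAMILY_A_COMPLEX;
}
}  // namespace sketch
```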

ABenC377 · 2024-08-29T10:22:16Z

src/lib/arch/aarch64/Instruction_address.cc

@@ -91,6 +91,19 @@ span<const memory::MemoryAccessTarget> Instruction::generateAddresses() {
        setMemoryAddresses({{sourceValues_[2].get<uint64_t>(), 8}});
        break;
      }
+      case Opcode::AArch64_LD1_MXIPXX_V_B:    // ld1b {zatv.b[ws, #imm]}, pg/z,


Should we be using "[[fallthrough]];" here as well? Apply same to other non-attributed fallthroughs below

ABenC377 · 2024-08-30T09:40:23Z

src/lib/arch/aarch64/Instruction_execute.cc

+          uint64_t out[32] = {0};
+          std::memcpy(out, zaRow, rowCount * sizeof(uint64_t));
+          // Slice element is active IFF:
+          //  - Element in 1st source pred corresponding to horiz. slice is TRUE


Add an AND between these two statements
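To make the suggested AND concrete, here is a minimal sketch of the predication rule. It assumes the second, unquoted bullet refers to the corresponding element of the second governing predicate, which is how the Arm description of ADDHA reads; that is an assumption about this code, not something shown in the diff. Plain `bool` vectors stand in for the simulator's packed predicate registers, and `addToSlice` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds `zn` to every active element of one horizontal slice (row) of a tile.
// An element is active only if BOTH conditions hold:
//   - pn[row] is true (first predicate selects the horizontal slice), AND
//   - pm[col] is true (second predicate selects the element in the slice).
// Inactive elements keep their previous value.
std::vector<uint64_t> addToSlice(const std::vector<uint64_t>& zaRow,
                                 const std::vector<uint64_t>& zn,
                                 const std::vector<bool>& pn,
                                 const std::vector<bool>& pm,
                                 std::size_t row) {
  std::vector<uint64_t> out = zaRow;
  for (std::size_t col = 0; col < out.size(); ++col) {
    if (pn[row] && pm[col]) out[col] += zn[col];
  }
  return out;
}
```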

ABenC377 · 2024-08-30T10:41:47Z

test/regression/aarch64/instructions/sme.cc

+      CHECK_MAT_COL(ARM64_REG_ZAS3, i, uint32_t,
+                    fillNeon<uint32_t>(inter32, (SVL / 8)));
+    } else {
+      // Even cols, all elements


Should this be Odd cols?

jj16791

Some comments and I agree with several of Alex's comments. I think it would be good to get the ARM SME/SVE loops as part of our functional verification checks to help test these new instructions. I assume it would have to be done somewhere private though (not sure if we already have that guarantee in the upcoming CI/CD pipelines)?

jj16791 · 2024-10-26T09:26:07Z

CMakeLists.txt

@@ -72,7 +72,7 @@ set(CMAKE_MACOSX_RPATH 1)
 set(CMAKE_POSITION_INDEPENDENT_CODE ON)

 # Create variable to enable additional compiler warnings for SimEng targets only
-set(SIMENG_COMPILE_OPTIONS -Wall -pedantic -Werror) #-Wextra


Why was this removed?

jj16791 · 2024-10-26T09:37:44Z

configs/a64fx_SME.yaml

@@ -62,7 +62,12 @@ Ports:
    Instruction-Group-Support:
    - INT_SIMPLE
    - INT_MUL
-    - STORE_DATA
+    - STORE_DATA_INT


Aren't these expansions redundant due to the inheritance mappings defined?

jj16791 · 2024-10-26T09:39:45Z

docs/sphinx/assets/instruction_groups_AArch64.png

LOAD_FP/STORE_ADDRESS_FP/STORE_DATA_FP don't exist. Also, this may be GitHub not showing it correctly, but the background should just be the alpha channel.

jj16791 · 2024-10-26T09:41:58Z

src/lib/arch/aarch64/Instruction_address.cc

@@ -91,6 +91,19 @@ span<const memory::MemoryAccessTarget> Instruction::generateAddresses() {
        setMemoryAddresses({{sourceValues_[2].get<uint64_t>(), 8}});
        break;
      }
+      case Opcode::AArch64_LD1_MXIPXX_V_B:    // ld1b {zatv.b[ws, #imm]}, pg/z,


Functionally I think it's fine but it provides a better structure and possibly improved readability so I'd say we should use fallthroughs
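For illustration, this is roughly what attributed fallthroughs look like on stacked case labels. The `Op` enum and `accessSize` function are hypothetical stand-ins, not SimEng's `Opcode` enum or `generateAddresses()` logic.

```cpp
#include <cstdint>

// Hypothetical opcodes for illustration; not SimEng's Opcode enum.
enum class Op : uint8_t { Ld1B_H, Ld1B_V, Ld1H_H, Ld1H_V, Other };

// Returns the element size in bytes shared by a group of related opcodes.
uint16_t accessSize(Op op) {
  switch (op) {
    case Op::Ld1B_H:
      [[fallthrough]];
    case Op::Ld1B_V:
      return 1;
    case Op::Ld1H_H:
      [[fallthrough]];
    case Op::Ld1H_V:
      return 2;
    default:
      return 0;
  }
}
```

Compilers generally do not warn about directly adjacent case labels, so here the attribute is a readability choice rather than a warning fix.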

jj16791 · 2024-10-26T09:45:19Z

src/lib/arch/aarch64/Instruction.cc

+    instructionGroup_ = smEnabled ? InstructionGroups::STREAMING_PREDICATE
+                                  : InstructionGroups::PREDICATE;
+  } else if (isInstruction(InsnType::isSVEData)) {
+    assert(((instructionGroup_ >= InstructionGroups::SVE &&


Was there a particular reason why there's an assert for this conditional body but not the one above? Is there a possible edge case or something?
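One plausible reading of the difference, offered as an assumption rather than the author's stated intent: the predicate branch is a direct one-to-one assignment, whereas the SVE data branch translates a group within a range, so its precondition is worth asserting. A minimal sketch with hypothetical group IDs and function names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical IDs; the real values live in InstructionGroups.hh.
namespace groups {
constexpr uint16_t PREDICATE = 1;
constexpr uint16_t STREAMING_PREDICATE = 2;
constexpr uint16_t SVE = 10;             // first SVE group
constexpr uint16_t SVE_LAST = 19;        // last SVE group
constexpr uint16_t STREAMING_SVE = 20;   // first streaming SVE group
constexpr uint16_t STREAMING_SVE_LAST = 29;
}  // namespace groups

// Predicate case: a direct choice between two groups, nothing to validate.
uint16_t selectPredicateGroup(bool smEnabled) {
  return smEnabled ? groups::STREAMING_PREDICATE : groups::PREDICATE;
}

// SVE data case: the group is shifted by a fixed offset, which is only valid
// if the current group already sits inside one of the two ranges.
uint16_t selectSveGroup(uint16_t current, bool smEnabled) {
  const bool inSve = current >= groups::SVE && current <= groups::SVE_LAST;
  const bool inStreaming = current >= groups::STREAMING_SVE &&
                           current <= groups::STREAMING_SVE_LAST;
  assert((inSve || inStreaming) && "group is not an SVE data group");
  const int offset = groups::STREAMING_SVE - groups::SVE;
  if (smEnabled && inSve) return static_cast<uint16_t>(current + offset);
  if (!smEnabled && inStreaming) return static_cast<uint16_t>(current - offset);
  return current;  // already in the right range for the current mode
}
```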

jj16791 · 2024-10-26T09:49:51Z

src/lib/arch/aarch64/Architecture.cc

@@ -188,6 +188,20 @@ uint8_t Architecture::predecode(const uint8_t* ptr, uint16_t bytesAvailable,
    newInsn.setExecutionInfo(getExecutionInfo(newInsn));
    // Cache the instruction
    iter = decodeCache_.insert({insn, newInsn}).first;
+  } else {


I have no data on this but doing this process for every AArch64 instruction may have a detrimental effect on performance. Will need to run through the new CI/CD when it's ready to determine this. Would argue that such a change shouldn't be merged until we know there's no significant performance regression.
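Until CI numbers are available, a rough way to reason about the cost is to separate the cache-hit fast path (one comparison) from the slow path (regrouping) and count how often the latter fires. The sketch below is a hypothetical stand-in for the decode cache; `DecodeCacheSketch`, `CachedInsn` and the placeholder `decode` function are not SimEng code.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical cached entry: the group plus the mode it was decoded under.
struct CachedInsn {
  uint16_t group;
  bool decodedInStreamingMode;
};

class DecodeCacheSketch {
 public:
  // On a hit, the only extra work is one bool comparison. The entry is
  // re-decoded only when the streaming mode differs from decode time.
  const CachedInsn& lookup(uint32_t encoding, bool smEnabled) {
    auto it = cache_.find(encoding);
    if (it == cache_.end()) {
      it = cache_.emplace(encoding, decode(encoding, smEnabled)).first;
    } else if (it->second.decodedInStreamingMode != smEnabled) {
      it->second = decode(encoding, smEnabled);
      ++regroupCount_;
    }
    return it->second;
  }

  // Number of times a cached entry had to be regrouped.
  uint64_t regroupCount() const { return regroupCount_; }

 private:
  // Placeholder decode: group 1 in streaming mode, group 0 otherwise.
  static CachedInsn decode(uint32_t /*encoding*/, bool smEnabled) {
    return CachedInsn{static_cast<uint16_t>(smEnabled ? 1 : 0), smEnabled};
  }

  std::unordered_map<uint32_t, CachedInsn> cache_;
  uint64_t regroupCount_ = 0;
};
```

A counter like `regroupCount()` would show whether mode switches are rare enough for the check to be negligible on a given workload.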

jj16791 · 2024-10-26T09:52:34Z

src/lib/arch/aarch64/Instruction_decode.cc

+  } else if (isInstruction(InsnType::isShift))
+    group += 2;
+  else
+    group += 3;  // Default is {Data type}_SIMPLE_ARTH


Default is {Data type}_SIMPLE_ARTH_NOSHIFT

jj16791 · 2024-10-26T10:06:51Z

src/include/simeng/arch/aarch64/helpers/neon.hh

@@ -568,9 +568,14 @@ RegisterValue vecUMaxP(srcValContainer& sourceValues) {
  const T* n = sourceValues[0].getAsVector<T>();
  const T* m = sourceValues[1].getAsVector<T>();

+  // Concatenate the vectors


Have you double-checked the ordering of the concatenation? Ran it on ookami and I think these may be the wrong way round but worth double checking in case I've made a mistake
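As a reference point for the ordering question: as I understand the Arm description of UMAXP, adjacent pairs are taken from the concatenation of the first source followed by the second, so the low half of the result comes from `n` and the high half from `m`. A minimal standalone sketch of that behaviour, using `std::array` instead of SimEng's `RegisterValue` and a hypothetical function name `umaxp`:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Pairwise unsigned maximum over the concatenation of n (low half) and
// m (high half). Result element i is the max of the i-th adjacent pair.
template <typename T, std::size_t I>
std::array<T, I> umaxp(const std::array<T, I>& n, const std::array<T, I>& m) {
  static_assert(I % 2 == 0, "element count must be even");
  std::array<T, I> out{};
  for (std::size_t i = 0; i < I / 2; ++i) {
    out[i] = std::max(n[2 * i], n[2 * i + 1]);          // pairs from n
    out[i + I / 2] = std::max(m[2 * i], m[2 * i + 1]);  // pairs from m
  }
  return out;
}
```

With `n = {1, 5, 3, 2}` and `m = {9, 8, 0, 7}` this yields `{5, 3, 9, 7}`, which can be compared against hardware output.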

jj16791 · 2024-10-26T10:07:58Z

src/lib/arch/aarch64/Instruction_execute.cc

+          uint64_t out[32] = {0};
+          std::memcpy(out, zaRow, rowCount * sizeof(uint64_t));
+          // Slice element is active IFF:
+          //  - Element in 1st source pred corresponding to horiz. slice is TRUE


I don't think we need to short-hand horizontal.

jj16791 · 2024-10-26T10:18:55Z

test/regression/aarch64/instructions/sme.cc

+      CHECK_MAT_COL(ARM64_REG_ZAS3, i, uint32_t,
+                    fillNeon<uint32_t>(inter32, (SVL / 8)));
+    } else {
+      // Even cols, all elements


Several occurrences of the possible same issue throughout

FinnWilkinson added the enhancement and 0.9.7 labels Jun 12, 2024

FinnWilkinson self-assigned this Jun 12, 2024

FinnWilkinson added 12 commits August 9, 2024 16:58

Added STREAMING versions of relevant aarch64 instruction groups.

201fc1e

Removed un-used macros from AArch64 Instruction decode.

fa95a72

Moved aarch64 getGroup logic to instruction_decode.

8f6051a

Moved riscv getGroup logic to instruction_decode.

989c994

Updated unit tests after changing getGroup logic.

801a47d

Added new AArch64 groups to model config and updated integration test.

2d4b07b

Added streaming mode enabled helper functions.

2523db9

Added STREAMING group logic to instruction_decode, and logic to change instruction group to STREAMING if SM mode is different to when instruction was first decoded.

2b299e7

Fixed minor issues with new streaming groups and updated SME example config file.

fcd1594

Re-wrote checkStreamingGroup function.

c710e78

Added unit tests for new AArch64 STREAMING groups functionality.

5aacba5

Updated aarch64 groups diagram in docs.

7974237

FinnWilkinson force-pushed the additional-sme-support branch from 531ebd0 to 7974237 August 9, 2024 15:58

FinnWilkinson added 13 commits August 13, 2024 11:29

Added SME instruction FMOPS (S and D) support and regression tests.

0ebddc1

Added SME instruction SMOPA (S and D) support and regression tests.

958d686

Added SME instruction SMOPS (S and D) support and regression tests.

eacc73f

Added SME instructions UMOPA and UMOPS (S and D) support and regression tests.

2fe1ea6

Fix jenkins build error.

aa5e01b

Added SME instructions SUMOPA and SUMOPS (S and D) support and regression tests.

16c0ce4

Updated SUMOPA and SUMOPS tests.

d73a4a3

Added SME instructions USMOPA and USMOPS (S and D) support and regression tests.

a423a3e

Fix jenkins build error pt2.

500dd5b

Implemented SME STR instruction and regression test.

95459c5

Fixed execution logic for vertical ST1D and ST1W SME stores.

8a1ca0b

Implemented SME ST1B and ST1H (H and V) instruction logic.

a897e3d

Implemented SME LD1B and LD1H (H and V) instruction logic.

87ead9b

FinnWilkinson added 14 commits August 15, 2024 14:04

Added SME LD1B and LD1H regression tests.

8a91683

Updated ST1D and ST1W SME regression tests.

fe74782

Added SME ST1B and ST1H regression tests.

34d5287

Implemented SME MOVA (Tile to Vec, horizontal) instructions and regression test (B, H, S, D)

5b7626f

Implemented SME MOVA (Tile to Vec, vertical) instructions and regression test (B, H, S, D)

c91f7cc

Implemented SME MOV (Tile to Vec, vertical and horizontal) instruction alias and regression tests (B, H, S, D)

2c3aedf

Implemented SME MOVA/MOV (Vec to Tile, vertical and horizontal) instructions and aliases and regression tests (B, H, S, D)

cda97ba

Implemented SME LDR instruction and regression tests.

e5fecac

Implemented SME ADDHA and ADDVA (S and D) instructions and regression tests.

f64e251

Updated ADDHA test to make more specific.

e65a3f1

Corrected ADDVA execution logic.

17c7ac5

Updated ADDVA test to make more specific.

8c0fbe1

Added SME MOVA (tile to vec, vec to tile) Quad-word instructions and regression tests.

85cb8f5

Implemented SME ST1Q and LD1Q (V and H) instructions and regression tests.

79d026d

FinnWilkinson marked this pull request as ready for review August 28, 2024 13:41

FinnWilkinson requested review from dANW34V3R, jj16791, JosephMoore25 and ABenC377 August 28, 2024 13:41

ABenC377 requested changes Aug 30, 2024


Removed werror.

3426968

FinnWilkinson changed the title ~~[WIP] Full SME(1) instruction support and STREAMING Groups~~ Full SME(1) instruction support and STREAMING Groups Sep 2, 2024

FinnWilkinson added 2 commits September 3, 2024 14:35

Added cstdint header to Register.hh

d5c631c

NEON instruction logic fixes.

4ad3b6e

jj16791 requested changes Oct 26, 2024
