-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full SME(1) instruction support and STREAMING Groups #415
base: dev
Are you sure you want to change the base?
Conversation
#rerun tests |
…e instruction group to STREAMING if SM mode is different to when instruction was first decoded.
531ebd0
to
7974237
Compare
…ssion test (B, H, S, D)
…ion test (B, H, S, D)
…n alias and regression tests (B, H, S, D)
…uctions and aliases and regression tests (B, H, S, D)
…regression tests.
bool isStreamingModeEnabled() const; | ||
|
||
/** Returns if the SME ZA Register is enabled. */ | ||
bool isZA_RegisterEnabled() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the underscore in ZA_Register needed/canonical? Looks a bit weird jumping between camel and snake cases
bool Architecture::isStreamingModeEnabled() const { return SVCRval_ & 1; } | ||
|
||
// 1st bit of SVCR register determines if ZA register is enabled. | ||
bool Architecture::isZA_RegisterEnabled() const { return SVCRval_ & 2; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about underscore
const uint16_t STORE_SME = 85; | ||
const uint16_t ALL = 86; | ||
const uint16_t NONE = 87; | ||
const uint16_t STREAMING_SVE = 66; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I looked at this file in isolation, I assumed that the ordering of these was arbitrary. However, it seems to be important for how instruction.cc works. So I'd be in favour of adding a comment here to explain that the ordering of these instructions is important and can't be changed without looking at instruction.cc.
@@ -91,6 +91,19 @@ span<const memory::MemoryAccessTarget> Instruction::generateAddresses() { | |||
setMemoryAddresses({{sourceValues_[2].get<uint64_t>(), 8}}); | |||
break; | |||
} | |||
case Opcode::AArch64_LD1_MXIPXX_V_B: // ld1b {zatv.b[ws, #imm]}, pg/z, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be using "[[fallthrough]];" here as well? Apply same to other non-attributed fallthroughs below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally I think it's fine but it provides a better structure and possibly improved readability so I'd say we should use fallthroughs
uint64_t out[32] = {0}; | ||
std::memcpy(out, zaRow, rowCount * sizeof(uint64_t)); | ||
// Slice element is active IFF: | ||
// - Element in 1st source pred corresponding to horiz. slice is TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an AND between these two statements
CHECK_MAT_COL(ARM64_REG_ZAS3, i, uint32_t, | ||
fillNeon<uint32_t>(inter32, (SVL / 8))); | ||
} else { | ||
// Even cols, all elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Odd cols?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several occurrences of the possible same issue throughout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments and I agree with several of Alex's comments. I think it would be good to get the ARM SME/SVE loops as part of our functional verification checks to help test these new instructions. I assume it would have to be done somewhere private though (not sure if we already have that guarantee in the upcoming CI/CD pipelines)?
@@ -72,7 +72,7 @@ set(CMAKE_MACOSX_RPATH 1) | |||
set(CMAKE_POSITION_INDEPENDENT_CODE ON) | |||
|
|||
# Create variable to enable additional compiler warnings for SimEng targets only | |||
set(SIMENG_COMPILE_OPTIONS -Wall -pedantic -Werror) #-Wextra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed?
@@ -62,7 +62,12 @@ Ports: | |||
Instruction-Group-Support: | |||
- INT_SIMPLE | |||
- INT_MUL | |||
- STORE_DATA | |||
- STORE_DATA_INT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't these expansions redundant due to the inheritance mappings defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOAD_FP/STORE_ADDRESS_FP/STORE_DATA_FP don't exist. Also, this may be GitHub not showing it correctly, but the background should just be the alpha channel.
@@ -91,6 +91,19 @@ span<const memory::MemoryAccessTarget> Instruction::generateAddresses() { | |||
setMemoryAddresses({{sourceValues_[2].get<uint64_t>(), 8}}); | |||
break; | |||
} | |||
case Opcode::AArch64_LD1_MXIPXX_V_B: // ld1b {zatv.b[ws, #imm]}, pg/z, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally I think it's fine but it provides a better structure and possibly improved readability so I'd say we should use fallthroughs
instructionGroup_ = smEnabled ? InstructionGroups::STREAMING_PREDICATE | ||
: InstructionGroups::PREDICATE; | ||
} else if (isInstruction(InsnType::isSVEData)) { | ||
assert(((instructionGroup_ >= InstructionGroups::SVE && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a particular reason why there's an assert for this conditional body but not the one above? Is there a possible edge case or something?
@@ -188,6 +188,20 @@ uint8_t Architecture::predecode(const uint8_t* ptr, uint16_t bytesAvailable, | |||
newInsn.setExecutionInfo(getExecutionInfo(newInsn)); | |||
// Cache the instruction | |||
iter = decodeCache_.insert({insn, newInsn}).first; | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no data on this but doing this process for every AArch64 instruction may have a detrimental effect on performance. Will need to run through the new CI/CD when it's ready to determine this. Would argue that such a change shouldn't be merged until we know there's no significant performance regression.
} else if (isInstruction(InsnType::isShift)) | ||
group += 2; | ||
else | ||
group += 3; // Default is {Data type}_SIMPLE_ARTH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default is {Data type}_SIMPLE_ARTH_NOSHIFT
@@ -568,9 +568,14 @@ RegisterValue vecUMaxP(srcValContainer& sourceValues) { | |||
const T* n = sourceValues[0].getAsVector<T>(); | |||
const T* m = sourceValues[1].getAsVector<T>(); | |||
|
|||
// Concatenate the vectors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you double-checked the ordering of the concatenation? Ran it on ookami and I think these may be the wrong way round but worth double checking in case I've made a mistake
uint64_t out[32] = {0}; | ||
std::memcpy(out, zaRow, rowCount * sizeof(uint64_t)); | ||
// Slice element is active IFF: | ||
// - Element in 1st source pred corresponding to horiz. slice is TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to short-hand horizontal.
CHECK_MAT_COL(ARM64_REG_ZAS3, i, uint32_t, | ||
fillNeon<uint32_t>(inter32, (SVL / 8))); | ||
} else { | ||
// Even cols, all elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several occurrences of the possible same issue throughout
This PR implements all available SME (version 1) instructions that are contained within LLVM 14.0.5. Specifically, this is Version 2021-06 of the Armv9-A A64 ISA.
No FP16 or BF16 instructions have been supported due to lacking C++17 types. All Quad-Word instruction variants have been emulated using 64-bit data-types.
In addition to this, new STREAMING_SVE and STREAMING_PREDICATE groups have been introduced (along with corresponding decode logic) to allow for a different pipeline / latency configuration for these instructions when SVE Streaming Mode (the context mode which SME instructions are executed in) is enabled. This can allow for a co-processor style implementation of SME to be implemented within SimEng; with additional latency / reduced throughput being configured to mimic an offload penalty, and different execution or LD/STR hardware being modelled for said co-processor compared to the main core.