[MLAS][AArch64] SQNBitGemm CompInt8 - Use 4x2 tiles (#21380)
Update the SQNBitGemm ARM NEON kernel to compute a 4x2 tile of output.

Note: Also tried 2x4 and 4x4 tiles but observed the best microbenchmark results with 4x2 tiles.
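
For readers unfamiliar with output tiling, here is a minimal scalar sketch of the general idea. It is not the MLAS kernel: it uses plain int8 inputs rather than N-bit quantized blocks, uses no NEON intrinsics, and the function names `GemmReference` and `GemmTiled4x2` are hypothetical. It also assumes that "4x2" means 4 rows by 2 columns of output, that B is stored with each output column contiguous along K, and that M and N are multiples of the tile size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative layout (an assumption, not taken from the commit):
//   A is M x K, row-major.
//   B is N x K, row-major, so each output column's K values are contiguous.
//   C is M x N, row-major, with C[m][n] = sum_k A[m][k] * B[n][k].

// Straightforward one-output-at-a-time reference.
std::vector<int32_t> GemmReference(const std::vector<int8_t>& A,
                                   const std::vector<int8_t>& B,
                                   size_t M, size_t N, size_t K) {
    std::vector<int32_t> C(M * N, 0);
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; ++k) {
                acc += int32_t(A[m * K + k]) * int32_t(B[n * K + k]);
            }
            C[m * N + n] = acc;
        }
    }
    return C;
}

// Computes a 4x2 tile of C per inner loop. Each step along K loads
// 4 values of A and 2 values of B and reuses them for 8 multiply-adds,
// instead of loading 2 values per multiply-add as the reference does.
std::vector<int32_t> GemmTiled4x2(const std::vector<int8_t>& A,
                                  const std::vector<int8_t>& B,
                                  size_t M, size_t N, size_t K) {
    // Simplifying assumption: no remainder handling for partial tiles.
    assert(M % 4 == 0 && N % 2 == 0);
    std::vector<int32_t> C(M * N, 0);
    for (size_t m = 0; m < M; m += 4) {
        for (size_t n = 0; n < N; n += 2) {
            int32_t acc[4][2] = {};  // 8 accumulators held across the K loop
            for (size_t k = 0; k < K; ++k) {
                const int32_t a[4] = {A[(m + 0) * K + k], A[(m + 1) * K + k],
                                      A[(m + 2) * K + k], A[(m + 3) * K + k]};
                const int32_t b[2] = {B[(n + 0) * K + k], B[(n + 1) * K + k]};
                for (size_t i = 0; i < 4; ++i) {
                    for (size_t j = 0; j < 2; ++j) {
                        acc[i][j] += a[i] * b[j];
                    }
                }
            }
            // Write the finished tile back to C.
            for (size_t i = 0; i < 4; ++i) {
                for (size_t j = 0; j < 2; ++j) {
                    C[(m + i) * N + (n + j)] = acc[i][j];
                }
            }
        }
    }
    return C;
}
```

Larger tiles increase operand reuse but need more accumulators live at once, which is one plausible reason a tile shape has to be chosen by benchmarking, as the note above describes.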
edgchen1 authored Jul 18, 2024
1 parent 92f66de commit 05fc0c6
Showing 1 changed file with 244 additions and 158 deletions.
