[ML-DSA] AVX2 performance improvements in NTT #584

- [ ] Things to investigate (increasing order of effort)
  - [ ] try different instructions for shuffling (e.g. `vpshufd` vs `vmovshdup`)
  - [ ] additionally unroll layers 2 through 0
  - [ ] use a different shuffling strategy altogether (i.e. instead of shuffle-in -> butterfly -> shuffle-out per layer in layers 2-0, shuffle once in layers 5-3 and unshuffle when writing out the final result (?))
- Apply effective optimizations to inverse NTT #657
- Puzzle, potential waste of time: in our multiplication, subtractions seem to be disproportionately expensive, although judging by instruction count they shouldn't be. Why is that?
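For reference, the "butterfly" in the list above is the per-pair step of the NTT. Below is a minimal scalar sketch of the forward (Cooley-Tukey) and inverse (Gentleman-Sande) butterflies over the ML-DSA modulus q = 8380417. It is not the code in this repository: the function names are hypothetical, and it uses plain `rem_euclid` reduction where a vectorized implementation would typically use Montgomery arithmetic.

```rust
/// ML-DSA modulus, q = 2^23 - 2^13 + 1.
pub const Q: i64 = 8_380_417;

/// Plain modular reduction into [0, Q). Assumption: used here only for
/// readability; it is not how an AVX2 implementation would reduce.
pub fn reduce(x: i64) -> i64 {
    x.rem_euclid(Q)
}

/// Forward (Cooley-Tukey) butterfly: (a, b) -> (a + zeta*b, a - zeta*b).
pub fn ct_butterfly(a: i64, b: i64, zeta: i64) -> (i64, i64) {
    let t = reduce(zeta * b);
    (reduce(a + t), reduce(a - t))
}

/// Inverse (Gentleman-Sande) butterfly: (a, b) -> (a + b, zeta_inv*(a - b)).
/// Applied to the output of `ct_butterfly`, it returns (2a, 2b); the factor
/// of 2 per layer is normally removed by one final scaling.
pub fn gs_butterfly(a: i64, b: i64, zeta_inv: i64) -> (i64, i64) {
    (reduce(a + b), reduce(zeta_inv * reduce(a - b)))
}

/// Square-and-multiply exponentiation mod Q, used only to obtain an inverse.
pub fn pow_mod(base: i64, mut exp: u64) -> i64 {
    let mut base = reduce(base);
    let mut acc: i64 = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = reduce(acc * base);
        }
        base = reduce(base * base);
        exp >>= 1;
    }
    acc
}
```

Note that each butterfly performs one addition and one subtraction per pair, which is why the subtraction cost mentioned in the puzzle would show up in every layer.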
We found that trying different shuffles made essentially no difference, with the possible exception of keeping vectors in the NTT domain in a shuffled state to avoid some shuffles altogether. That would require touching every place where NTT-domain vectors are handled and is probably not worth the effort for the performance gained, since easier changes are still available, e.g. applying the butterfly optimization to the inverse NTT.
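One reason swapping the instruction may not matter: with the right immediate, `vpshufd` computes the same lane permutation as `vmovshdup`. The scalar model below illustrates this for one 128-bit lane of four 32-bit elements (the 256-bit forms apply the same pattern to each 128-bit lane). The function and constant names are hypothetical and this is only a model of the instruction semantics, not code from this repository.

```rust
/// Scalar model of `vmovshdup` on one 128-bit lane (four 32-bit elements):
/// each odd-indexed element is duplicated into the even slot below it.
pub fn movshdup_model(v: [i32; 4]) -> [i32; 4] {
    [v[1], v[1], v[3], v[3]]
}

/// Scalar model of `vpshufd` on one 128-bit lane: output element `i` is the
/// input element selected by the two immediate bits at position `2 * i`.
pub fn pshufd_model(v: [i32; 4], imm: u8) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        let sel = (imm >> (2 * i)) & 0b11;
        out[i] = v[sel as usize];
    }
    out
}

/// Immediate that makes `vpshufd` select elements 1, 1, 3, 3 (0xF5).
pub const DUP_ODD: u8 = 0b11_11_01_01;
```

With `DUP_ODD`, `pshufd_model(v, DUP_ODD)` and `movshdup_model(v)` agree for every input, so any difference between the two instructions would come from port pressure or encoding rather than from the data movement itself.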