Improve efficiency of NFFT direct transformation in one dimension. #142

jenskeiner · 2024-08-15T14:36:14Z

Small change to improve the efficiency of the direct NFFT trafo in one dimension:

- Instead of using memset to initialize the target vector with zeros, write the final value to the target in the outer loop. Rationale: memset makes an additional pass over the entire target vector, which can be costly if the vector is large and does not fit in the CPU cache.
- Instead of accessing the target location f[j] in every iteration of the inner loop, accumulate the value in a local variable and write only the final value to f[j]. Rationale: this may avoid potentially slow memory accesses to f[j]; if f[j] stays in the CPU cache and/or the compiler optimizes the loop well, it may make no difference.
- Instead of calling the complex exponential cexp to compute e^{-i*omega}, use the real-valued sin and cos functions. Rationale: cexp supports general complex arguments, but here the argument is always purely imaginary, so e^{-i*omega} = cos(omega) - i*sin(omega).

It was difficult to test this one because there's no simple benchmark I could quickly run. Also, I tested this on arm64/v8, where cycle.h currently doesn't work for me, so I had to set up a scratch file (not part of this PR) to run a quick check. On my platform, the number of cycles for the direct transform drops to 60-80% of the previous count.
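Where cycle.h is not usable, a quick comparison can fall back on wall-clock time. The following is a minimal sketch of such a scratch harness, assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC. The helpers now_seconds and best_of, and the example workload sum_workload, are hypothetical and not part of NFFT3. Note that this measures seconds, not cycles, so it is only meaningful as a ratio between the old and new code on the same machine.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
static double now_seconds(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Run fn(ctx) `reps` times and return the fastest single run in seconds.
 * Taking the minimum reduces noise from other processes and frequency scaling. */
static double best_of(void (*fn)(void *), void *ctx, int reps)
{
  assert(reps > 0);
  double best = -1.0;
  for (int r = 0; r < reps; r++) {
    double t0 = now_seconds();
    fn(ctx);
    double dt = now_seconds() - t0;
    if (best < 0.0 || dt < best)
      best = dt;
  }
  return best;
}

/* Example workload for illustration only: sum n doubles. */
struct sum_ctx { const double *v; int n; double result; };

static void sum_workload(void *p)
{
  struct sum_ctx *c = p;
  double s = 0.0;
  for (int i = 0; i < c->n; i++)
    s += c->v[i];
  c->result = s;
}
```

To compare two implementations, wrap each in a function with the void (*)(void *) signature, pass the plan data through ctx, and report the ratio of the two best_of results.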

It would be good if someone could test this independently, and on a different architecture (e.g. amd64) as well.


jenskeiner · 2024-08-15T14:37:12Z

tests/nfft.c

Need the explicit casts, or the tests won't compile for me.

michaelquellmalz · 2024-08-16T11:19:21Z

Looks good. I also saw a speed-up of about 20 % on my Intel 10700 CPU with GCC 10, with the time measured in Matlab.

I was wondering why you made the changes only in 1D, because these improvements should also work in the multivariate case and for the direct adjoint NFFT.

jenskeiner · 2024-08-16T12:42:07Z

I'll make more changes for d > 1, and possibly for the adjoint transform as well. I wanted to do it step by step so that each change can be tested properly. There's no rush to merge this PR from my side, but I don't know how soon I'll get to the remaining changes, so it could make sense to merge this one and open new PRs for the rest.

jenskeiner added 2 commits August 15, 2024 16:17

Add explicit cast to prevent function pointer type mismatch.

82575ec

Improve direct trafo efficiency by inlining memset operation and using real-valued sin/cos instead of complex valued cexp.

e996b2a

jenskeiner added the enhancement label Aug 15, 2024

jenskeiner requested review from DanielPotts and skunis August 15, 2024 14:36

jenskeiner self-assigned this Aug 15, 2024


michaelquellmalz added this to the 3.5.4 milestone Oct 15, 2024

michaelquellmalz merged commit f9e1bde into develop Oct 15, 2024
2 checks passed
