Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix undefined behaviour in dot product kernels #655

Merged
merged 3 commits into from
Oct 22, 2023

Conversation

argilo
Copy link
Member

@argilo argilo commented Oct 14, 2023

Fixes #651.
Fixes #659.

In tests, the dot product is a summation of 131071 terms, so a fair amount of error can accumulate. From the past CI failures it can be seen that the error can reach a magnitude of 2, so I've set a tolerance of 10 to leave a safe margin.

Since the summation can be close to zero, it makes sense to check the absolute error instead of the relative error. The ccompare function did not have an absolute mode, so I added one.

Update: The test failures and large errors identified in #651 were due to undefined behaviour in the dot product kernels, which sometimes manifested in trailing samples/taps not being included in the calculation. I've updated the pull request to include fixes for that problem.

@argilo argilo force-pushed the fix-flaky-dot-prod branch from 7c7b097 to 946259f Compare October 14, 2023 01:41
@argilo
Copy link
Member Author

argilo commented Oct 14, 2023

The test failure is unrelated; it's due to #646.

@argilo
Copy link
Member Author

argilo commented Oct 14, 2023

On my machine I can get away with an absolute tolerance of 1e-2 even if I change the order of floating point operations in the generic kernel by reversing the loop. So I'm confused why the errors have been so much larger in CI. There might be something more to the failures. Let's hold off on merging this until we know what's going on.

@jdemel
Copy link
Contributor

jdemel commented Oct 14, 2023

@argilo you could convert this PR into a draft. This would clearly indicate that it is still being worked on.

@argilo argilo marked this pull request as draft October 14, 2023 14:08
@argilo argilo force-pushed the fix-flaky-dot-prod branch 2 times, most recently from d2e8603 to 43d8980 Compare October 15, 2023 00:45
@argilo
Copy link
Member Author

argilo commented Oct 15, 2023

Aha! The volk_32fc_32f_dot_prod_32fc kernels misbehave because they cast a float* to lv_32fc_t* and then dereference it, which is undefined behaviour.

float res[2];
float *realpt = &res[0], *imagpt = &res[1];

*result = *(lv_32fc_t*)(&res[0]);

I got rid of the float array and replaced it with lv_32fc_t, following the model that the accumulator kernels use.

Tests now pass on all platforms, even with the tighter 1e-2 absolute tolerance.

@argilo argilo marked this pull request as ready for review October 15, 2023 01:01
@argilo
Copy link
Member Author

argilo commented Oct 15, 2023

The test failures are unrelated; both are due to #646.

@argilo
Copy link
Member Author

argilo commented Oct 15, 2023

volk_16i_32fc_dot_prod_32fc has the same bug, but I'll leave that one for later.

@argilo argilo changed the title Check absolute error of dot product calculation Fix undefined behaviour in dot product kernel Oct 15, 2023
@argilo
Copy link
Member Author

argilo commented Oct 15, 2023

GNU Radio uses both volk_32fc_32f_dot_prod_32fc and volk_16i_32fc_dot_prod_32fc for FIR filtering, so it will be good to get them cleaned up and tested with tighter tolerances.

@argilo
Copy link
Member Author

argilo commented Oct 15, 2023

volk_16i_32fc_dot_prod_32fc has the same bug, but I'll leave that one for later.

Actually, I'll throw it in here because it will also need the new absolute tolerance code.

@argilo argilo force-pushed the fix-flaky-dot-prod branch from 43d8980 to d4c8688 Compare October 15, 2023 02:11
@argilo argilo changed the title Fix undefined behaviour in dot product kernel Fix undefined behaviour in dot product kernels Oct 15, 2023
@argilo
Copy link
Member Author

argilo commented Oct 15, 2023

The test failure in the latest run is due to #650. At least there's some variety. 😄

@argilo
Copy link
Member Author

argilo commented Oct 15, 2023

Some information about why casting from float[] to std::complex is Undefined Behaviour:

https://stackoverflow.com/questions/23198943/is-it-legal-to-cast-float-to-stdcomplexfloat

Copy link
Contributor

@jdemel jdemel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost LGTM. Only a very few comments. Thanks!

@@ -147,8 +147,7 @@ static inline void volk_16i_32fc_dot_prod_32fc_u_sse(lv_32fc_t* result,
unsigned int number = 0;
const unsigned int sixteenthPoints = num_points / 8;

float res[2];
float *realpt = &res[0], *imagpt = &res[1];
lv_32fc_t returnValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilers would potentially complain about uninitialized variables here.
I suggest: lv_32fc_t returnValue = lv_cmake(0.0f, 0.0f);
I'd update the other occurrences as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure compilers would be confused by this particular case, but it would make the code easier to understand for humans. I'll add initializations and change all the updates to +=.

lib/kernel_tests.h Show resolved Hide resolved
@argilo argilo force-pushed the fix-flaky-dot-prod branch 2 times, most recently from 794ca91 to 56fba31 Compare October 22, 2023 13:03
Signed-off-by: Clayton Smith <argilo@gmail.com>
Signed-off-by: Clayton Smith <argilo@gmail.com>
Signed-off-by: Clayton Smith <argilo@gmail.com>
@argilo argilo force-pushed the fix-flaky-dot-prod branch from 56fba31 to 221138a Compare October 22, 2023 13:08
@argilo
Copy link
Member Author

argilo commented Oct 22, 2023

Updated, and rebased on main. It might be a while until CI catches up. 😄

@argilo
Copy link
Member Author

argilo commented Oct 22, 2023

CI has now caught up. ✅

Copy link
Contributor

@jdemel jdemel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jdemel jdemel merged commit 42f57cd into gnuradio:main Oct 22, 2023
32 checks passed
@argilo argilo deleted the fix-flaky-dot-prod branch October 22, 2023 16:17
Alesha72003 pushed a commit to Alesha72003/volk that referenced this pull request May 15, 2024
Fix undefined behaviour in dot product kernels
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants