-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AP_Math: prevent FPE in SITL when limitting accel, vectors #28981
AP_Math: prevent FPE in SITL when limitting accel, vectors #28981
Conversation
Shouldn't be this done in safe_sqrt ? I would assume that we have some other place that can have this issue |
Well... we really want to know about where this actually does happen. This PR isn't really a satisfactory solution to what we're seeing IMO - it would be nice to trace back the maths to work out how the rhs ens up larger than the lhs in here. #28969 is the issue if you missed it |
I made a Geogebra to visualise the maths. https://www.geogebra.org/calculator/ktxy2uyf I don't see any issue with the maths. Must be floating point stuff. If I were to guess it would be Maybe we need a new method |
This comment was marked as outdated.
This comment was marked as outdated.
Pete is close, the problem is imprecision in The vector is
I'm not sure how to guarantee the vector length quantity is actually <= the limit after the math in |
Doing some research and fiddling suggests this is the correct solution. We need to audit other uses of It's not feasible due to the vagaries of floating point math to guarantee it's always equal or under. |
We could limit to the smaller of the the existing and the squared so we always conservative. On the whole I tend to think know we know what is going on we probably don't need to panic. I'd even be tempted not to fix this case so we can see if a recent change had made this more likely. We do have a test for the function, maybe we should add some cases there. ardupilot/libraries/AP_Math/tests/test_vector2.cpp Lines 125 to 131 in 5726c03
|
Closes #28969 |
I don't actually think And it has happened again here: https://github.com/ArduPilot/ardupilot/actions/runs/12572260649/job/35044375709?pr=28857 |
6d764d3
to
a73e086
Compare
I've written a test for this now so other people can play with things in gdb easily. I've also pushed up a patch reverting the original fix and proposing an alternative fix. Basically stop using the apples comparison to determine if we do the oranges maths - use a single fruit in both cases. |
Well, that's coincidence... third is presumed to be enemy action... I've had a quick look through the stack and code changes and haven't found any smoking gun. @robertlong13 changed things up in this test a bit recently be adding in the ICE frames. Is it possible that the slightly different mission is causing this? 3982d57 |
So this both fixes the problem and doesn't. The problem is fixed because we won't execute a square root with a negative number; we now trust that the limit function says it wanted to limit the vector and then set But it doesn't fix the problem because the vector is still in some sense too long. Calling I think adding a The current fix is sort of worse because now
Yes, I expect so. It just happens to sometimes hit this exact point now. Here's the enemy action: https://github.com/ArduPilot/ardupilot/actions/runs/12581405966/job/35065761383?pr=28991 . The numbers are again identical. |
Another failure: https://github.com/ArduPilot/ardupilot/actions/runs/12591859248/job/35097592628?pr=28996 Going to try to replicate locally and see if that mission change was the instigator. Still prefer the |
I was able to replicate it (running only the FlyEachFrame test and quadplane-tilt frame) but sometimes it took 50-60 tries and sometimes it was one or two. It would be hard to say what particular commit did it. I imagine it's kind of dependent on system load and stuff too. |
Now it's infected copter: https://github.com/ArduPilot/ardupilot/actions/runs/12616584539/job/35158079748?pr=28857
|
We're probably going to need fire by the end of this. |
a73e086
to
f06ef62
Compare
I've popped off my alternative fix now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix looks good to me, thanks! Tested that the test fails when the fix is reverted.
Some super minor nits:
- never quite sure what "limiting accel, vectors" meant, that could be clearer
- squash last two commits into one as they are both needed together to fix the test
Co-authored-by:
tag needs to be the last line in the commit message, after two blank lines for GitHub to pick it up
f06ef62
to
db167a3
Compare
All fixed, thanks! |
db167a3
to
3b16f3f
Compare
I'd prefer to restore the original intention of safe_sqrt() which is what a negative input is not an error |
Looks like this is the same problem and used the same fix as proposed here: ardupilot/libraries/AC_AttitudeControl/AC_PosControl.cpp Lines 1415 to 1416 in 01345e5
|
3b16f3f
to
a97c03d
Compare
A little clever but should serve the purpose :) Checked the tests too, but CI will tell all. Should we backport? I don't think it's necessary, unless we broke the 4.6 CI. |
a97c03d
to
84e7344
Compare
The cross-product code can produce something slightly negative. Fix safe_sqrt to avoid barfing on that, as was originally intended, and clarify why it's being used. Co-authored-by: Leonard Hall <leonardthall@gmail.com> Co-authored-by: Thomas Watson <twatson52@icloud.com>
84e7344
to
5f420e7
Compare
if (isnan(ret)) { | ||
return 0; | ||
// cast before checking so we sqrtf the same value we check | ||
const float val = static_cast<float>(v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, we went through this entire discussion without noting that we're squashing any double which might be passed in....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had noticed that. But we do it in several places, I'm not too concerned specifically here. Also having looked at the binary I think it only ever gets called with a float even on double boards.
Co-authored-by: Leonard Hall leonardthall@gmail.com
the cross-product code can produce something slightly negative. Stop passing that into safe_sqrt; it isn't that safe on SITL!