Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SRV_Channels: fix race for Lua getters #29021

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

robertlong13
Copy link
Collaborator

Almost all the Lua getters/setters use servo function instead of channel, which relies on these bitmasks that we rebuild once per second in update_aux_servo_function. If those masks are still being rebuilt when the getter is called, then it looks like no valid function is assigned.

I believe this race cannot happen on ChibiOS (update runs on the main thread and no getters get called from higher priority thread), so I've #if-ed those out on ChibiOS to save flash. This race occurs pretty reliably in SITL though. According to this reply I got on Discord, this can happen on ESP32 and Linux, even with the thread priorities in Linux, but I haven't been able to test those.

@robertlong13
Copy link
Collaborator Author

Minimal script to reproduce the issue and demonstrate the fix. If someone could test on ESP32 and Linux that would be great.

SERVO_FUNCTION = 33 -- Motor 1
-- SERVO_FUNCTION = 4 -- Aileron

local ever_had_pwm = false
local function update()
    local pwm = SRV_Channels:get_output_pwm(SERVO_FUNCTION)
    -- Try to get PWM 20 times to increase our odds of hitting the race condition
    for _ = 1, 20 do
        pwm = pwm and SRV_Channels:get_output_pwm(SERVO_FUNCTION)
    end
    -- Make sure we don't complain until we've gotten a non-nil PWM at least once
    if pwm then
        if not ever_had_pwm then
            print("Got a PWM")
        end
        ever_had_pwm = true
    end
    -- Complain about a nil PWM
    if not pwm and ever_had_pwm then
        print("PWM is nil")
    end
    return update, 0
end

print("loaded servo_nil_test.lua")

return update, 0

@robertlong13
Copy link
Collaborator Author

I considered some other approaches, but ultimately settled on the semaphore:

  1. Rebuild the masks as new masks, then copy them over, instead of clearing the masks in place.
  2. Stop rebuilding the masks periodically and instead set and clear each bit correctly as servo function assignments change. Should be possible, but easy to mess up.

@IamPete1
Copy link
Member

IamPete1 commented Jan 8, 2025

I think you could re-write the update update_aux_servo_function to be safe. This is what we tend to do in other places. You create a new bitmask in the function run the loop on it and then only at the end set the existing mask to the new value. That means the mask is never zero. The channel mask is a bit more complex because we probably don't want to make have 157 copy's, but you could do the loop over servo channels 157 times and just have one local copy. It will be quite a bit more computationally expensive, but its limited to the update_aux_servo_function which is called infrequently. Adding the semaphore into the other functions that check the masks means some small cost is also paid on functions that are called much more frequently.

@andyp1per
Copy link
Collaborator

I have some concerns over introduce a semaphore here. I haven't look at this in detail but these bitmasks are used by the motor output in a hot path that would be significantly impacted by a new semaphore. So we just need to be careful. Probably would need to do a --enable-stats before/after check for the CPU while using the fast rate output.

@robertlong13
Copy link
Collaborator Author

I have some concerns over introduce a semaphore here. I haven't look at this in detail but these bitmasks are used by the motor output in a hot path

Darn, I didn't realize that the high speed motor outputs relied on these. I thought most of the set-by-function-id users were slower paths.

See my alternative PR linked above where I build up new masks and copy them in instead of clearing. That one avoids semaphores.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants