Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tokio::signals::windows - are windows signals correctly handled? #7039

Open
Astlaan opened this issue Dec 16, 2024 · 7 comments
Open

Tokio::signals::windows - are windows signals correctly handled? #7039

Astlaan opened this issue Dec 16, 2024 · 7 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-signal Module: tokio/signal

Comments

@Astlaan
Copy link

Astlaan commented Dec 16, 2024

Version
v1.42.0

Platform
Windows

Description
The issue was been discussed in another tokio-dependent crate.
Full discussion here: Finomnis/tokio-graceful-shutdown#94 (comment)

In tokio, in this section:

unsafe extern "system" fn handler(ty: u32) -> BOOL {

unsafe extern "system" fn handler(ty: u32) -> BOOL {
    let globals = globals();
    globals.record_event(ty as EventId);

    // According to https://docs.microsoft.com/en-us/windows/console/handlerroutine
    // the handler routine is always invoked in a new thread, thus we don't
    // have the same restrictions as in Unix signal handlers, meaning we can
    // go ahead and perform the broadcast here.
    if globals.broadcast() {
        1
    } else {
        // No one is listening for this notification any more
        // let the OS fire the next (possibly the default) handler.
        0
    }
}

The HandlerRoutine is will defined. It later gets registered via the SetConsoleCtrlHandler. This gets done in the tokio code here.

According to:
HandlerRoutine reference

It seems that if the handler returns 1, then the console understands this as the handler having handled the graceful closure of the program, and windows terminates the program. If 0 is passed, "next handler function in the list of handlers for this process is used". SetConsoleCtrlHandler reference:

When a console process receives any of the control signals, its handler functions are called on a last-registered, first-called basis until one of the handlers returns TRUE. If none of the handlers returns TRUE, the default handler is called.

But in the tokio-defined handler function, broadcast() is called, which "Returns true if an event was delivered to at least one listener". But then broadcast() evaluates to true, the handler function returns true, telling windows that the defined handler handled the signal.

  • If we are dealing with a CtrlC signal, I guess this should be fine. The handler informs Windows that the signal has been handled, and windows doesn't call the default handler, which would terminate the program.
  • However, if we are dealing with a with something like a CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, or CTRL_SHUTDOWN_EVENT, according to the HandlerRoutine reference, Windows immediately terminates the process once a true return is received (unlike CTRL_Cevents

It seems that the way tokio implements it doesn't even give a change for the program to do anything with the signal in the case of CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, or CTRL_SHUTDOWN_EVENT events, since it broadcasts it to the listeners, and if they exist, the handler function returns a true, resulting in Windows terminating the program right away?

Or am I misinterpreting something in the Tokio code?

How to fix?

If I interpreted the Tokio mechanism correctly, then I guess the way to fix would be something like making the handler function wait for the program to complete.

One of the limiting factors is that, when the console emits CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, or CTRL_SHUTDOWN_EVENT, Windows grants a limited time for the program to clean itself: Timeouts.

The CTRL_CLOSE_EVENT timeout is 5 s, for example. If the program is not done cleaning after these 5 seconds, it will be forcefully terminated.

Possible solutions:

  1. Make the handler return false always, telling the user to register another handler function, with the cleanup he wants to do. Also tell the user that his handler function must be defined BEFORE the Tokio signal is registered, if the signal is to be of any use as well (due to the last-registered, first-called).
  2. Put a sleep on Tokio's handler function, so that that function never returns, and the program always has the full timeout time to clean after itself. Sleep works because the handler is run in another thread, so wouldn't affect the main thread.
  3. Some smarter method where the handler stops on some synchronization flag, that then can be switched by the main thread when the cleanup is done, and then lets the handler function proceed and finalize.

EDIT: possibly it is a good idea to always return False, regardless of the fix above. This will allow the user to keep defining custom CtrlHandler functions to be long alongside tokio's, if he so wishes (with the ordering warning mentioned in (1). This would invalidate approach (2) however, which won't give time for other handlers to run.

(1) is probably the most hacky. (2) would integrate the most seamlessly, however, preventing the user from running their own ctrl handlers. (3) is probably the most solid, but would imply having a synchronization variable, and some kind of action from the program that it is done cleaning itself. This would however result in code that is not cross-platform in the sense that this synchronization is not needed with Unix signals.

@Astlaan Astlaan added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Dec 16, 2024
@Darksonn Darksonn added the M-signal Module: tokio/signal label Dec 16, 2024
@Darksonn
Copy link
Contributor

There was a previous issue on this: #6735.

cc @ipetkov

@ipetkov
Copy link
Member

ipetkov commented Dec 17, 2024

Put a sleep on Tokio's handler function, so that that function never returns, and the program always has the full timeout time to clean after itself. Sleep works because the handler is run in another thread, so wouldn't affect the main thread.

I don't think this design would be sufficient: whatever sleep time we choose, there will always be applications for whom the time is either too long or too short (and making it configurable would also be a pain to use).

Some smarter method where the handler stops on some synchronization flag, that then can be switched by the main thread when the cleanup is done, and then lets the handler function proceed and finalize.

Perhaps the best approach would be to:

  • Deprecate the current APIs
  • Introduce v2 versions of the APIs which block the handler until the event object received through the channel is dropped. Then a consumer can control exactly when it is safe to bring the process down

Though the change would be subtle enough (e.g. easy to drop the event "handle" if it doesn't have any obvious use) that I think it warrants a new API rather than trying to change the existing ones and cause confusion

@Astlaan
Copy link
Author

Astlaan commented Dec 17, 2024

@ipetkov

I don't think this design would be sufficient: whatever sleep time we choose, there will always be applications for whom the time is either too long or too short (and making it configurable would also be a pain to use).

Well at most Windows grants 5s. Even if you make a 5s sleep on the handler, when you click the X button the console still closes immediately and the process keeps running solely on the background. So for a user perspective I think there wouldn't be much difference. The user expects the console to close right away and that's still what happens, regardless of blocking the handler for 5s or not. But I agree that it is not the most elegant solution, and would likely prevent the developer from choosing to define their own handlers via winapi, which probably is not good policy.

Perhaps the best approach would be to:
Deprecate the current APIs
Introduce v2 versions of the APIs which block the handler until the event object received through the channel is dropped. Then a consumer can control exactly when it is safe to bring the process down
Though the change would be subtle enough (e.g. easy to drop the event "handle" if it doesn't have any obvious use) that I think it warrants a new API rather than trying to change the existing ones and cause confusion

But are you proposing OS-specific methods, or a cross platform one? On one hand it would be nice to have a cross-platform method. On the other, that would require slightly complicating the procedure for Unix users, since they had no handler they had to drop in the first place, due to the different mechanism in Unix.

@ipetkov
Copy link
Member

ipetkov commented Dec 19, 2024

But are you proposing OS-specific methods, or a cross platform one?

OS (Windows) specific methods. As in we should mark the current versions of tokio::signal::windows::* as deprecated and introduce new ones that are designed for giving the caller control on when the process goes down.

I would avoid trying to design a cross-platform API since the Windows and Unix designs are just too different. If we ever come up with an elegant cross-platform solution we can always introduce it later, but trying to fit it all up front might not lead to success

@Finomnis
Copy link
Contributor

Finomnis commented Dec 20, 2024

which block the handler until the event object received through the channel is dropped.

I think I could work with this. (maintainer of tokio-graceful-shutdown)
It will require a rewrite of the OS signal code in my crate, but it feels like it should work.

@Finomnis
Copy link
Contributor

Finomnis commented Dec 21, 2024

@Darksonn After reading through this again, I think that a wait wouldn't be bad, actually. I would even go so far as to say that a wait(forever) would be a good idea; assuming that the thread gets automatically terminated at the end of main(). Then, the signal would never be resolved, and the actual response of the program would be to shut down. That would align with the behavior in Linux.

Again, if the thread gets reliably terminated after the main function exited. Not sure if there's still issues where exiting main does not do that properly; I remember some situations where that was a problem.

@Darksonn
Copy link
Contributor

Seems like a plausible solution for the existing function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-signal Module: tokio/signal
Projects
None yet
Development

No branches or pull requests

4 participants