-
Notifications
You must be signed in to change notification settings - Fork 537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add back-off controller for sleep time of reconnection when connection lost is detected immediately after connecting. #589 #625
Conversation
The PR looks useful but does not appear to address the related issue. You PR allows provides more flexibility re the delay between unsuccessful connections but the issue related to where the connection is successful but is dropped soon after being made due to an invalid publish, another client with the same ID etc. Do you want to try and address the larger issue or just go with this smaller change? |
MattBrittan, Thanks for your comment. I wasn't enough to check issue. I'll try to reproduce other situation and work on that again. If it's too difficult for me, I may push smaller modification. |
f458ea1
to
2749ad4
Compare
@MattBrittan Summary
Implementation pointbackoffController type has mutex in view of thread safe. I thought reconnect method may be called from multi threads. According to the implementation below, internalConnLost method is expected to be stopped (return), when it's called by multi thread. But both the errors errConnLossWhileDisconnecting and errAlreadyHandlingConnectionLoss seem to be not returned anywhere although they are defined. So it may be that "c.status.ConnectionLost" (L515) doesn't return any error and internalConnLost continue even if that is call by secondary thread. Note that internalConnLost method may be called in the bellow. I was wondering where to embed a store for keeping statuses of continual connection error (in types client or connectionStatus ?), but there seems to be no appropriate place. So I made backoffCotroller as a new type and added it to client type. |
… lost is detected immediately after connecting. eclipse-paho#589 Signed-off-by: Daichi Tomaru <banaoa7543@gmail.com>
2749ad4
to
d174b9a
Compare
Sorry about the delay in getting to this. Your PR looks OK to me (and all tests pass) so I'm going to accept it (it's a bit more complicated than I'd expected but I can see why you have made the decisions you did).
You are correct - however this is not a big deal because the only real impact is that an extra error is logged (assuming logging is enabled). The check for these errors was really just to prevent logging of thinbgs that are expected to occur as part of normal operations. I believe the situations I was trying to stop logging for are here and here. |
@MattBrittan |
Hi @MattBrittan any plan for next release soon that includes this PR? Thanks |
@fsudaman - sorry things have been pretty stable so I had not considered a release. It's been eight months so may as well! - Done |
@MattBrittan we had some issues with huge amount of retries causing a bit of congestion at scale so this really help. Thank you for the quick response and much appreciated. |
This MR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [github.com/eclipse/paho.mqtt.golang](https://github.com/eclipse/paho.mqtt.golang) | require | patch | `v1.4.1` -> `v1.4.3` | --- ### Release Notes <details> <summary>eclipse/paho.mqtt.golang (github.com/eclipse/paho.mqtt.golang)</summary> ### [`v1.4.3`](https://github.com/eclipse/paho.mqtt.golang/releases/tag/v1.4.3) [Compare Source](eclipse-paho/paho.mqtt.golang@v1.4.2...v1.4.3) Release 1.4.3 is a relatively small release to bring in changes made in the eight months since 1.4.2. Thanks to everyone who submitted issues and contributed code (list of the main merged pull requests below): #### What's Changed - Avoid Panic when keepalive is 1 by [@​tomatod](https://github.com/tomatod) in [#​622](eclipse-paho/paho.mqtt.golang#622) - Allow MQTT username/password in websocket URI [@​MattBrittan](https://github.com/MattBrittan) in [#​624](eclipse-paho/paho.mqtt.golang#624) - Add backoff when reconnecting following immediate connection loss [@​tomatod](https://github.com/tomatod) in [#​625](eclipse-paho/paho.mqtt.golang#625) - Update dependencies (github.com/gorilla/websocket@v1.5.0, golang.org/x/net, golang.org/x/sync) and specify `go 1.18` in `go.mod`. **Full Changelog**: eclipse-paho/paho.mqtt.golang@v1.4.2...v1.4.3 ### [`v1.4.2`](https://github.com/eclipse/paho.mqtt.golang/releases/tag/v1.4.2) [Compare Source](eclipse-paho/paho.mqtt.golang@v1.4.1...v1.4.2) Release 1.4.2 is relatively small and is mostly focused on tidying up the way the library manages the connection status. Previously `sync/ atomic` was used to read/update the status but this led to a range of potential deadlocks, and workarounds to avoid these, which made the code difficult to follow. The new [connectionStatus](https://github.com/eclipse/paho.mqtt.golang/blob/master/status.go#L113) separates status handling from `client` and should simplify further development whilst resolving potential race conditions. It is my hope that users will not notice any change ([@​master](https://github.com/master) was updated on 10th August and the updated code has been running in production at a few sites since then without issue). A further change is that it is now possible to disable auto acknowledgment so that received messages can be manually acknowledged (or, more to the point, not acknowledged!). Thanks to everyone who submitted issues and contributed code (list of the main merged pull requests below): #### What's Changed - Tidy up use of mutex in `messageIds` by [@​MattBrittan](https://github.com/MattBrittan) in [#​602](eclipse-paho/paho.mqtt.golang#602) - Resolve situation where broker accepted connection but did not respond to CONNECT packet in a timely manner (should be very unusual but was reported in [#​597](eclipse-paho/paho.mqtt.golang#597)). [@​MattBrittan](https://github.com/MattBrittan) in [#​603](eclipse-paho/paho.mqtt.golang#603) - Resolve race condition in test by [@​MattBrittan](https://github.com/MattBrittan) in [#​606](eclipse-paho/paho.mqtt.golang#606) - Re-architect status handling by [@​MattBrittan](https://github.com/MattBrittan) in [#​607](eclipse-paho/paho.mqtt.golang#607) - Enable manual ACK by [@​shivamkm07](https://github.com/shivamkm07) in [#​578](eclipse-paho/paho.mqtt.golang#578) #### New Contributors - [@​shivamkm07](https://github.com/shivamkm07) made their first contribution in [#​578](eclipse-paho/paho.mqtt.golang#578) **Full Changelog**: eclipse-paho/paho.mqtt.golang@v1.4.1...v1.4.2 </details> --- - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box --- <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4yODYuMSIsInVwZGF0ZWRJblZlciI6IjM3LjI4Ni4xIiwidGFyZ2V0QnJhbmNoIjoibWFzdGVyIiwibGFiZWxzIjpbXX0=--> See merge request alpine/infra/build-server-status!8
MaxConnectRetryInterval is added to ClientOptions related to #589.
The sleep time is increased progressively between connection attempts up to MaxConnectRetryInterval.
If ConnectRetryInterval >= MaxConnectRetryInterval, ConnectRetryInterval is kept between the attempts.