Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add back-off controller for sleep time of reconnection when connection lost is detected immediately after connecting. #589 #625

Merged
merged 1 commit into from
Jan 8, 2023

Conversation

tomatod
Copy link
Contributor

@tomatod tomatod commented Dec 26, 2022

MaxConnectRetryInterval is added to ClientOptions related to #589.

The sleep time is increased progressively between connection attempts up to MaxConnectRetryInterval.
If ConnectRetryInterval >= MaxConnectRetryInterval, ConnectRetryInterval is kept between the attempts.

@MattBrittan
Copy link
Contributor

The PR looks useful but does not appear to address the related issue. You PR allows provides more flexibility re the delay between unsuccessful connections but the issue related to where the connection is successful but is dropped soon after being made due to an invalid publish, another client with the same ID etc. Do you want to try and address the larger issue or just go with this smaller change?

@tomatod
Copy link
Contributor Author

tomatod commented Dec 26, 2022

MattBrittan, Thanks for your comment. I wasn't enough to check issue. I'll try to reproduce other situation and work on that again. If it's too difficult for me, I may push smaller modification.

@tomatod tomatod force-pushed the connect_retry_backoff branch from f458ea1 to 2749ad4 Compare December 27, 2022 11:39
@tomatod tomatod changed the title Add MaxConnectRetryInterval client option for reconnect exponential backoff related to #589 Add back-off controller for sleep time of reconnection when connection lost is detected immediately after connecting. #589 Dec 27, 2022
@tomatod
Copy link
Contributor Author

tomatod commented Dec 28, 2022

@MattBrittan
I fixed PR, Could you review it again?

Summary

Implementation point

backoffController type has mutex in view of thread safe. I thought reconnect method may be called from multi threads.

According to the implementation below, internalConnLost method is expected to be stopped (return), when it's called by multi thread. But both the errors errConnLossWhileDisconnecting and errAlreadyHandlingConnectionLoss seem to be not returned anywhere although they are defined. So it may be that "c.status.ConnectionLost" (L515) doesn't return any error and internalConnLost continue even if that is call by secondary thread.
https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L515-L522

Note that internalConnLost method may be called in the bellow.
https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L683
https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L693
https://github.com/tomatod/paho.mqtt.golang/blob/master/ping.go#L73

I was wondering where to embed a store for keeping statuses of continual connection error (in types client or connectionStatus ?), but there seems to be no appropriate place. So I made backoffCotroller as a new type and added it to client type.

… lost is detected immediately after connecting. eclipse-paho#589

Signed-off-by: Daichi Tomaru <banaoa7543@gmail.com>
@tomatod tomatod force-pushed the connect_retry_backoff branch from 2749ad4 to d174b9a Compare December 31, 2022 11:57
@MattBrittan MattBrittan merged commit e3fa503 into eclipse-paho:master Jan 8, 2023
@MattBrittan
Copy link
Contributor

Sorry about the delay in getting to this. Your PR looks OK to me (and all tests pass) so I'm going to accept it (it's a bit more complicated than I'd expected but I can see why you have made the decisions you did).

reconnect method may be called from multi threads.
It should not be (the status code should prevent this but it's from the most recent change so it's possible I've missed something). However a mutex is worthwhile , the code seems pretty generic so may be useful elsewhere so it's better to make it thread safe (and its performance is not really of critical importance)..

But both the errors errConnLossWhileDisconnecting and errAlreadyHandlingConnectionLoss seem to be not returned anywhere although they are defined.

You are correct - however this is not a big deal because the only real impact is that an extra error is logged (assuming logging is enabled). The check for these errors was really just to prevent logging of thinbgs that are expected to occur as part of normal operations. I believe the situations I was trying to stop logging for are here and here.

@tomatod
Copy link
Contributor Author

tomatod commented Jan 10, 2023

@MattBrittan
Thank you for your review and merge!!

@fsudaman
Copy link

Hi @MattBrittan any plan for next release soon that includes this PR? Thanks

@MattBrittan
Copy link
Contributor

@fsudaman - sorry things have been pretty stable so I had not considered a release. It's been eight months so may as well! - Done

@fsudaman
Copy link

@MattBrittan we had some issues with huge amount of retries causing a bit of congestion at scale so this really help. Thank you for the quick response and much appreciated.

algitbot pushed a commit to alpinelinux/build-server-status that referenced this pull request May 5, 2024
This MR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/eclipse/paho.mqtt.golang](https://github.com/eclipse/paho.mqtt.golang) | require | patch | `v1.4.1` -> `v1.4.3` |

---

### Release Notes

<details>
<summary>eclipse/paho.mqtt.golang (github.com/eclipse/paho.mqtt.golang)</summary>

### [`v1.4.3`](https://github.com/eclipse/paho.mqtt.golang/releases/tag/v1.4.3)

[Compare Source](eclipse-paho/paho.mqtt.golang@v1.4.2...v1.4.3)

Release 1.4.3 is a relatively small release to bring in changes made in the eight months since 1.4.2.

Thanks to everyone who submitted issues and contributed code (list of the main merged pull requests below):

#### What's Changed

-   Avoid Panic when keepalive is 1 by [@&#8203;tomatod](https://github.com/tomatod) in [#&#8203;622](eclipse-paho/paho.mqtt.golang#622)
-   Allow MQTT username/password in websocket URI [@&#8203;MattBrittan](https://github.com/MattBrittan) in [#&#8203;624](eclipse-paho/paho.mqtt.golang#624)
-   Add backoff when reconnecting following immediate connection loss [@&#8203;tomatod](https://github.com/tomatod) in [#&#8203;625](eclipse-paho/paho.mqtt.golang#625)
-   Update dependencies (github.com/gorilla/websocket@v1.5.0, golang.org/x/net, golang.org/x/sync) and specify `go 1.18` in `go.mod`.

**Full Changelog**: eclipse-paho/paho.mqtt.golang@v1.4.2...v1.4.3

### [`v1.4.2`](https://github.com/eclipse/paho.mqtt.golang/releases/tag/v1.4.2)

[Compare Source](eclipse-paho/paho.mqtt.golang@v1.4.1...v1.4.2)

Release 1.4.2 is relatively small and is mostly focused on tidying up the way the library manages the connection status. Previously `sync/
atomic` was used to read/update the status but this led to a range of potential deadlocks, and workarounds to avoid these, which made the code difficult to follow. The new [connectionStatus](https://github.com/eclipse/paho.mqtt.golang/blob/master/status.go#L113) separates status handling from `client` and should simplify further development whilst resolving potential race conditions. It is my hope that users will not notice any change ([@&#8203;master](https://github.com/master) was updated on 10th August and the updated code has been running in production at a few sites since then without issue).

A further change is that it is now possible to disable auto acknowledgment so that received messages can be manually acknowledged (or, more to the point, not acknowledged!).

Thanks to everyone who submitted issues and contributed code (list of the main merged pull requests below):

#### What's Changed

-   Tidy up use of mutex in `messageIds` by [@&#8203;MattBrittan](https://github.com/MattBrittan) in [#&#8203;602](eclipse-paho/paho.mqtt.golang#602)
-   Resolve situation where broker accepted connection but did not respond to CONNECT packet in a timely manner (should be very unusual but was reported in [#&#8203;597](eclipse-paho/paho.mqtt.golang#597)).  [@&#8203;MattBrittan](https://github.com/MattBrittan) in [#&#8203;603](eclipse-paho/paho.mqtt.golang#603)
-   Resolve race condition in test  by [@&#8203;MattBrittan](https://github.com/MattBrittan) in [#&#8203;606](eclipse-paho/paho.mqtt.golang#606)
-   Re-architect status handling by [@&#8203;MattBrittan](https://github.com/MattBrittan) in [#&#8203;607](eclipse-paho/paho.mqtt.golang#607)
-   Enable manual ACK by [@&#8203;shivamkm07](https://github.com/shivamkm07) in [#&#8203;578](eclipse-paho/paho.mqtt.golang#578)

#### New Contributors

-   [@&#8203;shivamkm07](https://github.com/shivamkm07) made their first contribution in [#&#8203;578](eclipse-paho/paho.mqtt.golang#578)

**Full Changelog**: eclipse-paho/paho.mqtt.golang@v1.4.1...v1.4.2

</details>

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

&nbsp;
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4yODYuMSIsInVwZGF0ZWRJblZlciI6IjM3LjI4Ni4xIiwidGFyZ2V0QnJhbmNoIjoibWFzdGVyIiwibGFiZWxzIjpbXX0=-->

See merge request alpine/infra/build-server-status!8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants