Add Option for reconnect Back-Off (to prevent continual reconnection attempts when connection closed by remote host) #589

Open
chess555 opened this issue Mar 17, 2022 · 4 comments

Comments

@chess555

Due to the way connects/reconnects are handled, no delay is applied before reconnecting when a remote host closes a connection.

The simplest example is publishing to a bad topic with QoS > 0.
The client will store the bad message and get stuck in a cycle of connect, publish, connection closed, reconnect, ...

Similar behavior has been noted when:
  • The connection is dropped due to duplicate client IDs on a broker.
  • Performing an operation (subscribe/publish) on an AWS data broker when your device lacks the policy settings, or has had its policy permissions revoked.

@MattBrittan
Contributor

Unfortunately, I'm not sure there is a solution to this; as per the spec:

If a Server implementation does not authorize a PUBLISH to be performed by a Client; it has no way of informing that Client. It MUST either make a positive acknowledgement, according to the normal QoS rules, or close the Network Connection [MQTT-3.3.5-2].

So most 'well behaved' brokers will complete the handshake and throw the message away. While disconnecting is given as an option, the client has no way to determine why the broker dropped the connection and, as such, cannot remove the offending message from its store (the spec does not make any provision for 'rejected' messages).

The same applies in other situations (e.g. another client connects with the same client ID); the broker just drops the connection and we have no way to differentiate that from a network issue.

The above means that I don't think we really have any option other than to attempt to reconnect immediately and send through any queued messages. If you can suggest an alternative I'll definitely consider it (but most libraries use the same approach).

@chess555
Author

Sorry for not being clear, but the only issue I'm pointing at is the reconnect frequency when a close event occurs (the examples were just meant as situations where this occurs).

When a remote host closes a connection in this manner, I'm seeing upwards of 50-100 connect/disconnect events per second, and am concerned about this behavior (particularly on constrained networks).

In my case, I was able to use the Reconnect handler to add a delay after receiving an io.EOF event.

@MattBrittan MattBrittan changed the title Reconnect "Thrashing" when connection closed by remote host Add Option for Back-Off when connection lost (to prevent continual reconnection attempts when connection closed by remote host) Mar 17, 2022
@MattBrittan
Contributor

No worries - I have amended the title so that it more clearly covers what I believe you are requesting.

The initial reconnection attempt needs to be immediate (because the issue may be a momentary network glitch), but I agree that continually attempting to reconnect is counterproductive. Some form of back-off algorithm (reset after the connection has been up for more than a user-specified time) would be beneficial.

@MattBrittan MattBrittan changed the title Add Option for Back-Off when connection lost (to prevent continual reconnection attempts when connection closed by remote host) Add Option for reconnect Back-Off when connection lost (to prevent continual reconnection attempts when connection closed by remote host) Mar 17, 2022
@MattBrittan MattBrittan changed the title Add Option for reconnect Back-Off when connection lost (to prevent continual reconnection attempts when connection closed by remote host) Add Option for reconnect Back-Off (to prevent continual reconnection attempts when connection closed by remote host) Mar 17, 2022
tomatod added a commit to tomatod/paho.mqtt.golang that referenced this issue Dec 26, 2022
…ackoff related to eclipse-paho#589

Signed-off-by: Daichi Tomaru <banaoa7543@gmail.com>
tomatod added a commit to tomatod/paho.mqtt.golang that referenced this issue Dec 27, 2022
…n lost is detected immediately after connecting. eclipse-paho#589

Signed-off-by: Daichi Tomaru <banaoa7543@gmail.com>
@tomatod
Contributor

tomatod commented Dec 28, 2022

Summary of this issue (I think)

There seem to be at least 3 points that should have an appropriate back-off sleep.

No  Situation                                                  Back-off implemented  Cause
1   Unsuccessful initial connection                            No                    Mere connection failure
2   Unsuccessful reconnection after connection lost            Yes                   Mere connection failure
3   Connection lost immediately after successful reconnection  No                    Unexpectedly disconnected immediately after connection

In this GitHub issue, the following triggers have been reported for each point:

  • Trigger of No.1:
    • Connecting to an AWS broker without appropriate authority.
  • Trigger of No.3:
    • Duplicate client IDs
    • Invalid publish

Cause of No.3

  1. When the incoming loop receives a connection-lost error, the internalConnLost method is called.
    https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L673-L695

  2. internalConnLost then calls the reconnect method.
    https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L560

  3. Reconnecting to the broker doesn't take long. Although the network connection comes back, some brokers disconnect again because of MQTT-level issues (duplicate client IDs, invalid publish, ...), and the cycle returns to step 1. No back-off sleep occurs at any point in this loop.
    https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L314-L317

tomatod added a commit to tomatod/paho.mqtt.golang that referenced this issue Dec 31, 2022
… lost is detected immediately after connecting. eclipse-paho#589

Signed-off-by: Daichi Tomaru <banaoa7543@gmail.com>
MattBrittan added a commit that referenced this issue Jan 8, 2023
…reconnect loops

Add back-off controller for sleep time of reconnection  when connection lost is detected immediately after connecting. #589
This issue could be caused by an invalid publish request (which leads to the broker dropping the connection immediately).

3 participants