SSH and Ctrl-C leave attic serve instances alive and prevent further upload #545
Yes, I've seen your issue. I am not completely sure about it (it might also depend a bit on the ssh/sshd configuration). I just tried borg create over ssh to a localhost repo and quickly interrupted it 5 times, and it did not hang for me. So maybe try it; if it works for you, we could check off one more attic issue from our list.
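For anyone trying the same quick test, a minimal sketch (the repo path and data directory here are made-up examples):

```
# create a test repo on localhost, reached via ssh
borg init localhost:/tmp/borg-testrepo

# start a backup over ssh, interrupt it with Ctrl-C mid-transfer,
# and repeat a few times
borg create localhost:/tmp/borg-testrepo::test-1 ~/some-data

# then check whether any 'borg serve' processes were left behind
ssh localhost pgrep -af 'borg serve'
```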
Thank you for the reply, I'll try again with borg tomorrow and see if anything changes.
Well, there are pipes, and if the connection breaks down, they break. That should be dealt with within remote.py.
I have been running an 8-day ssh backup over a slow upload. I interrupted it multiple times with Ctrl-C, the network dying, or similar issues. A few times the lock was not removed; borg break-lock fixed it. No other issues were encountered.
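For reference, that manual cleanup is just the following (the repo URL is a placeholder; only run this when you are sure no other borg process is still using the repo):

```
# removes a stale repository lock left behind by a dead connection
borg break-lock user@backuphost:/path/to/repo
```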
Tested with borg 0.29.0. After a couple of Ctrl-Cs I get stuck with

…

on the server with

…

The command used for uploading the content is

…
@ThomasWaldmann regarding the pipes breaking down: I'm not quite sure they break if the connection drops. I work with sockets, and I learned the hard way that you need an explicit ping/pong protocol.
@filcuc that is a leftover lock; there is another ticket that deals with that. borg break-lock can be used to remove it manually (when you are sure that there should be no lock). Btw, ssh also has such a "ping" feature that could be used.
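That ssh "ping" feature is the standard OpenSSH keepalive mechanism; a client-side sketch in ~/.ssh/config (host alias and values are just examples):

```
# ~/.ssh/config on the backup client
Host backuphost
    # send an encrypted keepalive probe every 10s; after 3 unanswered
    # probes (~30s) the client closes the dead connection
    ServerAliveInterval 10
    ServerAliveCountMax 3
```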
Maybe related: jborg/attic#130, jborg/attic#323
My offsite backup is behind a firewall. The remote backup host had a script that auto-created a reverse ssh session, which I was pushing the backups through. Just recently I realized that SSH relies on TCPKeepAlive to kill stale SSH tunnels… but the reverse SSH session kept running if my backup client had network issues or borg was killed some other way. Just one day ago I added:

…

It could very well be unrelated to the original issue, but then again, I thought I'd share my findings.
I solved it the same way, putting:

…
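The exact lines from the two comments above aren't preserved here; most likely they were the server-side counterparts of the client keepalive options, set in sshd_config (the values below are illustrative, not the ones actually used):

```
# /etc/ssh/sshd_config on the repo server
# probe an unresponsive client every 15s; after 4 failed probes (~60s)
# the session is killed, so the borg serve child exits and its lock
# gets released
ClientAliveInterval 15
ClientAliveCountMax 4
```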
Maybe we could close this one...
If it's not in the docs, it should probably go into the FAQ?
@dragetd do you still think these 2 lines were helpful? If so, can you remove them again to see if the problem comes back? If we are sure this is helpful, we need to think about specific values and then add it to the FAQ. There is already a somewhat related ssh question in there.
Any news?
I'd like to close this issue soon. So if someone thinks something should be done about it (add stuff to the docs, fix code), please speak up (and consider your recent experience with up-to-date borg versions).
Please excuse my lack of response, I lost this one out of sight. Indeed, my hanging instance was related only to the SSH timeouts. With longer timeouts, the process would stay up longer. With a long enough lock-wait and short timeouts, my backup runs fine. From my side this can be closed as not borg-related.
Yes, the FAQ or some section about remote repos sounds good. I just want to be careful: is there some scenario where such settings could be counter-productive?
@dragetd yes, a PR against the 1.0-maint branch would be welcome. How is this related to the lock wait time?
When the lock wait time > the SSH timeout, a broken connection is closed (BrokenPipeError in borg serve -> lock cleanup) before another borg times out waiting for the lock.
I still don't understand. If client C1 is doing a backup to the repo server, a borg serve process S1 will be created dealing with the connection and holding a lock L1. The connection might run for an undefined time (e.g. 1h) before it runs into a connection issue. With the above settings, the broken-down ssh connection will be terminated server-side after 1 minute and the lock L1 will get released. A client C2 could have been trying to create another backup since shortly after C1 started. It will have a related borg serve process S2 that has been trying to get a lock L2 for almost 1h. Then S1 will terminate and release the lock L1 after the connection breakdown is handled, and S2 will get the lock L2 if it waits long enough (longer than this 1h). So the lock wait time seems to be related to the undefined backup-runtime-until-a-problem-happens rather than to the ssh parameters. If you expect multi-client serialized backups (like all waiting in a queue), the lock wait time needs to be >> the expected backup duration of all other clients.
Oh, I had a different (fully concurrent) scenario in mind. It helps for this scenario: if you have multiple borg create ... ; borg create ... invocations run in a serialized way in a single script, you need to give them --lock-wait N (with N a bit more than the time the server needs to terminate broken-down connections and release the lock).
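A sketch of such a serialized script (the repo URL, archive names, and the 120s value are assumptions; pick N a bit larger than the server's dead-connection timeout):

```
#!/bin/sh
# two serialized backups to the same repo; if the first dies with a
# broken connection, the second waits up to 120s for the server to
# clean up the stale lock instead of failing immediately
borg create --lock-wait 120 user@backuphost:/path/to/repo::etc-{now} /etc
borg create --lock-wait 120 user@backuphost:/path/to/repo::home-{now} /home
```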
Hi, I've just filed this bug report for attic. I didn't test it with borg. Do you know if this is still the case?