NBD server losing connection to PostgreSQL #154
Comments
Thanks for the report! What error message(s) or warnings are you getting with your patch installed? Adding |
Session where it fails and then recovers even with pre-ping enabled:
|
I saw some commits with relevant changes pushed to master, can this be considered resolved? And is there a release/tag I can try? |
Turns out, the problem was my TCP proxy (ingress-nginx) killing the tcp-proxied PSQL connections after 600s (default stream-timeout value), it works perfectly when the ingress is taken out of the equation or when the timeout is increased. However, when such timeouts happen, it seems the |
Describe the bug
When using benji nbd with a configuration pointing to a PostgreSQL pool, the NBD server stops accepting new client connections after some period of inactivity. The following errors can be found in the process output:
After that, every new connection request causes the server to log something like this:
ERROR: [<IP>:44590] NBD_CMD_READ: Can't reconnect until invalid transaction is rolled back. (Background on this error at: https://sqlalche.me/e/14/8s2b)
Traceback (most recent call last):
This behaviour persists until the NBD server is restarted.
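For context on the "Can't reconnect until invalid transaction is rolled back" message: SQLAlchemy raises it when a session's connection was invalidated mid-transaction and the session has not been rolled back yet. The following is only a minimal sketch of the usual recovery pattern, not Benji's actual code. The helper name run_query is hypothetical, and an in-memory SQLite URL stands in for PostgreSQL so the snippet runs anywhere.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import sessionmaker

# Stand-in database; a real setup would point at PostgreSQL instead.
engine = create_engine("sqlite:///:memory:", pool_pre_ping=True)
Session = sessionmaker(bind=engine)


def run_query(session, sql):
    """Hypothetical helper: run one statement, rolling back on any database error."""
    try:
        return session.execute(text(sql)).scalar()
    except SQLAlchemyError:
        # Roll back so the session can be used again after a failure.
        session.rollback()
        raise


session = Session()
failed = False
try:
    # Simulate a failing statement (here: a table that does not exist).
    run_query(session, "SELECT * FROM missing_table")
except SQLAlchemyError:
    failed = True

# Because of the rollback above, the same session accepts new work.
recovered = run_query(session, "SELECT 1")
session.close()
```

The point of the sketch is that the rollback happens at the place where the error is caught, so a long-running server does not keep a poisoned session around until restart.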
To Reproduce
Configure the following databaseEngine:
Try to connect to the NBD server using any client, then close the connection or keep it idle for 10-15 minutes (so that the server doesn't get any new requests), then try to connect again.
Expected behavior
NBD server works reliably without restarting it after every connection drop.
Platform and versions (please complete the following information):
bitnami/postgresql-ha@10.0.9 helm chart for the database
Additional context
Based on the suggestions here, I managed to make it self-recover in such cases by hotpatching this line in site-packages:
Still, this sometimes requires the client to initiate the connection 2+ times for the server to recover and actually accept it, but this more or less fixes the problem.
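As an illustration of the kind of engine settings discussed in this thread, here is a hedged sketch using SQLAlchemy's pool_pre_ping and pool_recycle options. It assumes the 600 s proxy idle timeout reported in the comments. Whether Benji exposes these options through its own configuration is not confirmed here, and make_engine is a hypothetical helper. An in-memory SQLite URL is used only so the snippet is runnable without a PostgreSQL server.

```python
from sqlalchemy import create_engine, text

# Assumption: the TCP proxy drops idle streams after 600 s, as reported in this thread.
PROXY_IDLE_TIMEOUT_S = 600


def make_engine(url):
    """Hypothetical helper: build an engine that avoids reusing dead pooled connections."""
    return create_engine(
        url,
        # Ping each pooled connection on checkout and replace it if the ping fails.
        pool_pre_ping=True,
        # Discard connections older than half the proxy timeout
        # (age is measured from when the connection was opened).
        pool_recycle=PROXY_IDLE_TIMEOUT_S // 2,
    )


# Stand-in URL; a real deployment would use a postgresql:// URL here.
engine = make_engine("sqlite:///:memory:")
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()
```

Recycling by connection age is more conservative than strictly necessary for an idle timeout, but it keeps every pooled connection younger than the proxy's cutoff.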
Please let me know if you need any more details or if I can help in any way.