Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

queue: infinite retries after reducing max_tries and bogus retry interval #678

Open
lukastribus opened this issue Feb 11, 2024 · 2 comments
Assignees
Labels
bug Something isn't working.

Comments

@lukastribus
Copy link

lukastribus commented Feb 11, 2024

Describe the bug

Likely to due a == comparison as opposed to >= , when a email in queue has a retry count higher than max_retries, it will be retried infinitely as the condition never matches:

if !temporary || meta.TriesCount[rcpt]+1 == q.maxTries {

A unrelated secondary issue is seen with the retry delay; which does not match the documented formula; in fact the first 4 retry delays are the same exact 14m59.99 (see logs below). I can file a separate issue if needed.

Steps to reproduce

This can happen in the following situation:

  • an email is send, while the configuration is "max_tries = 20"
  • email fails temporarily multiple times, email retry counter is now at 10
  • admin reduces max_tries to 8
  • email will not permfail anymore as the attemt counter is well above max_retries so a == comparison will not stop it anymore

Log files

root@srv1:/var/lib/maddy/remote_queue# journalctl -xeu maddy.service | grep "will retry" | grep ef97411f
Feb 09 16:52:40 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":1},"msg_id":"ef97411f","next_try_delay":"14m59.999996196s","rcpts":["support@example.net"]}
Feb 09 17:07:40 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":2},"msg_id":"ef97411f","next_try_delay":"14m59.99999822s","rcpts":["support@example.net"]}
Feb 09 17:22:40 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":3},"msg_id":"ef97411f","next_try_delay":"14m59.999997532s","rcpts":["support@example.net"]}
Feb 09 17:37:41 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":4},"msg_id":"ef97411f","next_try_delay":"14m59.999996182s","rcpts":["support@example.net"]}
Feb 09 17:52:41 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":5},"msg_id":"ef97411f","next_try_delay":"29m59.999995718s","rcpts":["support@example.net"]}
Feb 09 18:22:41 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":6},"msg_id":"ef97411f","next_try_delay":"44m59.999996344s","rcpts":["support@example.net"]}
Feb 09 19:07:41 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":7},"msg_id":"ef97411f","next_try_delay":"44m59.999996027s","rcpts":["support@example.net"]}
Feb 09 19:52:41 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":8},"msg_id":"ef97411f","next_try_delay":"59m59.999996174s","rcpts":["support@example.net"]}
Feb 09 20:52:42 srv1 maddy[2732258]: queue: will retry        {"attempts_count":{"support@example.net":9},"msg_id":"ef97411f","next_try_delay":"1h14m59.999997755s","rcpts":["support@example.net"]}
Feb 09 22:07:42 srv1 maddy[923]: queue: will retry        {"attempts_count":{"support@example.net":10},"msg_id":"ef97411f","next_try_delay":"1h44m59.999995644s","rcpts":["support@example.net"]}
Feb 09 23:52:42 srv1 maddy[923]: queue: will retry        {"attempts_count":{"support@example.net":11},"msg_id":"ef97411f","next_try_delay":"2h14m59.99999673s","rcpts":["support@example.net"]}
Feb 10 02:07:42 srv1 maddy[923]: queue: will retry        {"attempts_count":{"support@example.net":12},"msg_id":"ef97411f","next_try_delay":"2h44m59.99999787s","rcpts":["support@example.net"]}
Feb 10 04:52:42 srv1 maddy[923]: queue: will retry        {"attempts_count":{"support@example.net":13},"msg_id":"ef97411f","next_try_delay":"3h29m59.999997578s","rcpts":["support@example.net"]}
Feb 10 08:22:43 srv1 maddy[923]: queue: will retry        {"attempts_count":{"support@example.net":14},"msg_id":"ef97411f","next_try_delay":"4h29m59.999997642s","rcpts":["support@example.net"]}
Feb 10 12:52:43 srv1 maddy[41407]: queue: will retry        {"attempts_count":{"support@example.net":15},"msg_id":"ef97411f","next_try_delay":"5h29m59.999996433s","rcpts":["support@example.net"]}
Feb 10 18:22:43 srv1 maddy[41407]: queue: will retry        {"attempts_count":{"support@example.net":16},"msg_id":"ef97411f","next_try_delay":"6h59m59.999997628s","rcpts":["support@example.net"]}
Feb 11 01:22:43 srv1 maddy[41407]: queue: will retry        {"attempts_count":{"support@example.net":17},"msg_id":"ef97411f","next_try_delay":"8h44m59.999996512s","rcpts":["support@example.net"]}
Feb 11 10:07:43 srv1 maddy[41407]: queue: will retry        {"attempts_count":{"support@example.net":18},"msg_id":"ef97411f","next_try_delay":"10h59m59.999996882s","rcpts":["support@example.net"]}
root@srv1:/var/lib/maddy/remote_queue#

When maddy was restarted (with different configuration) can be seen by the changing PID.

Configuration file

target.queue remote_queue {
    max_tries 8
    target &outbound_delivery

Environment information

@lukastribus lukastribus added the bug Something isn't working. label Feb 11, 2024
foxcpp added a commit that referenced this issue Feb 12, 2024
@foxcpp
Copy link
Owner

foxcpp commented Feb 12, 2024

Pushed 2da4ece that changes the condition to use >=. Will look into what is wrong with delay calculation...

@foxcpp foxcpp self-assigned this Feb 12, 2024
@lukastribus
Copy link
Author

For the record, here's an output without any restarts in between and with the default of 20 retries. The To address ignorame2@messagenet.it is real (and unreachable, so you can reproduce; there is a TLS issue which makes it retry) :

lukas@srv1:~$ sudo journalctl -xeu maddy.service | grep -E "retry.+c5eb84cd|permanent.+c5eb84cd"
Feb 17 23:54:35 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":1},"msg_id":"c5eb84cd","next_try_delay":"14m59.99999903s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 00:09:36 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":2},"msg_id":"c5eb84cd","next_try_delay":"14m59.999997305s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 00:24:36 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":3},"msg_id":"c5eb84cd","next_try_delay":"14m59.999996852s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 00:39:36 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":4},"msg_id":"c5eb84cd","next_try_delay":"14m59.99999707s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 00:54:36 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":5},"msg_id":"c5eb84cd","next_try_delay":"29m59.999997265s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 01:24:37 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":6},"msg_id":"c5eb84cd","next_try_delay":"44m59.999996767s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 02:09:37 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":7},"msg_id":"c5eb84cd","next_try_delay":"44m59.999997176s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 02:54:37 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":8},"msg_id":"c5eb84cd","next_try_delay":"59m59.999996711s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 03:54:37 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":9},"msg_id":"c5eb84cd","next_try_delay":"1h14m59.999997516s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 05:09:38 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":10},"msg_id":"c5eb84cd","next_try_delay":"1h44m59.999997928s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 06:54:38 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":11},"msg_id":"c5eb84cd","next_try_delay":"2h14m59.999998559s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 09:09:38 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":12},"msg_id":"c5eb84cd","next_try_delay":"2h44m59.99995706s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 11:54:38 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":13},"msg_id":"c5eb84cd","next_try_delay":"3h29m59.999994638s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 15:24:38 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":14},"msg_id":"c5eb84cd","next_try_delay":"4h29m59.999997815s","rcpts":["ignorame2@messagenet.it"]}
Feb 18 19:54:39 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":15},"msg_id":"c5eb84cd","next_try_delay":"5h29m59.999998277s","rcpts":["ignorame2@messagenet.it"]}
Feb 19 01:24:39 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":16},"msg_id":"c5eb84cd","next_try_delay":"6h59m59.999996306s","rcpts":["ignorame2@messagenet.it"]}
Feb 19 08:24:39 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":17},"msg_id":"c5eb84cd","next_try_delay":"8h44m59.999997053s","rcpts":["ignorame2@messagenet.it"]}
Feb 19 17:09:39 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":18},"msg_id":"c5eb84cd","next_try_delay":"10h59m59.999998151s","rcpts":["ignorame2@messagenet.it"]}
Feb 20 04:09:40 srv1 maddy[501510]: queue: will retry        {"attempts_count":{"ignorame2@messagenet.it":19},"msg_id":"c5eb84cd","next_try_delay":"13h44m59.999995836s","rcpts":["ignorame2@messagenet.it"]}
Feb 20 17:54:40 srv1 maddy[501510]: queue: not delivered, permanent error        {"msg_id":"c5eb84cd","rcpt":"ignorame2@messagenet.it"}
lukas@srv1:~$

I did notice the the documentations talks about a factor of 1.2 while the code has 1.25, but clearly there is more to this than a simple disparity of the factor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working.
Projects
None yet
Development

No branches or pull requests

2 participants