Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random occasional SubscriptionWithCancelException$NoMoreElementsNeeded after migrating from Akka to Pekko #422

Open
Tvaroh opened this issue Jan 16, 2024 · 15 comments
Milestone

Comments

@Tvaroh
Copy link

Tvaroh commented Jan 16, 2024

Hey, I notice that when I migrate to Pekko (form latest "free" Akka), I start getting occasional org.apache.pekko.stream.SubscriptionWithCancelException$NoMoreElementsNeeded$ (No error message supplied) from web clients.

I've increased stream-cancellation-delay to 1s but still it fails sometimes. It does really look like something changed from Akka to Pekko.

I don't really have any code to reproduce as it's not consistent, maybe once in couple of days. But reverting back to Akka does fix it.

@He-Pin
Copy link
Member

He-Pin commented Jan 16, 2024

Do you mean it works after you revert back to akka 2.6.20?

@Tvaroh
Copy link
Author

Tvaroh commented Jan 16, 2024

@He-Pin yep.

@He-Pin
Copy link
Member

He-Pin commented Jan 16, 2024

Could you please share a simple reproducer, thanks @Tvaroh

@Tvaroh
Copy link
Author

Tvaroh commented Jan 17, 2024

As I said, it's inconsistent and happens sporadically. So I don't have a reproducer, but in the end it calls Http().singleRequest(request) which fails.

@jrudolph
Copy link
Contributor

I don't know about any change which could have caused this issue. Can you still validate that the old version based on Akka is still fine (to rule out a coincidence where something else in the wider environment might have changed in the meantime)?

Your best bet is trying to collect debug log message from org.apache.pekko.http.impl.engine.client.pool.NewHostConnectionPool to isolate the issue.

Since the issue indeed looks like it could be caused by the infamous stream cancellation race condition, you could try setting stream-cancellation-delay to much higher values to rule out any race-conditions that might still occur in a 1 second time frame caused by GC or other interruptions.

What kind of requests run into that problem? Usually, idempotent requests are retried a few times, which would make it even more unlikely to happen. So, this might also be an issue of unfortunate timing, where an non-idempotent request (e.g. POST) is run at exactly the same time as the server decides to close the connection (however, that scenario is reported in an unclear way caused by stream cancellation). In that case, playing around with idle-timeout and host-connection-pool.keep-alive-timeout (see also its documentation in the reference.conf) might help mitigate the issue.

@Tvaroh
Copy link
Author

Tvaroh commented Jan 22, 2024

@jrudolph yeah it only happens for requests to a particular internal service written in (God forgive) C#, so this can be some weird corner case, but it def doesn't happen on Akka. I'll try your suggestions and will keep an eye on it. For now I've added some retries, and yeah - I've seen it fail on GET and POST at least. Thanks for the reply, I'll apply some config changes and will post back.

@djjorjinho
Copy link

@Tvaroh have you found the root cause or a permanent fix? I have a service going through the same issue after a Pekko migration (Scala 2.12, pekko-cluster-sharding, pekko-http-circe).

@Tvaroh
Copy link
Author

Tvaroh commented Aug 13, 2024

@djjorjinho it's mysterious but after adding retries those exceptions eventually went away by themselves, perhaps something changed externally, so I'm afraid I have no answer to that.

@djjorjinho
Copy link

@Tvaroh no worries, thanks for the reply.

Can you share if you made any changes to any config inside pekko.http.host-connection-pool or pekko.http.client, like stream-cancellation-delay, idle-timeout, keep-alive-timeout, max-connection-lifetime, etc.?

@Tvaroh
Copy link
Author

Tvaroh commented Aug 13, 2024

@djjorjinho good point, sure, here what we ended up with:

pekko {
  loggers = ["org.apache.pekko.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"

  http {
    client {
      stream-cancellation-delay = 2s

      parsing {
        max-content-length = 128m
      }
    }

    server {
      stream-cancellation-delay = 2s
      request-timeout = 60s
    }

    parsing {
      max-uri-length = 8k
      max-header-value-length = 8k
      max-to-strict-bytes = 64m
    }

    routing {
      decode-max-size = 16m
    }

    host-connection-pool {
      max-open-requests = 64
      response-entity-subscription-timeout = 10.seconds
    }
  }
}

@JD557
Copy link

JD557 commented Sep 4, 2024

Just to add an extra note - this was a problem in Akka as well: akka/akka-http#3201

And, looking at the issue, I think it was never fixed. The only PR referencing that issue was by @mdedetrich, which was not merged (due to conflicts with maintaining Pekko) and, by the title, sounds more like it was a workaround.

@mdedetrich
Copy link
Contributor

So I can still finish off the PR in Pekko as it appears that I basically didn't finish off the PR (changing the value caused some tests to fail and I needed to do adjustments to fix those tests)

@mdedetrich
Copy link
Contributor

mdedetrich commented Sep 4, 2024

PR made at #590, still need to fix the tests still they don't expect such a long stream-cancellation-delay

@mdedetrich
Copy link
Contributor

mdedetrich commented Sep 6, 2024

With Pekko being forked off of Akka 2.6.x it seems that we can now solve this properly i.e.

IIRC, the whole reason to introduce the delay was to deal with the fact that stream cancellation could not be well-configured in Akka 2.5. With 2.6, it's not 100% clear if the delay is even still needed or whether it could be solved in a better way (i.e. by resolving the cancellation race more intelligently, taking the cancellation reason into account at the right/more places) to avoid symptoms as shown in the ticket.

Originally posted by @jrudolph in #590 (comment)

@pjfanning Is it worth it to try and solve this properly before the 1.1.0 release?

@pjfanning pjfanning added this to the 1.1.1 milestone Sep 21, 2024
@pjfanning
Copy link
Contributor

#590 has been reverted (#597). We can revisit this for v1.1.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants