We've encountered a chaining problem while looking into how publishing-api puts things onto RabbitMQ, which we've traced through: Publishing API -> Content Store -> Router API -> Router
The problem seems to be that router-api cannot find a server:
To replicate:
➜ router-api git:(main) govuk-docker-run bundle exec rails c
docker-compose -f [...] run router-api-lite bundle exec rails c
Creating govuk-docker_router-api-lite_run ... done
Loading development environment (Rails 6.0.3.7)
irb(main):001:0> Route.count
Traceback (most recent call last):
1: from (irb):1
Mongo::Error::NoServerAvailable (No primary server is available in cluster: #<Cluster topology=Unknown[mongo-2.6:27017] servers=[#<Server address=mongo-2.6:27017 GHOST>]> with timeout=30, LT=0.015)
The mongo container does run, and you can watch its logs, though it is stuck in a big loop of opening and closing connections, punctuated by the following fairly suspect message:
2021-12-03T16:06:57.809+0000 [rsStart] warning: getaddrinfo("48703775aaf0") failed: Name or service not known
2021-12-03T16:06:57.846+0000 [rsStart] getaddrinfo("48703775aaf0") failed: Name or service not known
2021-12-03T16:06:57.846+0000 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
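One quick way to confirm the symptom in the log above is to check whether the hostname in the `getaddrinfo` message resolves from inside the relevant container. The following is a minimal sketch using only Ruby's standard library; the helper name `resolvable?` is hypothetical, and the reading that `48703775aaf0` is a stale container hostname stored in the replica set config is an assumption, not something the logs confirm.

```ruby
require "socket"

# Hypothetical helper: returns true if the host name can be resolved,
# false if name resolution fails (the same failure mode as the
# getaddrinfo warning in the mongo log).
def resolvable?(host)
  Addrinfo.getaddrinfo(host, nil)
  true
rescue SocketError
  false
end
```

For example, `resolvable?("mongo-2.6")` and `resolvable?("48703775aaf0")` could be run from a Rails console in the router-api container to compare the service name with the name mongo is trying to look up.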
@kevindew spotted that if we comment out this line things start working again.
That seems to have been introduced during work to resolve differences in how rs.status responds between mongo v2.6 (which router runs in prod) and more modern versions.
Question to answer: what was L46 trying to resolve? Does it still serve that purpose? Can we replace it with something that doesn't block local dev, or remove it altogether?
L46 is necessary as we have been running MongoDB as a replica set since around April 2021, in order to enable the app to be replatformed. Previously, Router API knew about all running Router instances and would, upon a request to update a route, update said route and then call the /reload endpoint on each and every Router instance in order to ensure each instance's routes were up-to-date.
Replatforming changed this behaviour. Router API previously needed to know about individual Router instances, which were hardcoded and so not translatable into the Kubernetes world into which we're now moving. Instead, Router instances now poll MongoDB for new changes every few seconds. We enabled this through the use of a replica set and the db.stats() method: an instance determines whether it has an up-to-date copy of the current routes by comparing the current optime to its cached optime, and reloads if changes have occurred.
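The polling behaviour described above can be sketched roughly as follows. This is an illustration only, not Router's actual code: the class name `RoutePoller` and its parameters are hypothetical, and how the optime is fetched from MongoDB is deliberately left to the caller as a callable.

```ruby
# Hypothetical sketch of "reload only when the optime has changed".
# None of these names are taken from Router itself.
class RoutePoller
  attr_reader :reload_count

  # fetch_optime:  callable returning the replica set's current optime
  # reload_routes: callable that reloads routes from MongoDB
  def initialize(fetch_optime:, reload_routes:)
    @fetch_optime = fetch_optime
    @reload_routes = reload_routes
    @cached_optime = nil
    @reload_count = 0
  end

  # One polling tick: reload only if the optime differs from the cached one.
  # Returns true if a reload happened, false otherwise.
  def poll_once
    current = @fetch_optime.call
    return false if current == @cached_optime

    @reload_routes.call
    @cached_optime = current
    @reload_count += 1
    true
  end
end
```

In a real service `poll_once` would be called on a timer every few seconds. The sketch also shows why the replica set matters here: the optime comes from the replication oplog, so a standalone MongoDB would give the poller nothing to compare against.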
#499
This may have been an attempt to resolve this issue: alphagov/router#210