Workers setup with nginx
WIP
The existing documentation for setting up workers is not easy to follow:
- https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
- https://github.com/matrix-org/synapse/blob/master/docs/workers.md
This is how I (tried to) change my setup to use workers. WARNING : SHOULD BE REVIEWED ! WIP ! This actually broke my setup in a lot of strange ways; look at the issues at the bottom first.
I assume you already have a working synapse configuration; I'm not putting whole config files here.
- My setup has around 400 users, mostly around 300 concurrent connections in the daytime, 4500 local rooms, and some big federated rooms too.
- The server runs in a VMware VM with 16 CPUs and 32GB RAM (half of it for PostgreSQL).
- The DB is 14GB big.
- nginx is used as a reverse proxy.
- The Synapse homeserver process hammers away at 100-120% CPU all day long, but never uses more of the CPUs (the main process is largely single-threaded).
- My nginx graph gives an average of 140 requests/s in working hours.
- I'm using the debian packages from matrix.org and starting matrix with systemd.
First, I wanted to check which endpoints are requested the most in my installation. I grepped my nginx access log over 24 hours for the endpoints of every worker, as described in https://github.com/matrix-org/synapse/blob/master/docs/workers.md:
# synchrotron
grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)' | wc -l
# federation_reader
grep -E '(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'
# media_repository
grep -E '(/_matrix/media/|/_synapse/admin/v1/purge_media_cache|/_synapse/admin/v1/room/.*/media.*|/_synapse/admin/v1/user/.*/media.*|/_synapse/admin/v1/media/.*|/_synapse/admin/v1/quarantine_media/.*)'
# client_reader
grep -E '(/_matrix/client/(api/v1|r0|unstable)/publicRooms|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/joined_members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/context/.*|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state|/_matrix/client/(api/v1|r0|unstable)/login|/_matrix/client/(api/v1|r0|unstable)/account/3pid|/_matrix/client/(api/v1|r0|unstable)/keys/query|/_matrix/client/(api/v1|r0|unstable)/keys/changes|/_matrix/client/versions|/_matrix/client/(api/v1|r0|unstable)/voip/turnServer|/_matrix/client/(api/v1|r0|unstable)/joined_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups/|/_matrix/client/(api/v1|r0|unstable)/pushrules/.*|/_matrix/client/(api/v1|r0|unstable)/groups/.*|/_matrix/client/(r0|unstable)/register|/_matrix/client/(r0|unstable)/auth/.*/fallback/web)'
Note : I didn't include /_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages in the client_reader regex: without /messages it matches 175576 requests, with /messages it matches 9998816 (not sure why).
# user_dir
grep -E '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'
# frontend_proxy
grep -E '/_matrix/client/(api/v1|r0|unstable)/keys/upload'
# event_creator
grep -E '(/_matrix/client/(api/v1|r0|unstable)/rooms/.*/send|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state/|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)|/_matrix/client/(api/v1|r0|unstable)/join/|/_matrix/client/(api/v1|r0|unstable)/profile/)'
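For reference, this is how each count can be run end to end against the access logs (a sketch; the log path is my assumption, and zgrep also reads the rotated .gz files):

# synchrotron requests over the whole log history; swap in the other regexes for the other workers
zgrep -hE '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)' /var/log/nginx/matrix-access.log* | wc -l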
| worker's endpoints | requests/day | percent |
|---|---|---|
| synchrotron | 9017921 | 90.19% |
| federation_reader | 321413 | 3.21% |
| media_repository | 115749 | 1.16% |
| client_reader | 175576 | 1.76% |
| user_dir | 1341 | 0.01% |
| frontend_proxy | 6936 | 0.07% |
| event_creator | 26876 | 0.27% |
| total | 9665812 | 96.67% |
| total requests | 9998816 | 100.00% |
| others | 333004 | 3.33% |
So the synchrotron makes the most sense for me (and since I think my setup is standard, I guess it's almost always like this).
WARNING : I broke parts of my setup a lot while trying to do this on a live server.
Just add this to the existing listeners section of your homeserver.yaml:
listeners:
  # The TCP replication port
  - port: 9092
    bind_address: '127.0.0.1'
    type: replication
  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
      - names: [replication]
Also add this to homeserver.yaml:
worker_app: synapse.app.homeserver
daemonize: false
Restart your synapse to check it's still working:
# systemctl restart matrix-synapse
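If the restart went fine, the main process should now also be listening on the two replication ports; a quick sanity check (ss comes with iproute2):

# the replication listeners configured above should show up here
ss -tlnp | grep -E ':(9092|9093)'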
Note : if you work as root, take care to chown the config files to the matrix-synapse user after creating them.
I used the systemd instructions from https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers, but I changed them to be able to start multiple synchrotron workers.
mkdir /etc/matrix-synapse/workers
Then create /etc/matrix-synapse/workers/synchrotron1.yaml:
worker_app: synapse.app.synchrotron
# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8083
    resources:
      - names:
          - client
worker_daemonize: False
worker_pid_file: /var/run/synchrotron1.pid
worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml
send_federation: False
If you want to run multiple synchrotrons, create the other configs like this:
sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/g' /etc/matrix-synapse/workers/synchrotron1.yaml > /etc/matrix-synapse/workers/synchrotron2.yaml
Don't forget to create a log config file as well for each worker (a sed sketch for that follows the example below).
With the config below, the worker writes its logfile to /var/log/matrix-synapse/synchrotron1.log. The log levels could probably be reduced...
version: 1
formatters:
  precise:
    format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'
filters:
  context:
    (): synapse.util.logcontext.LoggingContextFilter
    request: ""
handlers:
  file:
    class: logging.handlers.RotatingFileHandler
    formatter: precise
    filename: /var/log/matrix-synapse/synchrotron1.log
    maxBytes: 104857600
    backupCount: 10
    filters: [context]
    encoding: utf8
    level: DEBUG
  console:
    class: logging.StreamHandler
    formatter: precise
    level: WARN
loggers:
  synapse:
    level: WARN
  synapse.storage.SQL:
    level: INFO
  synapse.app.synchrotron:
    level: DEBUG
root:
  level: WARN
  handlers: [file, console]
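To create the log config for a second synchrotron, the same sed trick works (assuming the file names used above):

sed 's/synchrotron1/synchrotron2/g' /etc/matrix-synapse/synchrotron1-log.yaml > /etc/matrix-synapse/synchrotron2-log.yaml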
I tried to start the worker with synctl, but I had to change the config to include /etc/matrix-synapse/conf.d/* because it wasn't reading those files. Since I use systemd to start synapse in production, it's better to set up the workers to start with systemd directly.
I followed this: https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers and created an extra systemd template unit to be able to have multiple synchrotrons.
/etc/systemd/system/matrix-synapse-worker-synchrotron@.service
[Unit]
Description=Synapse Matrix Worker
After=matrix-synapse.service
BindsTo=matrix-synapse.service
[Service]
Type=notify
NotifyAccess=main
User=matrix-synapse
WorkingDirectory=/var/lib/matrix-synapse
EnvironmentFile=/etc/default/matrix-synapse
ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.synchrotron --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/synchrotron%i.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=matrix-synapse-synchrotron-%i
[Install]
WantedBy=matrix-synapse.service
- Reload the systemd config:
systemctl daemon-reload
- Start synchrotron1:
systemctl start matrix-synapse-worker-synchrotron@1.service
- Check the logs:
journalctl -xe -f -u matrix-synapse-worker-synchrotron@1.service
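To have the worker start together with the main synapse process (that's what the WantedBy=matrix-synapse.service line above is for, while BindsTo= stops it when synapse stops), enable it:

systemctl enable matrix-synapse-worker-synchrotron@1.service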
If this worked, you should now have an extra python process for synchrotron1. But it doesn't handle any traffic yet.
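You can already talk to the worker directly, before touching nginx. A quick sanity check (the access token is a placeholder; without a valid one you should still get an M_MISSING_TOKEN error back, which proves the worker answers):

curl -s 'http://127.0.0.1:8083/_matrix/client/r0/sync?timeout=0' -H 'Authorization: Bearer <your_access_token>'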
Add this to your default_server, somewhere in the server { } block:
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
allow ::1;
deny all;
}
You can then get an idea of the request load with:
$ curl http://127.0.0.1/nginx_status
Active connections: 270
server accepts handled requests
172758 172758 3500311
Reading: 0 Writing: 126 Waiting: 144
First, I set up an upstream pool for the synchrotrons (look at the ports configured in the workers). This way, I could theoretically scale out when there is too much load. I also added a log format to be able to trace in nginx which worker is handling which request (stolen from somewhere I don't remember):
Place this in your nginx config (I put it in my vhost config, outside of server {}):
log_format backend '$remote_addr - $remote_user - [$time_local] $upstream_addr: $request $status URT:$upstream_response_time request_time $request_time';
upstream synchrotron {
# ip_hash; # this might help in some cases, not in mine
# server 127.0.0.1:8008; # main synapse process, to roll back when it goes wrong (reacted strangely)
server 127.0.0.1:8083; # synchrotron1
# server 127.0.0.1:8084; # synchrotron2
# server 127.0.0.1:8085; # synchrotron3
}
Then, you can change the default log format of your vhost :
server {
#[...]
access_log /var/log/nginx/matrix-access.log backend;
#[...]
}
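Once the synchrotron locations below are in place, the backend log format lets you watch which upstream serves each request:

# $upstream_addr in the log line shows which synchrotron handled the request
tail -f /var/log/nginx/matrix-access.log | grep '127.0.0.1:8083'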
In my server {} section I set up multiple locations (to avoid one very big regexp):
location ~ ^/_matrix/client/(v2_alpha|r0)/sync$ {
proxy_pass http://synchrotron$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
location ~ ^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$ {
proxy_pass http://synchrotron$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
location ~ ^/_matrix/client/(api/v1|r0)/initialSync$ {
proxy_pass http://synchrotron$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
location ~ ^/_matrix/client/(api/v1|v2_alpha|r0)/events$ {
proxy_pass http://synchrotron$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
Reload the nginx config, and your synchrotron worker should start to get traffic.
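A safe way to do that reload:

# test the config first, then reload without dropping connections
nginx -t && systemctl reload nginx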
synapse.app.federation_reader listens on port 8011. Its config, e.g. /etc/matrix-synapse/workers/federation_reader.yaml:
worker_app: synapse.app.federation_reader
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8011
    resources:
      - names: [federation]
worker_pid_file: "/var/run/app.federation_reader.pid"
worker_daemonize: False
worker_log_config: /etc/matrix-synapse/federation-reader-log.yaml
send_federation: False
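I have no template unit for this one; for a first test you can start it by hand with the same venv python and --config-path pattern as in the synchrotron unit above (adapt the last path to where you saved the config):

/opt/venvs/matrix-synapse/bin/python -m synapse.app.federation_reader \
  --config-path=/etc/matrix-synapse/homeserver.yaml \
  --config-path=/etc/matrix-synapse/conf.d/ \
  --config-path=/etc/matrix-synapse/workers/federation_reader.yaml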
Here I separated the ^/_matrix/federation/v1/send/ endpoint, since it's documented that it cannot be handled by more than one worker:
location ~ ^/_matrix/federation/v1/send/ {
proxy_pass http://127.0.0.1:8011$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
# and a big regex for the rest
location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
proxy_pass http://127.0.0.1:8011$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
I also tried media_repository and event_creator, but they were not working as expected. For reference, the configs:
worker_app: synapse.app.event_creator
# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8102
    resources:
      - names:
          - client
worker_daemonize: False
worker_pid_file: /var/run/event_creator.pid
worker_log_config: /etc/matrix-synapse/event_creator-log.yaml
send_federation: False
worker_app: synapse.app.media_repository
# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8101
    resources:
      - names:
          - media
worker_daemonize: False
worker_pid_file: /var/run/media_repository.pid
worker_log_config: /etc/matrix-synapse/media_repository-log.yaml
send_federation: False
# event_creator
location ~ ^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/) {
proxy_pass http://127.0.0.1:8102$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
# media_repository : XXX Breaks thumbnails
location ~ (^/_matrix/media/|^/_synapse/admin/v1/purge_media_cache$|^/_synapse/admin/v1/room/.*/media.*$|^/_synapse/admin/v1/user/.*/media.*$|^/_synapse/admin/v1/media/.*$|^/_synapse/admin/v1/quarantine_media/.*$) {
proxy_pass http://127.0.0.1:8101$uri;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
}
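To see whether thumbnails survive, you can query the media worker directly; <server_name> and <media_id> are placeholders for a real media URI on your server:

# a healthy media worker should answer 200 here
curl -s -o /dev/null -w '%{http_code}\n' 'http://127.0.0.1:8101/_matrix/media/r0/thumbnail/<server_name>/<media_id>?width=96&height=96'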
These are the issues I've met so far (they might also be related to some big federated rooms):
- CPU usage got high on all synchrotron workers, more than with a single synapse process
- a lot of clients were disconnecting all the time
- some old notifications were popping up on desktop and mobile all the time
- media_repository was breaking thumbnails
- send_federation: False is needed in all worker configs except the federation_sender (see #7130)