This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Workers setup with nginx

airblag edited this page Mar 27, 2020 · 8 revisions

WIP

The official documentation for setting up workers is not easy to follow:

This is how I (tried to) change my setup to use workers. WARNING: SHOULD BE REVIEWED! WIP! This actually broke my setup in a lot of strange ways; look at the Issues section below first.

I assume you already have a working Synapse configuration; I'm not reproducing whole config files here.

Background

  • My setup has around 400 users, mostly around 300 concurrent connections during the daytime, 4500 local rooms, and some big federated rooms too.
  • The server runs in a VMware VM with 16 CPUs and 32 GB RAM (half of it for PostgreSQL).
  • The DB is 14 GB.
  • nginx is used as a reverse proxy.
  • The Synapse homeserver process hammers away at 100-120% CPU all day long, but never uses more of the CPUs.
  • My nginx graphs show an average of 140 requests/s during working hours.
  • I'm using the Debian packages from matrix.org and starting Synapse with systemd.

Which workers are meaningful?

analysing old logs

First, I wanted to check which endpoints are requested the most in my installation. I grepped the endpoints of every worker as described in https://github.com/matrix-org/synapse/blob/master/docs/workers.md in 24 hours of my nginx access log:

synapse.app.synchrotron

grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)' |wc -l

synapse.app.federation_reader

grep -E '(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'

synapse.app.media_repository

grep -E '(/_matrix/media/|/_synapse/admin/v1/purge_media_cache|/_synapse/admin/v1/room/.*/media.*|/_synapse/admin/v1/user/.*/media.*|/_synapse/admin/v1/media/.*|/_synapse/admin/v1/quarantine_media/.*)'

synapse.app.client_reader

grep -E '(/_matrix/client/(api/v1|r0|unstable)/publicRooms|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/joined_members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/context/.*|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state|/_matrix/client/(api/v1|r0|unstable)/login|/_matrix/client/(api/v1|r0|unstable)/account/3pid|/_matrix/client/(api/v1|r0|unstable)/keys/query|/_matrix/client/(api/v1|r0|unstable)/keys/changes|/_matrix/client/versions|/_matrix/client/(api/v1|r0|unstable)/voip/turnServer|/_matrix/client/(api/v1|r0|unstable)/joined_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups/|/_matrix/client/(api/v1|r0|unstable)/pushrules/.*|/_matrix/client/(api/v1|r0|unstable)/groups/.*|/_matrix/client/(r0|unstable)/register|/_matrix/client/(r0|unstable)/auth/.*/fallback/web)'

Note: I didn't include /_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages in the client_reader pattern: without /messages I counted 175576 requests, with /messages I counted 9998816 (not sure why).

synapse.app.user_dir

grep -E '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'

synapse.app.frontend_proxy

grep -E '/_matrix/client/(api/v1|r0|unstable)/keys/upload'

synapse.app.event_creator

grep -E '(/_matrix/client/(api/v1|r0|unstable)/rooms/.*/send|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state/|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)|/_matrix/client/(api/v1|r0|unstable)/join/|/_matrix/client/(api/v1|r0|unstable)/profile/)'
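The greps above all read from stdin; to run them over a log file in one pass, something like the following sketch works (the log path, the `count` helper, and the abbreviated regexes are my placeholders; substitute the full endpoint patterns from workers.md):

```shell
#!/bin/bash
# Rough per-worker request counts from an nginx access log.
# LOG path and the (abbreviated) regexes are placeholders; use the
# full endpoint lists from workers.md for real numbers.
LOG="${1:-/var/log/nginx/access.log}"

count() {
    # count NAME REGEX: lines in $LOG matching this worker's endpoints
    printf '%-18s %s\n' "$1" "$(grep -Ec "$2" "$LOG")"
}

if [ -r "$LOG" ]; then
    count synchrotron '/_matrix/client/(v2_alpha|r0)/sync'
    count federation_reader '/_matrix/federation/v1/'
    count user_dir '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'
    count total '.'
fi
```
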

results

| worker’s endpoints | request/day | percent |
|---|---:|---:|
| synchrotron | 9017921 | 90.19% |
| federation_reader | 321413 | 3.21% |
| media_repository | 115749 | 1.16% |
| client_reader | 175576 | 1.76% |
| user_dir | 1341 | 0.01% |
| frontend_proxy | 6936 | 0.07% |
| event_creator | 26876 | 0.27% |
| total | 9665812 | 96.67% |
| total requests | 9998816 | 100.00% |
| others | 333004 | 3.33% |

So the synchrotron makes the most sense for me (and since I think my setup is fairly standard, I guess it's almost always like this).

Setting up synchrotron worker(s)

WARNING: I repeatedly broke parts of my setup while trying to do this on a live server.

homeserver.yaml

Just add this to the existing listeners section of the config:

listeners:
  # The TCP replication port
  - port: 9092
    bind_address: '127.0.0.1'
    type: replication
  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
     - names: [replication]

Also add this to homeserver.yaml:

worker_app: synapse.app.homeserver
daemonize: false

Restart your Synapse to check it's still working:

# systemctl restart matrix-synapse

workers configuration

Note: if you work as root, take care to give ownership of the config files to the matrix-synapse user after creating them.

I used the systemd instructions from here https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers. But I changed it to be able to start multiple synchrotron workers.

mkdir /etc/matrix-synapse/workers

/etc/matrix-synapse/workers/synchrotron-1.yaml

worker_app: synapse.app.synchrotron

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8083
   resources:
     - names:
       - client

worker_daemonize: False
worker_pid_file: /var/run/synchrotron1.pid
worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml
send_federation: False

If you want to run multiple synchrotrons, create further configs like this:

sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/' /etc/matrix-synapse/workers/synchrotron-1.yaml > /etc/matrix-synapse/workers/synchrotron-2.yaml
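Copying configs by hand gets tedious; a small helper can derive any number of synchrotron configs from the first one. A sketch (the `gen_worker` function is mine, not part of Synapse; paths follow the layout above):

```shell
#!/bin/bash
# Derive a config for synchrotron N from synchrotron 1's config:
# renames the pid/log files and bumps the listener port
# (synchrotron 1 listens on 8083, so N listens on 8082+N).
gen_worker() {  # gen_worker SRC DEST N
    sed -e "s/synchrotron1/synchrotron$3/g" \
        -e "s/8083/$((8082 + $3))/" \
        "$1" > "$2"
}

# Example usage:
# for i in 2 3; do
#     gen_worker /etc/matrix-synapse/workers/synchrotron-1.yaml \
#                "/etc/matrix-synapse/workers/synchrotron-$i.yaml" "$i"
# done
```
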

Don't forget to create a log config file as well for each worker.

/etc/matrix-synapse/synchrotron1-log.yaml

This config should produce the logfile /var/log/matrix-synapse/synchrotron1.log. It could probably be trimmed down...

version: 1

formatters:
  precise:
   format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'                                                                                        

filters:
  context:
    (): synapse.util.logcontext.LoggingContextFilter
    request: ""

handlers:
  file:
    class: logging.handlers.RotatingFileHandler
    formatter: precise
    filename: /var/log/matrix-synapse/synchrotron1.log
    maxBytes: 104857600
    backupCount: 10
    filters: [context]
    encoding: utf8
    level: DEBUG
  console:
    class: logging.StreamHandler
    formatter: precise
    level: WARN

loggers:
    synapse:
        level: WARN

    synapse.storage.SQL:
        level: INFO

    synapse.app.synchrotron:
        level: DEBUG

root:
    level: WARN
    handlers: [file, console]

Starting the worker

I tried to start the worker with synctl, but I had to change the config to include /etc/matrix-synapse/conf.d/* because synctl wasn't reading those files. Since I use systemd to start Synapse in production, it's better to set the workers up to start with systemd directly.

systemd

Followed this : https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers

And created an extra templated systemd service, matrix-synapse-worker-synchrotron@.service, to be able to run multiple synchrotrons:

[Unit]
Description=Synapse Matrix Worker
After=matrix-synapse.service
BindsTo=matrix-synapse.service

[Service]
Type=notify
NotifyAccess=main
User=matrix-synapse
WorkingDirectory=/var/lib/matrix-synapse
EnvironmentFile=/etc/default/matrix-synapse
ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.synchrotron --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/synchrotron-%i.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=matrix-synapse-synchrotron-%i

[Install]
WantedBy=matrix-synapse.service
  • Reload the systemd config: systemctl daemon-reload
  • Start synchrotron 1: systemctl start matrix-synapse-worker-synchrotron@1.service
  • Check the logs: journalctl -xe -f -u matrix-synapse-worker-synchrotron@1.service

If this worked, you should now have an extra python process for synchrotron1, but it doesn't handle any traffic yet.

Nginx config

Some extras

Add this to your default_server, somewhere inside server { }:

        location /nginx_status {
                stub_status on;
                access_log   off;
                allow 127.0.0.1;
                allow ::1;
                deny all;
        }

You can then get an idea of the request load with:

$ curl http://127.0.0.1/nginx_status
Active connections: 270 
server accepts handled requests
 172758 172758 3500311 
Reading: 0 Writing: 126 Waiting: 144 
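The third number on the counters line is the cumulative request total, so sampling it twice gives a requests/s figure. A sketch, assuming the /nginx_status location configured above (the `req_count` helper is mine):

```shell
#!/bin/bash
# Extract the cumulative request counter from stub_status output:
# it is the 3rd number on the line after "server accepts handled requests".
req_count() {
    awk '/server accepts handled requests/ { getline; print $3 }'
}

# Sample twice, 10 s apart, and print the average requests/s:
# a=$(curl -s http://127.0.0.1/nginx_status | req_count)
# sleep 10
# b=$(curl -s http://127.0.0.1/nginx_status | req_count)
# echo $(( (b - a) / 10 ))
```
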

upstream synchrotrons

First, I set up an upstream pool for the synchrotrons (note the ports configured in the workers). This way I could theoretically scale out when there is too much load. I also added a log format to be able to trace in nginx which worker handles which request (stolen from somewhere I don't remember):

Place this in your nginx config (I put it in my vhost config outside of server {})

log_format backend '$remote_addr - $remote_user - [$time_local] $upstream_addr: $request $status URT:$upstream_response_time request_time $request_time';

upstream synchrotron {
#               ip_hash; # this might help in some cases, not in mine
#               server 127.0.0.1:8008; # main synapse process, to roll back when it goes wrong (reacted strangely)
               server 127.0.0.1:8083; # synchrotron1
#               server 127.0.0.1:8084; # synchrotron2
#               server 127.0.0.1:8085; # synchrotron3
}

Then, you can change the default log format of your vhost :

server {
#[...]
       access_log /var/log/nginx/matrix-access.log backend;
#[...]
}
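Once the backend log format is active, you can check how requests are spread across the synchrotrons. A sketch that tallies lines per $upstream_addr (the `per_upstream` helper is mine; it assumes the exact log_format above, where the upstream address, with a trailing colon, is the 7th whitespace-separated field):

```shell
#!/bin/bash
# Tally requests per upstream worker from the "backend" access log.
# With the log_format above, $upstream_addr (plus a trailing ':')
# is the 7th whitespace-separated field of each line.
per_upstream() {
    awk '{ sub(/:$/, "", $7); n[$7]++ }
         END { for (u in n) print u, n[u] }'
}

# Example usage:
# per_upstream < /var/log/nginx/matrix-access.log
```
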

reverse proxy the endpoints

In my server {} section I set multiple locations (to avoid a very big regexp):

       location ~ ^/_matrix/client/(v2_alpha|r0)/sync$ {
               # no URI part on proxy_pass, so nginx forwards the original
               # request including the query string (appending $uri would
               # drop the ?since=...&timeout=... parameters and break sync)
               proxy_pass http://synchrotron;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }
       location ~ ^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$ {
               proxy_pass http://synchrotron;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }
       location ~ ^/_matrix/client/(api/v1|r0)/initialSync$ {
               proxy_pass http://synchrotron;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }
       location ~ ^/_matrix/client/(api/v1|v2_alpha|r0)/events$ {
               proxy_pass http://synchrotron;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }

Reload the nginx config, and your synchrotron worker should start to get traffic.

federation_reader

workers/federation_reader.yaml

synapse.app.federation_reader listens on port 8011:

worker_app: synapse.app.federation_reader

worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
    - type: http
      port: 8011
      resources:
          - names: [federation]

worker_pid_file: "/var/run/app.federation_reader.pid"
worker_daemonize: False
worker_log_config: /etc/matrix-synapse/federation-reader-log.yaml
send_federation: False

Here I separated out the ^/_matrix/federation/v1/send/ endpoint, since the documentation says it must be handled by a single instance.

        location ~ ^/_matrix/federation/v1/send/ {
                # no URI part on proxy_pass, so the query string is kept
                proxy_pass http://127.0.0.1:8011;
                proxy_set_header X-Forwarded-For $remote_addr;
                proxy_set_header Host $host;
        }
# and a big regex for the rest
        location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
                proxy_pass http://127.0.0.1:8011;
                proxy_set_header X-Forwarded-For $remote_addr;
                proxy_set_header Host $host;
        }

other workers

I also tried media_repository and event_creator, but they were not working as expected. For reference, the configs:

event_creator.yaml

worker_app: synapse.app.event_creator

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8102
   resources:
     - names:
       - client

worker_daemonize: False
worker_pid_file: /var/run/event_creator.pid
worker_log_config: /etc/matrix-synapse/event_creator-log.yaml
send_federation: False

media_repository.yaml

worker_app: synapse.app.media_repository

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8101
   resources:
     - names:
       - media

worker_daemonize: False
worker_pid_file: /var/run/media_repository.pid
worker_log_config: /etc/matrix-synapse/media_repository-log.yaml
send_federation: False

in nginx

# events_creator
       location ~ ^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/) {
               # no URI part on proxy_pass, so the query string is kept
               proxy_pass http://127.0.0.1:8102;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }

# media_repository : XXX Breaks thumbnails
       location ~ (^/_matrix/media/|^/_synapse/admin/v1/purge_media_cache$|^/_synapse/admin/v1/room/.*/media.*$|^/_synapse/admin/v1/user/.*/media.*$|^/_synapse/admin/v1/media/.*$|^/_synapse/admin/v1/quarantine_media/.*$) {
               proxy_pass http://127.0.0.1:8101;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }

Issues

These are the issues I've met so far (they might also be related to some big federated rooms):

  • CPU usage was getting high on all synchrotron workers, more than with a single synapse process
  • a lot of clients were disconnecting all the time
  • some old notifications were popping up on desktop and mobile all the time
  • media_repository was breaking thumbnails
  • send_federation: False is needed in all workers configs except federation_sender (see #7130)