Outbox latency improvements #60

Bruno-DaSilva · 2018-05-02T22:59:06Z

This MR covers two main performance improvements:

Assume that if there was at least one record in the messages table last time, that there will be more this time. So, don't sleep and immediately try and get those message(s) from the db. If there were no messages in the table, sleep per usual until the next polling time.
This will mainly help when the table gets backed up with many messages, so we aren't waiting ~2 seconds to get the next set of data from the messages table.
To get around the delay associated with polling, I see only a few options.

decrease the polling time until it is within an 'acceptable' range of delay. This can get extremely expensive if you're polling the db many times per second.
somehow notify the MM Producer that there are more records and to check the messages table.
- Obviously, if we do this through http, then we will synchronously couple each producer to the service, and could easily overwhelm the single producer given enough requests. It would also probably still be too delayed
- If we do this through rabbit, while it would be fast it would also still have the same problem of dropped messages. It would be very probably that < 100% of records are within our 'acceptable delay'.
- If we do this through Postgres (as is implemented here), we get 100% guarantee of delivery, though at the cost of performance. There will be increased load on the db using Postgres' NOTIFY/LISTEN. However, assuming we only have one listener, it shouldn't affect performance too much. There aren't many benchmarks online on the performance impact of NOTIFY/LISTEN in Postgres, so we may want to benchmark it ourselves to see performance impact.

As is below, I implemented NOTIFY/LISTEN such that:

Every multiple_man_publish will now send a NOTIFY after saving to the multiple_man_messages table.
Listener thread will listen for those notifications. When the listener thread receives one, it will wake the producer from sleep if it's not running. If it's already running, it will do nothing.
Producer thread will behave as usual, polling the db every 2 seconds unless its sleep is interrupted.

…time

… record is added to the messages table, the producer is notified

Bruno-DaSilva · 2018-05-02T23:02:07Z

There is one concern I had in the way this is implemented, specifically around interrupting the Thread from sleeping. Is it safe to assume any other part of the code executed by the MM Producer will not have any sleep anywhere? I'm afraid we might inadvertently interrupt a sleep that is not the 2 second sleep. Say, if there was a sleep in bunny, for example.

Bruno-DaSilva · 2018-05-02T23:53:06Z

re: postgres NOTIFY/LISTEN benchmarks, I just ran some quick/dirty ones on a local postgres instance with rails console -> postgres. Since it was over localhost, network delay was close to 0ms.
From a single thread, I was able to get:

~11000 notifies per second to a channel with no listeners. This is probably IO bound? I didn't see any single component hit 100% CPU.
~9000 notifies per second to a channel with 1 listener, listener having no logic. Again, probably IO bound?
~6000 notifies per second to a channel with 1 listener (outbox producer) AND the producer SELECT'ing from the db each time. This was also on a 4 core machine that is definitely CPU bound at 100% CPU on all cores, since there's a lot of stuff happening in the rails app. This would be almost definitely higher if I had more cores.

Given that I don't think any of our services come even close to that 6000 notifies per second, I'm of the opinion that the performance is perfectly acceptable. That said, since NOTIFY works as more of a multicast than it does rabbitmq (for every incoming message, it sends a copy to ALL listeners, instead of round robin'ing the messages), this is not scalable once you hit that limit. There would be no way to horizontally scale this solution.

Given that it would have to be an incredible load to produce that many messages, is that even a worry?

bradrobertson · 2018-05-03T13:22:22Z

lib/multiple_man/outbox/message/rails.rb

@@ -14,6 +14,11 @@ def self.push_record(record, operation, options)
            set_name:    MultipleMan::RoutingKey.model_name(routing_key)
          ).save!
        end
+
+        after_save do


in case anyone else has the same thought, I just looked up this:

Firstly, if a NOTIFY is executed inside a transaction, the notify events are not delivered until and unless the transaction is committed. This is appropriate, since if the transaction is aborted, all the commands within it have had no effect, including NOTIFY.

bradrobertson · 2018-05-03T13:37:13Z

lib/multiple_man/producers/general.rb

@@ -11,6 +11,12 @@ def run_producer
        require_relative '../outbox/db'
        require_relative '../outbox/message/sequel'

+        @producer_thread = Thread.new { producer_loop }


I'm not a threading expert, but what happens if this producer_loop throws an exception? I'm just thinking about db connections, notifications etc

Bruno Da Silva added 2 commits May 2, 2018 15:40

mm producer: only sleep if there were 0 records in the database last …

2af29eb

…time

Add pub/sub notifications to the outbox implementation so, when a new…

3f455e5

… record is added to the messages table, the producer is notified

bradrobertson reviewed May 3, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Outbox latency improvements #60

Outbox latency improvements #60

Bruno-DaSilva commented May 2, 2018 •

edited

Loading

Bruno-DaSilva commented May 2, 2018

Bruno-DaSilva commented May 2, 2018 •

edited

Loading

bradrobertson May 3, 2018

bradrobertson May 3, 2018

Outbox latency improvements #60

Are you sure you want to change the base?

Outbox latency improvements #60

Conversation

Bruno-DaSilva commented May 2, 2018 • edited Loading

Bruno-DaSilva commented May 2, 2018

Bruno-DaSilva commented May 2, 2018 • edited Loading

bradrobertson May 3, 2018

Choose a reason for hiding this comment

bradrobertson May 3, 2018

Choose a reason for hiding this comment

Bruno-DaSilva commented May 2, 2018 •

edited

Loading

Bruno-DaSilva commented May 2, 2018 •

edited

Loading