Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Outbox latency improvements #60

Open
wants to merge 2 commits into
base: outbox
Choose a base branch
from
Open

Conversation

Bruno-DaSilva
Copy link
Contributor

@Bruno-DaSilva Bruno-DaSilva commented May 2, 2018

This MR covers two main performance improvements:

  1. Assume that if there was at least one record in the messages table last time, that there will be more this time. So, don't sleep and immediately try and get those message(s) from the db. If there were no messages in the table, sleep per usual until the next polling time.
    This will mainly help when the table gets backed up with many messages, so we aren't waiting ~2 seconds to get the next set of data from the messages table.

  2. To get around the delay associated with polling, I see only a few options.

  • decrease the polling time until it is within an 'acceptable' range of delay. This can get extremely expensive if you're polling the db many times per second.
  • somehow notify the MM Producer that there are more records and to check the messages table.
    • Obviously, if we do this through http, then we will synchronously couple each producer to the service, and could easily overwhelm the single producer given enough requests. It would also probably still be too delayed
    • If we do this through rabbit, while it would be fast it would also still have the same problem of dropped messages. It would be very probably that < 100% of records are within our 'acceptable delay'.
    • If we do this through Postgres (as is implemented here), we get 100% guarantee of delivery, though at the cost of performance. There will be increased load on the db using Postgres' NOTIFY/LISTEN. However, assuming we only have one listener, it shouldn't affect performance too much. There aren't many benchmarks online on the performance impact of NOTIFY/LISTEN in Postgres, so we may want to benchmark it ourselves to see performance impact.

As is below, I implemented NOTIFY/LISTEN such that:

  • Every multiple_man_publish will now send a NOTIFY after saving to the multiple_man_messages table.
  • Listener thread will listen for those notifications. When the listener thread receives one, it will wake the producer from sleep if it's not running. If it's already running, it will do nothing.
  • Producer thread will behave as usual, polling the db every 2 seconds unless its sleep is interrupted.

Bruno Da Silva added 2 commits May 2, 2018 15:40
@Bruno-DaSilva
Copy link
Contributor Author

There is one concern I had in the way this is implemented, specifically around interrupting the Thread from sleeping. Is it safe to assume any other part of the code executed by the MM Producer will not have any sleep anywhere? I'm afraid we might inadvertently interrupt a sleep that is not the 2 second sleep. Say, if there was a sleep in bunny, for example.

@Bruno-DaSilva
Copy link
Contributor Author

Bruno-DaSilva commented May 2, 2018

re: postgres NOTIFY/LISTEN benchmarks, I just ran some quick/dirty ones on a local postgres instance with rails console -> postgres. Since it was over localhost, network delay was close to 0ms.
From a single thread, I was able to get:

  • ~11000 notifies per second to a channel with no listeners. This is probably IO bound? I didn't see any single component hit 100% CPU.
  • ~9000 notifies per second to a channel with 1 listener, listener having no logic. Again, probably IO bound?
  • ~6000 notifies per second to a channel with 1 listener (outbox producer) AND the producer SELECT'ing from the db each time. This was also on a 4 core machine that is definitely CPU bound at 100% CPU on all cores, since there's a lot of stuff happening in the rails app. This would be almost definitely higher if I had more cores.

Given that I don't think any of our services come even close to that 6000 notifies per second, I'm of the opinion that the performance is perfectly acceptable. That said, since NOTIFY works as more of a multicast than it does rabbitmq (for every incoming message, it sends a copy to ALL listeners, instead of round robin'ing the messages), this is not scalable once you hit that limit. There would be no way to horizontally scale this solution.

Given that it would have to be an incredible load to produce that many messages, is that even a worry?

@@ -14,6 +14,11 @@ def self.push_record(record, operation, options)
set_name: MultipleMan::RoutingKey.model_name(routing_key)
).save!
end

after_save do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case anyone else has the same thought, I just looked up this:

Firstly, if a NOTIFY is executed inside a transaction, the notify events are not delivered until and unless the transaction is committed. This is appropriate, since if the transaction is aborted, all the commands within it have had no effect, including NOTIFY.

@@ -11,6 +11,12 @@ def run_producer
require_relative '../outbox/db'
require_relative '../outbox/message/sequel'

@producer_thread = Thread.new { producer_loop }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a threading expert, but what happens if this producer_loop throws an exception? I'm just thinking about db connections, notifications etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants