Skip to content

Commit

Permalink
Validate cached attributes with == comparison (#745)
Browse files Browse the repository at this point in the history
According to [this diss
post](https://lobste.rs/s/ily9nh/you_should_use_ruby_on_rails_logger_block#c_l3pkzi),
Ruby’s implementation of
[SipHash](https://en.wikipedia.org/wiki/SipHash) for Hash hashes is not
adequately collision resistant to be used as a cache key.

> The problem is that this hash code is only meant for hash tables,
hence it uses [SipHash](https://en.wikipedia.org/wiki/SipHash), so you
are not meant to use it as sole key as it’s susceptible to collisions.
When two hash codes match, you are supposed to additionally compare the
original values that produced the hash codes to handle hash collisions.
This code doesn’t do it. It assumes two objects with the same hash code
are identical.

This PR updates the implementation of `Phlex::FIFO` to store a copy of
the original hash and add an `==` comparison on read. The performance
cost is ~20% on our benchmark.

**Addressing some of the other issues raised in the diss post:**

> Then this cache is a synchronized FIFO with a fixed 4MiB size and no
way to resize it.

The FIFO cache has always taken a max_bytesize argument. This wasn’t
exposed to users as a configuration option yet because:

1. we’ve never had any configuration for Phlex so we'd need to figure
out the best way to do that
2. there's a good chance we might be able to automatically configure it,
or at least have it so that it caches 99% of your static attributes but
doesn't run away continuously growing if you’ve used dynamic attributes
3. the FIFO cache is a 2.0 feature and hasn’t been in any released
versions of Phlex

> FIFO is bad here because this cache is here to not have to generate
the HTML for static calls, e.g. h1 class: "foo", but since the size is
fixed, and you are constantly querying it with dynamic data, you are
evicting the actually useful keys. So it should be a LRU or similar.

Actually, FIFO isn’t bad here. LRU is _terrible_ for read-heavy caches
because each read becomes a _very expensive_ write. I guess you might be
able to get reasonably good LRU read performance by using a b-tree, but
that's significantly more complicated and will never be as fast as an
O(1) FIFO read.

Phlex’ cache is read-heavy so read performance is key. LIFO (or
essentially just putting a limit on writing to new values) would also
work pretty well, but FIFO gives you the best chance of relevant keys
being in the cache, even when some of those values are dynamic.

> And finally, while it doesn’t really matter on MRI because of the GVL,
on Ruby implementations with free threading, you are constantly
contending on a global mutex whenever you need to generate an HTML tag.
Again it works, and for most users it will never be a problem, but can’t
be qualified as “high quality”.

Actually, the FIFO cache only uses the mutex to serialise writes so this
makes no sense.
  • Loading branch information
joeldrapper authored Aug 12, 2024
2 parents 37700b2 + 6ac15fd commit a5b35f5
Show file tree
Hide file tree
Showing 4 changed files with 13 additions and 27 deletions.
2 changes: 1 addition & 1 deletion bench.rb
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,6 @@
puts sample.bytesize

Benchmark.ips do |x|
x.time = 20
x.time = 5
x.report("Page") { Example::Page.new.call }
end
2 changes: 1 addition & 1 deletion lib/phlex.rb
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ module Phlex
autoload :Unbuffered, "phlex/unbuffered"

Escape = ERB::Escape
ATTRIBUTE_CACHE = FIFO.new(4_000_000) # 4MB
ATTRIBUTE_CACHE = FIFO.new
SUPPORTS_FIBER_STORAGE = RUBY_ENGINE == "ruby"
end

Expand Down
6 changes: 3 additions & 3 deletions lib/phlex/elements.rb
Original file line number Diff line number Diff line change
Expand Up @@ -56,11 +56,11 @@ def #{method_name}(**attributes, &block)
if attributes.length > 0 # with attributes
if block # with content block
buffer << "<#{tag}" << (Phlex::ATTRIBUTE_CACHE[attributes.hash] ||= __attributes__(attributes)) << ">"
buffer << "<#{tag}" << (Phlex::ATTRIBUTE_CACHE[attributes] ||= __attributes__(attributes)) << ">"
yield_content(&block)
buffer << "</#{tag}>"
else # without content block
buffer << "<#{tag}" << (Phlex::ATTRIBUTE_CACHE[attributes.hash] ||= __attributes__(attributes)) << "></#{tag}>"
buffer << "<#{tag}" << (Phlex::ATTRIBUTE_CACHE[attributes] ||= __attributes__(attributes)) << "></#{tag}>"
end
else # without attributes
if block # with content block
Expand Down Expand Up @@ -113,7 +113,7 @@ def #{method_name}(**attributes)
end
if attributes.length > 0 # with attributes
buffer << "<#{tag}" << (Phlex::ATTRIBUTE_CACHE[attributes.hash] ||= __attributes__(attributes)) << ">"
buffer << "<#{tag}" << (Phlex::ATTRIBUTE_CACHE[attributes] ||= __attributes__(attributes)) << ">"
else # without attributes
buffer << "<#{tag}>"
end
Expand Down
30 changes: 8 additions & 22 deletions lib/phlex/fifo.rb
Original file line number Diff line number Diff line change
@@ -1,35 +1,21 @@
# frozen_string_literal: true

class Phlex::FIFO
def initialize(max_bytesize)
def initialize(max_size = 1_000_000)
@hash = {}
@mutex = Mutex.new

@bytesize = 0
@max_bytesize = max_bytesize
@max_size = max_size
end

attr_reader :bytesize, :max_bytesize
attr_reader :size, :max_size

def [](key)
@hash[key]
k, v = @hash[key.hash]
v if k == key
end

def []=(key, value)
@mutex.synchronize do
old_value = @hash.delete(key)
@hash[key] = value

if old_value
@bytesize += value.bytesize - old_value.bytesize
else
@bytesize += value.bytesize
end

while @bytesize > @max_bytesize
key, value = @hash.shift
@bytesize -= value.bytesize
end
end
hash = key.hash
@hash[hash] = [key, value]
@hash.shift while @hash.length > @max_size
end
end

0 comments on commit a5b35f5

Please sign in to comment.