Validate cached attributes with `==` comparison (#745)
According to [this diss post](https://lobste.rs/s/ily9nh/you_should_use_ruby_on_rails_logger_block#c_l3pkzi), Ruby’s implementation of [SipHash](https://en.wikipedia.org/wiki/SipHash) for Hash hashes is not adequately collision resistant to be used as a cache key.

> The problem is that this hash code is only meant for hash tables, hence it uses [SipHash](https://en.wikipedia.org/wiki/SipHash), so you are not meant to use it as sole key as it’s susceptible to collisions. When two hash codes match, you are supposed to additionally compare the original values that produced the hash codes to handle hash collisions. This code doesn’t do it. It assumes two objects with the same hash code are identical.

This PR updates the implementation of `Phlex::FIFO` to store a copy of the original hash and add an `==` comparison on read. The performance cost is ~20% on our benchmark.

**Addressing some of the other issues raised in the diss post:**

> Then this cache is a synchronized FIFO with a fixed 4MiB size and no way to resize it.

The FIFO cache has always taken a `max_bytesize` argument. This wasn’t exposed to users as a configuration option yet because:

1. we’ve never had any configuration for Phlex so we'd need to figure out the best way to do that
2. there's a good chance we might be able to automatically configure it, or at least have it so that it caches 99% of your static attributes but doesn't run away continuously growing if you’ve used dynamic attributes
3. the FIFO cache is a 2.0 feature and hasn’t been in any released versions of Phlex

> FIFO is bad here because this cache is here to not have to generate the HTML for static calls, e.g. `h1 class: "foo"`, but since the size is fixed, and you are constantly querying it with dynamic data, you are evicting the actually useful keys. So it should be a LRU or similar.

Actually, FIFO isn’t bad here. LRU is _terrible_ for read-heavy caches because each read becomes a _very expensive_ write. I guess you might be able to get reasonably good LRU read performance by using a b-tree, but that's significantly more complicated and will never be as fast as an O(1) FIFO read. Phlex’ cache is read-heavy so read performance is key. LIFO (or essentially just putting a limit on writing to new values) would also work pretty well, but FIFO gives you the best chance of relevant keys being in the cache, even when some of those values are dynamic.

> And finally, while it doesn’t really matter on MRI because of the GVL, on Ruby implementations with free threading, you are constantly contending on a global mutex whenever you need to generate an HTML tag. Again it works, and for most users it will never be a problem, but can’t be qualified as “high quality”.

Actually, the FIFO cache only uses the mutex to serialise writes so this makes no sense.
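
For readers who want to see the shape of the change, here is a minimal sketch of a FIFO cache that keeps a copy of the original key next to each value and validates hits with `==`. It is not the actual `Phlex::FIFO` code: the class name `ValidatingFIFO`, the `CollidingKey` helper, the first-writer-wins rule on a collision, and the shallow `dup` of the key are all assumptions made for illustration. Reads take no lock and only writes go through the mutex, which matches the description above; whether unlocked reads are safe outside MRI is not something this sketch tries to settle.

```ruby
# Hypothetical sketch, NOT the real Phlex::FIFO implementation.
class ValidatingFIFO
  attr_reader :bytesize

  def initialize(max_bytesize: 2_000)
    @store = {}                 # hash code => [copy of key, value], in insertion order
    @max_bytesize = max_bytesize
    @bytesize = 0
    @mutex = Mutex.new          # only taken on writes
  end

  # O(1) read. A matching hash code alone is not trusted:
  # the stored key must also be == to the lookup key.
  def [](key)
    entry = @store[key.hash]
    return nil unless entry

    stored_key, value = entry
    stored_key == key ? value : nil
  end

  def []=(key, value)
    code = key.hash

    @mutex.synchronize do
      # Assumption: the first writer wins, so a colliding key is simply not cached.
      unless @store.key?(code)
        # Shallow copy only; nested objects inside the key are still shared.
        @store[code] = [key.dup.freeze, value]
        @bytesize += value.bytesize

        # FIFO eviction: Hash#shift removes the oldest inserted entry.
        while @bytesize > @max_bytesize
          _, (_, evicted) = @store.shift
          @bytesize -= evicted.bytesize
        end
      end
    end
  end
end

# Hypothetical key type that forces a hash collision for demonstration.
CollidingKey = Struct.new(:name) do
  def hash
    42
  end
end

cache = ValidatingFIFO.new(max_bytesize: 10)
cache[CollidingKey.new("a")] = "<a>"
cache[CollidingKey.new("a")] # => "<a>"
cache[CollidingKey.new("b")] # => nil (same hash code, but the keys are not ==)
```

Without the `stored_key == key` check, the last lookup would return `"<a>"` for a different key, which is the bug the quoted comment describes.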