Design Doc: HTML Caching plus PageSpeed

HTML Caching + PageSpeed: Use cases

Ilya Grigorik, 2013-04-10

Many sites rely on a caching layer (e.g., Varnish, Squid, the built-in nginx cache, or a third-party CDN) to serve both static assets and HTML, either all the time or only when they get a spike in traffic. The primary motivation is to increase throughput and to shield the (potentially slow or loaded) application servers from having to handle every request. Some examples include:

  • Wikipedia caches all content in Squid and Varnish clusters
  • News sites often cache all content at the edge (CDN), including HTML
  • Hosting providers often serve all WordPress pages out of upstream cache

Many of these providers run their app servers at high load: even with the caching layer, CPU load and throughput are a concern. This means that non-cacheable HTML is a non-starter, as it would bypass their existing cache layer and place much higher load on their origins.

Once the HTML is cached, several strategies are used for updates:

  • the cache intervals for HTML are kept relatively short (e.g., 60s for a news site)
  • the page is cached by the upstream cache but served as “non-cacheable” to the client (Wikipedia)
  • a cache purge API is used to invalidate a cached resource on update (e.g., an article update)

Long story short, the HTML document must be cacheable, which creates several different deployment cases we need to support with PageSpeed...

PageSpeed “at the edge”

[ app server ] <---> [ pagespeed ] <---> [visitor]

This is the simplest, baseline scenario, and one we (mostly) support today. The rough workflow is:

  • backend handler (e.g., PHP, Java, a remote server, etc.) generates the bytes
  • pagespeed rewrites the bytes and marks the document as non-cacheable (headers sketched below)
    • every HTML request is routed to the backend handler
    • every HTML request is served by PageSpeed
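
Today the emitted HTML carries headers roughly like this (illustrative; mod_pagespeed marks rewritten HTML as non-cacheable by default):

HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: max-age=0, no-cache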

(A) We need to enable PageSpeed to cache HTML generated by the backend, to protect the backend from having to service every HTML request.

If we can cache HTML within PageSpeed, we shield the backend while retaining the ability to perform all UA-specific optimizations, because every HTML request is still served through PageSpeed (the emitted HTML is still delivered with Cache-Control: private).
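
Under (A), the exchange might look like this (TTL values are illustrative): PageSpeed caches the backend’s response internally, while the copy sent to the visitor stays private:

backend --> pagespeed:   Cache-Control: max-age=600
pagespeed --> visitor:   Cache-Control: private, max-age=0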

Note that this case covers both the single-server and “PageSpeed as a service” use cases -- e.g., a CDN offering PageSpeed optimization at the edge (PageSpeed Service, EdgeCast, etc.).

However, now that the HTML is cached by PageSpeed, (B) we also need a way to invalidate it on a per-resource basis (aka a Purge API) -- we can’t rely on TTLs alone. Example:

PURGE /article.html
Host: domain.com

The Purge API applies to all resources, not just HTML. Example: a user updates a CSS stylesheet in the built-in WordPress editor, and WordPress automatically triggers a background purge request to the upstream cache when the file is saved. This enables cacheable resources plus instant updates, and is a common workflow -- there is an ecosystem of existing Varnish/Squid/nginx purge plugins for WordPress, Drupal, etc.
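
That stylesheet case would look much like the HTML purge above (the path is illustrative):

PURGE /wp-content/themes/example/style.css
Host: domain.com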

Example: Wikipedia

Wikipedia illustrates another interesting deployment pattern:

  • the backend server returns a “cacheable” response to the cache
  • the cache stores the data but strips the caching headers, serving the response as non-cacheable to the user
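
Concretely, the header rewrite might look like this (the backend-side value is illustrative):

backend --> cache:    Cache-Control: s-maxage=2678400
cache --> client:     Cache-Control: private, s-maxage=0, max-age=0, must-revalidate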

This setup protects the backend from servicing every request, but still has the benefit of instant purges for every client. For an example of this in action, check any Wikipedia page:

HTTP/1.0 200 OK
Content-Length: 35180
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Date: Wed, 10 Apr 2013 06:09:11 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
X-Cache: HIT from cp1010.eqiad.wmnet
X-Cache-Lookup: HIT from cp1010.eqiad.wmnet:3128

If we have (A) and (B), then this pattern is inherited “for free”, since it’s a hybrid of what we do currently and the features above.

Finally, with (A) and (B) in place, the site owner could also decide to make the optimized HTML publicly cacheable, which can be a fair compromise between load and update speed:

  • No extra PageSpeed smarts + cached HTML: the site owner can’t issue a purge, and must wait for the client-cached document to expire before new updates propagate. In practice, many news sites use this with short (60s-range) lifetimes, which is still “fast enough” and simple to deploy (header sketched below).
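
In that setup the optimized HTML would simply be emitted with a short public lifetime (value illustrative):

Cache-Control: public, max-age=60, s-maxage=60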

PageSpeed as origin + upstream cache

[ app server ] <---> [ pagespeed ] <---> [CDN or cache] <---> [visitor]

PageSpeed at the edge is not always a viable solution: an existing caching layer may already be present (e.g., Varnish, Squid), or a CDN may be used to cache static assets and HTML to help with QPS and latency -- a very frequent pattern for larger media sites.

This deployment pattern creates several problems, because the upstream cache is “not as smart” as the PageSpeed cache:

  1. Partial-rewriting race condition between PageSpeed and the upstream cache
  2. UA-specific optimization must be performed by the upstream cache
  3. HTML is non-cacheable by the upstream cache (by default)

Today, we allow static resources to be cached in the upstream cache, but HTML requests must be routed to PageSpeed, which is too costly for many origins: they need all resources to be cacheable. To achieve this, there are two routes:

  • (a) try to replicate similar “PageSpeed smarts” in the upstream cache
  • (b) keep it simple, drop some features to support a “dumb” upstream cache

Because many upstream caches have limited configuration capability (e.g., a public CDN, or infrastructure you can’t really touch), (b) is the more likely case for many larger sites. Hence:

  1. We can’t replicate the UA-sniffing logic, so we must give up UA-specific optimizations
    • Drop WebP, inlining, localStorage
  2. With (1) in place, the HTML is safe for all UAs and can be cached upstream.
    • The purge/invalidation problem now exists in both places: the client has to first purge a resource from PageSpeed, and then also issue a purge to their upstream cache.
  3. Partial rewriting...
    • We could force blocking rewrites -- potentially costly, but simple.
    • Return the partial response with no-cache, forcing a bypass of the upstream cache, then serve the rewritten resource with caching headers (sketched below). This means some requests will make it all the way down to the origin server, but hopefully these “uncached” windows are small (and can be further limited by dogpile protection in the cache).
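
A sketch of that second option (TTL illustrative): the first response bypasses the upstream cache, and once rewriting completes, responses become cacheable again:

1st request:    partially rewritten HTML, Cache-Control: no-cache
later requests: fully rewritten HTML, Cache-Control: public, s-maxage=600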

(C) No new features are needed here, assuming (A) and (B) -- it is just a matter of clear documentation on which filters to disable and how to handle partial rewrites.
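
As a starting point for that documentation, the UA-dependent filters could be disabled with something like the following (Apache syntax; treat the exact filter list as an assumption to be verified):

ModPagespeedDisableFilters convert_jpeg_to_webp,inline_images,inline_css,inline_javascript,local_storage_cache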

Finally, if we go down the “make the upstream cache smarter” route, then:

  1. The upstream cache can be configured to perform the same UA-sniffing logic as PageSpeed (sketched below)
    • e.g., WebP-sniffing logic ported to Varnish and nginx
  2. With (1) in place, the same UA-sniffing logic could be replicated for serving optimized HTML
  3. Same considerations for partial rewriting...
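
As a minimal sketch of (1) in nginx, assuming clients that advertise WebP support via the Accept header (the mapping is illustrative, not PageSpeed’s actual sniffing logic):

map $http_accept $webp_suffix {
    default         "";
    "~*image/webp"  ".webp";
}

server {
    location ~* \.(png|jpe?g)$ {
        # serve a precomputed .webp variant when the client accepts it
        add_header Vary Accept;
        try_files $uri$webp_suffix $uri =404;
    }
}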

This last case is the most complicated and involved. It would require a lot of handholding, and is ultimately fairly brittle -- any change in PageSpeed logic would have to be replicated in the upstream cache logic as well. In other words, it’s doable, but not a top priority -- you’re likely much better off moving PageSpeed to the edge.

In summary...

  • (A) We need to enable PageSpeed to cache HTML generated by the backend, to protect the backend from having to service every HTML request.
  • (B) We need a way to invalidate cached HTML (and other resources) on a per-resource basis (aka a Purge API).
  • (C) We need documentation on which filters to disable, and how to handle partial rewrites in the presence of a “dumb” upstream cache.
