
Design Doc: Below the Fold Beaconing for Split HTML


Below the Fold Beaconing for Split HTML [DRAFT]

Jeff Kaufman, Jud Porter, 2013-06-25

The Split Html Rewriter needs to know what portion of the html is below the fold so it can deprioritize it. It currently uses a headless browser to determine this, but that isn't practical in open source. To support this rewriter in open source without requiring site owners to manually specify what portions of their html are below the fold, we need to use javascript beacons. We currently use a similar approach with critical css and critical images, and we'd like to keep these implementations as similar as possible.

The javascript that determines what portions of the page are below the fold is already written and working in the headless browser extension, so we won't be starting from scratch. It currently reports beacons back in the form [start-xpath : end-xpath], where the start and end xpaths share a parent. For reasons explained below, we'll need to modify it to only report xpaths for individual nodes, not ranges.

We're planning to store the beacon responses in the property cache using the same support number approach used for critical css and images. Each individual xpath reported as below the fold will go into a table, initially with a support value of one. Every additional beacon listing that xpath as below the fold will increase its support value, and beacons not listing it will cause the value to decay. Consider html like:

<div id="a">
  <div id="b">...</div>
  <div id="c">
    <div>...</div>
    <div id="d">...</div>
    <div>
      <div id="left">...</div>
      <div id="right">...</div>
    </div>
    <div>...</div>
  </div>
</div>

To make this example illustrate several issues, imagine the CSS is a little unusual and specifies that the 2nd child of 'c', which is the div with id 'd', should display after all the other children of 'c':

+--a---------------------------+
| +--b-----------------------+ |
| | ...                      | |
| +--------------------------+ |
|                              |
| +--c-----------------------+ |
| | +--1st-----------------+ | |
| | | ...                  | | |
| | +----------------------+ | |
| |                          | |
| | +--3rd-----------------+ | |
| | | +--left--+ +-right-+ | | |
| | | | ...    | | ...   | | | |
| | | +--------+ +-------+ | | |
| | +----------------------+ | |
| |                          | |
| | +--4th-----------------+ | |
| | | ...                  | | |
| | +----------------------+ | |
| |                          | |
| | +--d-------------------+ | |
| | | ...                  | | |
| | +----------------------+ | |
| +--------------------------+ |
+------------------------------+

Different browsers with different screen sizes will report different elements as being below the fold. Imagine 1/4th of the traffic is desktop and sees almost all the page, so only the 2nd child of 'c' ('d') is below the fold. Another 1/4th have large tablets and so see the 2nd and 4th as below the fold. Yet another 1/4th have small tablets and see the 2nd, 3rd, and 4th as below the fold. The remaining 1/4th have phones and see only 'b', with all four children of 'c' below the fold. Those beacon responses, respectively, would be:

case 1: div[id=d]
case 2: div[id=d], div[id=c]/div[4]
case 3: div[id=d], div[id=c]/div[4], div[id=c]/div[3]
case 4: div[id=c]

Note that in case 4 we have div[id=c] and not a longer version of case 3, because if a parent and all its children are below the fold we report just the parent. This is the same reason that 'left' and 'right' are never reported: they are below the fold only when 'div[id=c]/div[3]' is below the fold, and so there's no case where they'll get reported on their own. After 100 beacon responses, 1/4 from each case and ignoring decay, our table would look like:

xpath             support value
----------------  -------------
div[id=c]         25
div[id=c]/div[3]  25
div[id=c]/div[4]  50
div[id=d]         75

While every browser that saw 'div[id=c]' as below the fold also saw 'div[id=d]' that way, in the beacon handler we don't know anything about html structure and don't try to combine these observations. This both makes the code simpler and will let us reuse more of the existing critical css and critical images beacon processing.
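As a rough sketch, assuming the same support/decay bookkeeping used for critical css and critical images (the map type, decay constant, and function name below are illustrative, not the actual property cache code), the per-beacon update might look like:

// Hypothetical sketch of the per-beacon support update.  Each reported
// xpath gains support; existing entries decay so that xpaths no longer
// being reported eventually drop out.  The decay constant is an assumption
// for illustration only.
#include <map>
#include <set>
#include <string>

const double kSupportPerBeacon = 1.0;  // support added per reporting beacon
const double kDecayFactor = 0.9;       // assumed decay for unreported xpaths

// support_table maps an xpath to its accumulated support value.
void UpdateBelowTheFoldSupport(const std::set<std::string>& reported_xpaths,
                               std::map<std::string, double>* support_table) {
  // Decay everything first; xpaths this beacon didn't mention lose support.
  for (auto& entry : *support_table) {
    entry.second *= kDecayFactor;
  }
  // Then add support for each xpath reported as below the fold.  No attempt
  // is made here to relate parents and children; that reasoning happens
  // later, in the html filter.
  for (const std::string& xpath : reported_xpaths) {
    (*support_table)[xpath] += kSupportPerBeacon;
  }
}

Ignoring decay, feeding the 100 beacon responses above through an update like this reproduces the table of support values shown.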

In the html filter, when we're interpreting the beacon results to decide whether to split, we will have all the information necessary, including the knowledge that 'c' being below the fold implies that all its children are also below the fold. For each element in the html we can determine the xpath that would be used to report it as well as the xpaths for all parent elements, and we can look up each one in the table of beacon responses from the property cache. If the sum of the support values for this xpath and all parent xpaths is over a threshold, we treat the element as below the fold.

For example, if our threshold were 60 and we had the table above, then we would decide:

  • div[id=d] is below the fold even without having to check its parent (75 > 60)
  • div[id=c]/div[4] is below the fold once we consider the parent (50 + 25 > 60)
  • div[id=c]/div[3] and div[id=c] are not below the fold (25 + 25 < 60, 25 < 60)
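Concretely, the filter's check could be sketched as below. The list of xpaths is assumed to be built while walking the document (the element's own xpath followed by each ancestor's xpath); the function and constant names are illustrative only:

// Hypothetical sketch of the html filter's below-the-fold decision: sum the
// support for the element's own xpath and for every ancestor xpath, then
// compare against a threshold.
#include <map>
#include <string>
#include <vector>

const double kBelowTheFoldThreshold = 60;  // threshold from the example above

// xpaths holds the element's own xpath followed by its ancestors' xpaths,
// e.g. {"div[id=c]/div[4]", "div[id=c]", "div[id=a]"}.
bool IsBelowTheFold(const std::vector<std::string>& xpaths,
                    const std::map<std::string, double>& support_table) {
  double total_support = 0;
  for (const std::string& xpath : xpaths) {
    auto it = support_table.find(xpath);
    if (it != support_table.end()) {
      total_support += it->second;
    }
  }
  return total_support > kBelowTheFoldThreshold;
}

With the table above, IsBelowTheFold({"div[id=c]/div[4]", "div[id=c]"}, table) is true (50 + 25 > 60), while the same call for div[id=c]/div[3] is false (25 + 25 < 60). Each xpath is a single map lookup, so the cost per element is proportional to its depth in the document, not to the size of the table.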

The current javascript with the headless browser is a little more flexible, in that it allows xpaths that define ranges. For example, case 3 above could be reported as the single range [div[id=c]/div[2], div[id=c]/div[4]]. This is a kind of compression, and it helps most in cases where there's a long list of elements sharing the same parent that gets cut off partway through. To interpret a response like this, however, the lookup in the html filter would become much more complicated. Allowing those responses, a subset of our beacon table might look like:

xpath                                       support value
------------------------------------------  -------------
...
div[id=a]/div[3] through div[id=a]/div[10]  6
div[id=a]/div[4] through div[id=a]/div[10]  8
div[id=a]/div[6] through div[id=a]/div[10]  12
div[id=a]/div[7] through div[id=a]/div[10]  44
div[id=a]/div[3] through div[id=a]/div[9]   18
div[id=a]/div[4] through div[id=a]/div[9]   2
div[id=a]/div[5] through div[id=a]/div[9]   5
...

Let's say we're considering whether div[id=a]/div[9] is below the fold. To identify all relevant beacon responses we need to iterate over all keys in the table. It's no longer enough to look up div[id=a]/div[9], div[id=a], and ancestor elements. This moves us from O(1) in the size of the table to O(n). To fix this we could create a more complex data structure, with a hash of parent xpaths pointing to a list of xpath ranges and their accompanying support values. This would be O(1) in the size of the table and O(n) in the number of observed ranges. In addition to adding complexity, this also would mean we couldn't reuse the infrastructure that supports critical images and critical css beaconing.
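A sketch of what that structure might look like, with hypothetical names and assuming child positions are stored as integer indexes:

// Hypothetical sketch of the range-based table: a hash from parent xpath to
// the list of child-index ranges observed below the fold, each carrying its
// own support value.
#include <string>
#include <unordered_map>
#include <vector>

struct RangeSupport {
  int start_child;   // e.g. 7 for a range starting at div[id=a]/div[7]
  int end_child;     // e.g. 10 for a range ending at div[id=a]/div[10]
  double support;
};

typedef std::unordered_map<std::string, std::vector<RangeSupport>> RangeTable;

// Total support for one child element: O(1) to find the parent's entry,
// then O(n) in the number of ranges observed under that parent.
double SupportForChild(const RangeTable& table,
                       const std::string& parent_xpath,
                       int child_index) {
  double total = 0;
  auto it = table.find(parent_xpath);
  if (it == table.end()) {
    return 0;
  }
  for (const RangeSupport& range : it->second) {
    if (range.start_child <= child_index && child_index <= range.end_child) {
      total += range.support;
    }
  }
  return total;
}

For div[id=a]/div[9] this scans every range recorded under div[id=a] and sums those covering index 9; the ancestor walk from the previous sketch would then call it once per ancestor level.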

Unless ranges turn out to be necessary for good performance or for keeping beacons sufficiently small, we're inclined to skip them for now.

Open questions:

  • What's the performance impact of running this javascript in clients, especially given that we need to run it before loading any other javascript?

  • Do we have to depend on defer_javascript, or is it enough to document that we must run before any dom-manipulating javascript?

  • How will we deal with cases where the javascript sees a different structure than pagespeed does, because the browser's parsing inserts nodes like <tbody> and destructively resolves invalid html like <table><tr><td>foo<tr>bar</table>? What if browsers do different things in these cases?
    • This is also an issue for the headless browser below-the-fold determination flow.