
Add expect-no-linked-resources Document-Policy to Speculative parsing #10718

Open
wants to merge 3 commits into main
Conversation

@alexnj commented Oct 23, 2024

User agents implement speculative parsing of HTML: they scan the markup ahead of the main parser to speculatively fetch resources declared in it, speeding up page loading. For the vast majority of pages on the Web, which declare resources in their HTML markup, this optimization is beneficial, and the time spent determining those resources is a sound tradeoff. However, in the following scenarios the time spent parsing HTML to determine which subresources to fetch may be a poor tradeoff:

  • Pages that do not have any resources declared in the HTML.
  • Large HTML pages with minimal or no resource loads, which could instead control preloading explicitly via the other preload mechanisms available.

This proposal introduces a configuration point in Document Policy, named expect-no-linked-resources, that hints to a user agent that it may choose to optimize away the time spent on such subresource determination.
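Concretely (an illustrative sketch based on the Document Policy draft's header syntax, not normative text from this PR), a page opting in would deliver the configuration point as a response header:

```http
Document-Policy: expect-no-linked-resources
```

A page sending this header asserts that speculative scanning is unlikely to find anything useful; it can still trigger fetches explicitly, e.g. via `<link rel="preload">`.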

Read the complete Explainer and the proposed spec changes, which cover the changes in this PR.


/common-dom-interfaces.html ( diff )
/common-microsyntaxes.html ( diff )
/dom.html ( diff )
/index.html ( diff )
/infrastructure.html ( diff )
/parsing.html ( diff )
/references.html ( diff )
/structured-data.html ( diff )
/urls-and-fetching.html ( diff )

@domenic domenic added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: parser labels Oct 23, 2024
@domenic (Member) left a comment


I pushed an extra commit with small fixes I noticed during a final review.

Otherwise, this editorially LGTM.

I know this was discussed at the WebPerf WG and there was some general support from multiple implementers. And you're working on standards positions now. But if any implementers want to comment here, that'd be very welcome! I'll tag this as agenda+ to get some attention from the WHATNOT meeting crowd.

It's also noteworthy that this is the first Document Policy feature in HTML, and Document Policy itself is not yet integrated into HTML: https://wicg.github.io/document-policy/#integration-with-html . If this feature gets multi-implementer interest, then we should work on doing that integration sooner rather than later. /cc @clelland

@domenic domenic added the agenda+ To be discussed at a triage meeting label Oct 23, 2024
@smaug----
Isn't this basically hacking around some implementation limitations in blink (and maybe in webkit)? Gecko doesn't in general need to do separate speculation passes.

@past past removed the agenda+ To be discussed at a triage meeting label Oct 24, 2024
@past past added the agenda+ To be discussed at a triage meeting label Nov 6, 2024
@alexnj (Author) commented Nov 7, 2024

If the speculative parsing step isn't conducted as a separate step, I'd imagine there might be trivial to no benefit for Gecko. If the documented implementation behavior still holds true, there might be a non-trivial benefit to Gecko in the speculation-failure scenario (but only in cases where the web developer is able to hint to Gecko).

The HTML spec doesn't dictate whether speculation should be conducted as a separate step or inline with document parsing. The spec's language seems written so that the implementations of the parser and the speculative scanner are independent (e.g., "Bytes pushed into the HTML parser's input byte stream must also be pushed into the speculative HTML parser's input byte stream"), thereby making a separate cost for speculation a possibility. I'd imagine that doing it inline with parsing might still be a tradeoff, given that the parser is specified to stop on encountering scripts. Or maybe there's a way to continue tokenizing, at the risk of discarding the work if the DOM was indeed modified. Perhaps that's what Gecko does today.
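To illustrate the architecture under discussion (a rough, hypothetical sketch, not the HTML Standard's algorithm or any engine's actual code), a separate speculative prescanner can tokenize the raw input independently of the main parser, continuing past a blocking script to collect candidate subresource URLs; a policy hint like expect-no-linked-resources would let the user agent skip that pass entirely:

```python
import re

# Toy prescanner: find src/href attributes on a few fetch-triggering tags.
# A real engine tokenizes properly; a regex is enough to sketch the idea.
CANDIDATE = re.compile(
    r"""<(?:img|script)[^>]*\bsrc=["']([^"']+)["']"""
    r"""|<link[^>]*\bhref=["']([^"']+)["']""",
    re.IGNORECASE,
)

def speculative_prescan(html: str) -> list[str]:
    """Scan the whole input, including markup after a blocking <script>,
    which the main parser would stop at. Results are speculative: they may
    be discarded if script execution changes the input stream."""
    return [m.group(1) or m.group(2) for m in CANDIDATE.finditer(html)]

def maybe_prescan(html: str, expect_no_linked_resources: bool = False) -> list[str]:
    """With the policy hint set, the user agent can skip the scan entirely."""
    if expect_no_linked_resources:
        return []
    return speculative_prescan(html)

doc = """
<link rel="stylesheet" href="/app.css">
<script src="/blocking.js"></script>
<img src="/hero.png">
"""
print(speculative_prescan(doc))  # ['/app.css', '/blocking.js', '/hero.png']
```

The point of the sketch is only that the prescan is a second pass over the same bytes, so its cost scales with document size whether or not it finds anything, which is exactly the cost the hint lets a page opt out of.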

I did a rough benchmark of medium-to-complex page sets with a fresh Chromium release build. Chromium seems to spend between ~70-100 ms scanning the HTML spec, which I'd think is an extreme example given the nature of that page. The Web Bluetooth spec seems to take 15-20 ms. For something rather simple, like the CC 3.0 license page, it spends about 5 ms on average. These were measured on a capable box, equivalent to an M1 Max MacBook, so I'd guess the gains are much bigger on slower CPUs and hardware. My understanding today is that there's a non-trivial performance advantage to be had, depending on hardware, for pages that don't benefit from speculating resource URLs to fetch; the Origin Trial I ran in Chrome concurs. The expect-no-linked-resources directive would give web developers a means to help the user agent avoid spending that time.

So the open questions in my mind at the moment are:

  • Should Gecko measure the exact time spent speculating on these pages, to confirm that there is no overhead (if that's not already well established)?
  • If there is trivial to no benefit to Gecko, would it still make sense to standardize this for engines that implement speculative parsing as an independent step, as hinted at in the HTML spec? Gecko could choose not to act on the hint.

@annevk (Member) commented Nov 7, 2024

I think generally if the consensus is that engines could do more work to make it faster, we don't ask web developers to put in the work to make it faster. See priority of constituencies.

@domenic (Member) commented Nov 13, 2024

I think generally if the consensus is that engines could do more work to make it faster

It's not clear to me that this is the case. My understanding is that we have two different types of tokenizer + tree builder + parser + speculative parser architectures:

  • The Gecko one, which does more work as part of a single pass;
  • The WebKit/Blink one, in which the speculative parsing is done more separately.

The WebKit/Blink architecture benefits from the expect-no-linked-resources hint, whereas the Gecko architecture does not. But, we don't have any evidence that in general the Gecko architecture is superior to the WebKit/Blink one.

Stated another way, we have the following four scenarios:

  • GN: Gecko + no-hint
  • WN: WebKit/Blink + no-hint
  • GH: Gecko + hint
  • WH: WebKit/Blink + hint

We know that GN = GH (call this common value G), and that WH > WN. But we don't have any information on the relationship between G and WN, or between G and WH.

If G >= WN and G >= WH for all possible websites, then I agree that this feature is not very aligned with the priority of constituencies, and WebKit/Blink should move to the Gecko architecture since it is always faster.

But I suspect there are cases where WN > G, and especially that there are cases where WH > G. In that case, this feature adds value to the web, by allowing the combined forces of web developers (via the hint) and browser implementations (via the in-this-scenario-faster WebKit/Blink architecture) to speed up page loads beyond what's possible with just the Gecko architecture.

5 participants