
Add expect-no-linked-resources Document-Policy to Speculative parsing #10718

Open
wants to merge 3 commits into main
Conversation

@alexnj commented Oct 23, 2024

User agents implement speculative parsing of HTML: they scan the markup ahead of the main parser to speculatively fetch resources declared in it, speeding up page loading. For the vast majority of pages on the Web, which declare resources in their HTML markup, this optimization is beneficial, and the time spent determining those resources is a sound tradeoff. However, in the following scenarios the time spent parsing HTML to determine which subresources to fetch may be a poor tradeoff:

  • Pages that do not have any resources declared in the HTML.
  • Large HTML pages with minimal or no resource loads, which could instead control preloading explicitly via the other preload mechanisms available.

This proposal introduces a configuration point in Document Policy, named expect-no-linked-resources, that hints to a user agent that it may choose to optimize away the time spent on such subresource determination.
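Concretely (an illustrative sketch based on the Document Policy draft's header syntax, not normative text from this PR), a page opting in would deliver the configuration point as a response header:

```http
Document-Policy: expect-no-linked-resources
```

A page sending this header asserts that speculative scanning is unlikely to find anything useful; it can still trigger fetches explicitly, e.g. via `<link rel="preload">`.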

Read the complete Explainer and the proposed spec changes, which cover the changes in this PR.


/common-dom-interfaces.html ( diff )
/common-microsyntaxes.html ( diff )
/dom.html ( diff )
/index.html ( diff )
/infrastructure.html ( diff )
/parsing.html ( diff )
/references.html ( diff )
/structured-data.html ( diff )
/urls-and-fetching.html ( diff )

@domenic domenic added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: parser labels Oct 23, 2024
@domenic (Member) left a comment


I pushed an extra commit with small fixes I noticed during a final review.

Otherwise, this editorially LGTM.

I know this was discussed at the WebPerf WG and there was some general support from multiple implementers. And you're working on standards positions now. But if any implementers want to comment here, that'd be very welcome! I'll tag this as agenda+ to get some attention from the WHATNOT meeting crowd.

It's also noteworthy that this is the first Document Policy feature in HTML, and Document Policy itself is not yet integrated into HTML: https://wicg.github.io/document-policy/#integration-with-html . If this feature gets multi-implementer interest, then we should work on doing that integration sooner rather than later. /cc @clelland

@domenic domenic added the agenda+ To be discussed at a triage meeting label Oct 23, 2024
@smaug----
Isn't this basically hacking around some implementation limitations in blink (and maybe in webkit)? Gecko doesn't in general need to do separate speculation passes.

@past past removed the agenda+ To be discussed at a triage meeting label Oct 24, 2024
@past past added the agenda+ To be discussed at a triage meeting label Nov 6, 2024
@alexnj (Author) commented Nov 7, 2024

If the speculative parsing step isn't conducted as a separate step, I'd imagine there might be trivial to no benefit for Gecko. If the documented implementation behavior still holds true, there might be a non-trivial benefit to Gecko in the speculation-failure scenario (but only in cases where the web developer is able to hint to Gecko).

The HTML spec doesn't dictate whether speculation should be conducted as a separate step or inline with document parsing. The spec's language seems written so that the implementations of the parser and the speculative scanner are independent (e.g., "Bytes pushed into the HTML parser's input byte stream must also be pushed into the speculative HTML parser's input byte stream"), thereby making a separate cost for speculation a possibility. I'd imagine that doing it inline with parsing might still be a tradeoff, given that the parser is specified to stop on encountering scripts. Or maybe there's a way to continue tokenizing, at the risk of discarding the work if the DOM was indeed modified. Perhaps that's what Gecko does today.
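To illustrate the architecture under discussion (a rough, hypothetical sketch, not the HTML Standard's algorithm or any engine's actual code), a separate speculative prescanner can tokenize the raw input independently of the main parser, continuing past a blocking script to collect candidate subresource URLs; a policy hint like expect-no-linked-resources would let the user agent skip that pass entirely:

```python
import re

# Toy prescanner: find src/href attributes on a few fetch-triggering tags.
# A real engine tokenizes properly; a regex is enough to sketch the idea.
CANDIDATE = re.compile(
    r"""<(?:img|script)[^>]*\bsrc=["']([^"']+)["']"""
    r"""|<link[^>]*\bhref=["']([^"']+)["']""",
    re.IGNORECASE,
)

def speculative_prescan(html: str) -> list[str]:
    """Scan the whole input, including markup after a blocking <script>,
    which the main parser would stop at. Results are speculative: they may
    be discarded if script execution changes the input stream."""
    return [m.group(1) or m.group(2) for m in CANDIDATE.finditer(html)]

def maybe_prescan(html: str, expect_no_linked_resources: bool = False) -> list[str]:
    """With the policy hint set, the user agent can skip the scan entirely."""
    if expect_no_linked_resources:
        return []
    return speculative_prescan(html)

doc = """
<link rel="stylesheet" href="/app.css">
<script src="/blocking.js"></script>
<img src="/hero.png">
"""
print(speculative_prescan(doc))  # ['/app.css', '/blocking.js', '/hero.png']
```

The point of the sketch is only that the prescan is a second pass over the same bytes, so its cost scales with document size whether or not it finds anything, which is exactly the cost the hint lets a page opt out of.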

I did a rough benchmark of medium-to-complex page sets with a fresh Chromium release build. Chromium seems to spend between ~70-100 ms scanning the HTML spec, which I'd think is an extreme example given the nature of that page. The Web Bluetooth spec seems to take 15-20 ms. For something rather simple, like the CC 3.0 license page, it spends about 5 ms on average. These were measured on a capable box, equivalent to an M1 Max MacBook, so I'd guess the gains are much bigger on slower CPUs and hardware. My understanding today is that there's a non-trivial performance advantage to be had, depending on hardware, for pages that don't benefit from speculating resource URLs to fetch; the Origin Trial I ran in Chrome concurs. The expect-no-linked-resources directive would give web developers a means to help the user agent avoid spending that time.

So the open questions in my mind at the moment are:

  • Should Gecko measure the exact time spent speculating on these pages, to confirm that there is no overhead (if that's not already well established)?
  • If there is trivial to no benefit to Gecko, would it still make sense to standardize this for engines that implement speculative parsing as an independent step, as hinted at in the HTML spec? Gecko could choose not to act on the hint.

@annevk (Member) commented Nov 7, 2024

I think generally if the consensus is that engines could do more work to make it faster, we don't ask web developers to put in the work to make it faster. See priority of constituencies.

@domenic (Member) commented Nov 13, 2024

I think generally if the consensus is that engines could do more work to make it faster

It's not clear to me that this is the case. My understanding is that we have two different types of tokenizer + tree builder + parser + speculative parser architectures:

  • The Gecko one, which does more work as part of a single pass;
  • The WebKit/Blink one, in which the speculative parsing is done more separately.

The WebKit/Blink architecture benefits from the expect-no-linked-resources hint, whereas the Gecko architecture does not. But, we don't have any evidence that in general the Gecko architecture is superior to the WebKit/Blink one.

Stated another way, we have the following four scenarios:

  • GN: Gecko + no-hint
  • WN: WebKit/Blink + no-hint
  • GH: Gecko + hint
  • WH: WebKit/Blink + hint

We know that GN = GH (call this common value G), and that WH > WN. But we don't have any information on the relationship between G and WN, or between G and WH.

If G >= WN and G >= WH for all possible websites, then I agree that this feature is not very aligned with the priority of constituencies, and WebKit/Blink should move to the Gecko architecture since it is always faster.

But I suspect there are cases where WN > G, and especially that there are cases where WH > G. In that case, this feature adds value to the web, by allowing the combined forces of web developers (via the hint) and browser implementations (via the in-this-scenario-faster WebKit/Blink architecture) to speed up page loads beyond what's possible with just the Gecko architecture.

5 participants