Explainer: Subresource loading with Web Bundles

Last updated: Apr 2022

We propose a new approach to load a large number of resources efficiently using a format that allows multiple resources to be bundled, e.g. Web Bundles.

Background

  • Loading many unbundled resources was still slower than loading bundled ones in 2020. We concluded that bundling was necessary in 2018, and our latest local measurement still suggests that.

  • The output of JS bundlers (e.g. webpack) doesn't interact well with the HTTP cache. They are pretty good tools, but configuring them to work optimally is tough, and sometimes they're also incompatible with new requirements like dynamic bundling (e.g. a small edit with tree shaking could invalidate everything).

  • With JS bundlers, execution needs to wait for all of the bytes to arrive. Ideally, loading multiple subresources should be able to use full streaming and parallelization, but that's not possible if all resources are bundled into one JavaScript file. (For JS modules, execution still has to wait for the entire tree because of the current deterministic execution model.)

  • Related issues: #411, #526

Requirements

Web pages will declare that some of their subresources are provided by the Web Bundle at a particular URL.

It's likely that the HTML parser will encounter some of the bundle's subresources before it receives the bundle's index. The declaration needs to somehow prevent the parser from double-fetching those bytes, which it can accomplish in a couple of ways.

We don't see an initial need for an associated JavaScript API to pull information out of the bundle.

We also don't address a way for Service Workers to use bundles to fill a Cache. Service Workers can technically unpack a bundle into cache.put() calls themselves, and, while the result may take an inefficient amount of browser-internal communication, letting some sites experiment with this will give us a better chance of designing the right API.

This is a powerful feature that can replace any subresource in the page, so we limit its use to secure contexts.

This feature is NOT related to Signed Exchanges; that is a common misunderstanding. The bundle doesn't have to be signed.

<script>-based API

Developers will write

<script type="webbundle">
{
   "source": "https://example.com/dir/subresources.wbn",
   "resources": ["https://example.com/dir/a.js", "https://example.com/dir/b.js", "https://example.com/dir/c.png"]
}
</script>

to tell the browser that subresources specified in resources can be found within the https://example.com/dir/subresources.wbn bundle.

When the browser parses such a script element, it:

  1. Fetches the specified Web Bundle, https://example.com/dir/subresources.wbn.

  2. Records the resources and delays fetching any subresource listed there, provided that the subresource's origin is the same as the bundle's origin and its path has the bundle's shortened path as a prefix.

  3. As the bundle arrives, the browser fulfills those pending subresource fetches from the bundle's contents.

  4. If a fetch isn't actually contained inside the bundle, it's probably better to fail that fetch than to go to the network, since it's easier for developers to fix a deterministic network error than a performance problem.

    The primary requirement to avoid fetching the same bytes twice is that "If a specified subresource is needed later in the document, that later fetch should block until at least the index of the bundle has downloaded to see if it's there."

    It seems secondary to then say that if a specified subresource isn't in the bundle, its fetch should fail or otherwise notify the developer: that just prevents delays in starting the subresource fetch.
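
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the browser's actual algorithm. The names `WebBundleRule`, `is_in_bundle_scope`, and `resolve_fetch` are hypothetical, and the "shortened path" is assumed here to mean the bundle URL's path up to and including its last "/".

```python
from dataclasses import dataclass
from typing import Optional, Set
from urllib.parse import urlsplit


@dataclass(frozen=True)
class WebBundleRule:
    """Hypothetical in-memory form of a <script type="webbundle"> rule."""
    source: str
    resources: frozenset


def origin(url: str) -> tuple:
    # Simplified origin tuple; default ports are not normalized in this sketch.
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def shortened_path(bundle_url: str) -> str:
    # Assumption: the bundle path up to and including its last "/".
    path = urlsplit(bundle_url).path
    return path[: path.rfind("/") + 1]


def is_in_bundle_scope(rule: WebBundleRule, url: str) -> bool:
    # Step 2: listed in the rule, same origin as the bundle, path under the bundle's directory.
    return (
        url in rule.resources
        and origin(url) == origin(rule.source)
        and urlsplit(url).path.startswith(shortened_path(rule.source))
    )


def resolve_fetch(rule: WebBundleRule, url: str, bundle_index: Optional[Set[str]]) -> str:
    """Return "network", "wait", "bundle", or "fail" for a subresource fetch.

    bundle_index is the set of URLs in the bundle's index, or None while the
    index has not been received yet.
    """
    if not is_in_bundle_scope(rule, url):
        return "network"  # not covered by the rule: ordinary fetch
    if bundle_index is None:
        return "wait"     # block until at least the index has downloaded
    # Step 3 serves from the bundle; step 4 fails instead of going to the network.
    return "bundle" if url in bundle_index else "fail"
```

Returning "fail" for a listed-but-missing resource mirrors the preference stated in step 4; a more lenient implementation could return "network" there instead.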

Example

The bundle

Suppose that the bundle, subresources.wbn, includes the following resources:

- https://example.com/dir/a.js (which depends on ./b.js)
- https://example.com/dir/b.js
- https://example.com/dir/c.png
- … (omitted)

The URL of a resource in the bundle can be relative to the bundle. A browser must parse such a URL using the bundle's URL as the base.

The main document

<script type="webbundle">
{
  "source": "https://example.com/dir/subresources.wbn",
  "resources": ["https://example.com/dir/a.js", "https://example.com/dir/b.js", "https://example.com/dir/c.png"]
}
</script>

<script type="module" src="https://example.com/dir/a.js"></script>
<img src="https://example.com/dir/c.png">

Then, a browser must fetch the bundle, subresources.wbn, and load subresources, a.js, b.js, and c.png, from the bundle.

A URL in source can be a relative URL and must be resolved against the document's base URL.

A URL in resources and scopes can be a relative URL and must be resolved against the bundle's URL.
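
The two resolution rules can be illustrated with a short Python sketch. The helper `resolve_rule` is hypothetical, and `urllib.parse.urljoin` is only an approximation of the URL parser a browser would use.

```python
from urllib.parse import urljoin


def resolve_rule(rule: dict, document_base_url: str) -> dict:
    """Resolve the URLs of a parsed webbundle rule (hypothetical helper)."""
    # "source" is resolved against the document's base URL.
    source = urljoin(document_base_url, rule["source"])
    return {
        "source": source,
        # "resources" and "scopes" are resolved against the bundle's URL.
        "resources": [urljoin(source, url) for url in rule.get("resources", [])],
        "scopes": [urljoin(source, url) for url in rule.get("scopes", [])],
    }


resolved = resolve_rule(
    {"source": "dir/subresources.wbn", "resources": ["a.js", "c.png"], "scopes": ["js/"]},
    "https://example.com/index.html",
)
```

Under these assumptions, `a.js` resolves to `https://example.com/dir/a.js` because it is relative to the bundle, not to the document.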

<script type="webbundle"> doesn't support the src attribute. The rule must be inline.

Request's mode and credentials mode

A request for a bundle will have its mode set to "cors" and its credentials mode set to "same-origin" unless a credentials value is specified in its JSON as follows:

<script type="webbundle">
{
  "source": "https://example.com/dir/subresources.wbn",
  "credentials": "omit",
  "resources": ["https://example.com/dir/a.js", "https://example.com/dir/b.js", "https://example.com/dir/c.png"]
}
</script>

The possible values are "omit", "same-origin", and "include". See the fetch spec for details. If any other value is specified, the credentials mode is set to "same-origin".

Note: The <script> element's crossorigin attribute is not used.
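
The fallback rule can be sketched in a few lines of Python. The function name `bundle_request_settings` and the dictionary shape are assumptions made for illustration.

```python
VALID_CREDENTIALS = ("omit", "same-origin", "include")


def bundle_request_settings(rule: dict) -> dict:
    """Derive the bundle request's mode and credentials mode (hypothetical helper)."""
    value = rule.get("credentials")
    # Missing or unrecognized values fall back to "same-origin".
    credentials = value if value in VALID_CREDENTIALS else "same-origin"
    return {"mode": "cors", "credentials": credentials}
```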

Request's destination

With the <script>-based API, a request for a bundle will have its destination set to "webbundle" (whatwg/fetch#1120).

CORS and CORP for subresource requests

CORS and CORP checks on subresources in bundles are based on the URL and response headers of the requested subresource.

For example, if a cors request is made to a cross-origin subresource in a bundle, and the subresource does not have an Access-Control-Allow-Origin: header, the request will fail.

Similarly, if a no-cors request is made to a cross-origin subresource in a bundle, and the subresource has a Cross-Origin-Resource-Policy: same-origin header, the request will fail.
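
A heavily simplified sketch of these two checks is shown below. It is not the fetch spec's algorithm: it ignores credentials, preflights, and the other Cross-Origin-Resource-Policy values, and the function name `subresource_allowed` is hypothetical. The point is only that the inputs are the subresource's own URL and headers, not the bundle's.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def subresource_allowed(request_mode: str, document_origin: str,
                        resource_url: str, response_headers: dict) -> bool:
    """Simplified CORS/CORP check for a subresource served from a bundle."""
    if request_mode not in ("cors", "no-cors"):
        raise ValueError("this sketch only covers 'cors' and 'no-cors'")
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    if origin_of(resource_url) == document_origin:
        return True  # same-origin: neither check applies in this sketch
    if request_mode == "cors":
        allow = headers.get("access-control-allow-origin")
        return allow == "*" or allow == document_origin
    # no-cors: blocked when the subresource opts into same-origin only.
    return headers.get("cross-origin-resource-policy") != "same-origin"
```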

Content Security Policy (CSP)

For resources loaded from bundles, URL matching of CSP is done based on the URL of the resource, not the URL of the bundle. For example, given this CSP header:

Content-Security-Policy: script-src https://example.com/script/

In the following, a.js will be loaded, but b.js will be blocked:

<script type="webbundle">
{
  "source": "https://example.com/subresources.wbn",
  "resources": ["https://example.com/script/a.js",
                "https://example.com/b.js"]
}
</script>

<script src="https://example.com/script/a.js"></script>
<script src="https://example.com/b.js"></script>
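
The outcome can be reproduced with a small Python sketch of source-expression matching. It covers only scheme, host, and path (a path ending in "/" is treated as a prefix, any other path must match exactly) and leaves out wildcards, ports, and the rest of CSP. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def matches_source(source_expr: str, url: str) -> bool:
    """Simplified CSP source-expression match on scheme, host, and path."""
    source, target = urlsplit(source_expr), urlsplit(url)
    if (source.scheme, source.netloc) != (target.scheme, target.netloc):
        return False
    if not source.path:
        return True  # no path in the expression: any path matches
    if source.path.endswith("/"):
        return target.path.startswith(source.path)
    return target.path == source.path


def script_allowed(script_src_sources: list, resource_url: str) -> bool:
    # The check uses the URL of the resource itself, never the URL of the bundle.
    return any(matches_source(expr, resource_url) for expr in script_src_sources)


policy = ["https://example.com/script/"]
```

With this policy, the bundle at https://example.com/subresources.wbn would not itself match, yet a.js is still allowed because only the resource URL is compared.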

Defining the scopes

Instead of including a list of resources, the <script> can define scopes.

<script type="webbundle">
{
  "source": "https://example.com/dir/subresources.wbn",
  "scopes": ["https://example.com/dir/js/",
             "https://example.com/dir/img/",
             "https://example.com/dir/css/"]
}
</script>

Any subresource under the scopes will be fetched from the bundle.
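
A minimal sketch of scope matching, assuming "under the scopes" means that the resolved subresource URL starts with one of the resolved scope URLs. The function name `in_scopes` is hypothetical.

```python
def in_scopes(url: str, scopes: list) -> bool:
    """Return True if the URL falls under any scope (assumed to be a prefix match)."""
    return any(url.startswith(scope) for scope in scopes)


scopes = [
    "https://example.com/dir/js/",
    "https://example.com/dir/img/",
    "https://example.com/dir/css/",
]
```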

Serving constraints

See Serving constraints for the response headers that MUST be included when serving Web Bundles over HTTP.

Extensions

There are several extensions to this explainer, aiming to support various use cases which this explainer doesn't support:

See issue #641 for the motivation of splitting the explainer into the core part, this explainer, and the extension parts.

Subsequent loading and Caching

Dynamic bundle serving with WebBundles is a detailed exploration of how to efficiently retrieve only updated resources on the second load. The key property is that the client's request for a bundle embeds something like a cache digest of the resources it already has, and the server sends down the subset of the bundle that the client doesn't already have.
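
The key property can be illustrated with a toy Python sketch in which an exact set of fingerprints stands in for the cache digest. A real design would likely use a compact probabilistic digest, and the names `fingerprint` and `select_updated` are hypothetical.

```python
import hashlib


def fingerprint(url: str, body: bytes) -> str:
    """Short fingerprint of one resource version (stand-in for a digest entry)."""
    return hashlib.sha256(url.encode() + b"\0" + body).hexdigest()[:16]


def select_updated(bundle: dict, client_digest: set) -> dict:
    """Server side: keep only the resources the client does not already have."""
    return {
        url: body
        for url, body in bundle.items()
        if fingerprint(url, body) not in client_digest
    }


# First load: the client receives everything and remembers what it has.
v1 = {"https://example.com/dir/a.js": b"a1", "https://example.com/dir/b.js": b"b1"}
client_digest = {fingerprint(url, body) for url, body in v1.items()}

# Second load: only b.js changed, so only b.js is sent.
v2 = {"https://example.com/dir/a.js": b"a1", "https://example.com/dir/b.js": b"b2"}
subset = select_updated(v2, client_digest)
```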

Compressed list of resources

As discussed in Dynamic bundle serving with WebBundles, simply including a list of resources in the HTML may cost as little as 5 bytes per URL on average after the HTML is compressed.

Alternate designs

<link>-based API

This explainer used a <link>-based API before adopting the <script>-based API:

<link
  rel="webbundle"
  href="https://example.com/dir/subresources.wbn"
  resources="https://example.com/dir/a.js https://example.com/dir/b.js https://example.com/dir/c.png"
/>

However, we abandoned the <link>-based API in favor of the <script>-based API. See issue #580 for the motivation. Note that some of the following alternate designs were proposed in the era of the <link>-based API. This explainer doesn't yet rewrite them for the <script>-based API.

Resource Bundles

A resource bundle is the same effort, with a particular scope. A resource bundle has a good FAQ which explains how this proposal and a resource bundle are related.

We have been collaborating closely to gather more feedback to draw a shared conclusion.

Summarizing the contents of the bundle

Several other mechanisms are available to give the bundler more flexibility or to compress the resource list.

Approximate Membership Query data structure

A page still executes correctly, albeit slower than optimal, if a resource that's in a bundle is fetched an extra time, or a resource that's not in a bundle waits for the bundle to arrive before its fetch starts. That raises the possibility of putting a Bloom filter or other approximate membership query data structure, like a cuckoo filter or quotient filter, in the scoping attribute.

In this case, it must not be an error if a resource matches the filter but turns out not to be in the bundle, since that's an expected property of this data structure.

<link
  rel="webbundle"
  href="https://example.com/dir/subresources.wbn"
  digest="cuckoo-CwAAAAOztbwAAAM2AAAAAFeafVZwIPgAAAAA"
/>
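
To show why a false match must not be an error, here is a minimal Bloom filter sketch in Python. It is a plain Bloom filter rather than the cuckoo filter used for the sample digest value above, and the sizes and hashing scheme are arbitrary choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 256, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single integer

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True means "possibly in the bundle"; False means "definitely not".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bundle_urls = [
    "https://example.com/dir/a.js",
    "https://example.com/dir/b.js",
    "https://example.com/dir/c.png",
]
bloom = BloomFilter()
for url in bundle_urls:
    bloom.add(url)
```

A URL for which `might_contain` returns True waits for the bundle's index. If the index then shows the resource is absent, the fetch would fall back to the network rather than fail, since such false positives are expected.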

No declarative scope

In some cases, the page might be able to control when it issues fetches for all of the resources contained in a bundle. In that case, it doesn't need to describe the bundle's scope in the <link> element but can instead listen for its load event:

<link
  rel="webbundle"
  href="https://example.com/dir/subresources.wbn"
  onload="startUsingTheSubresources()"
/>

Since the web bundles format includes an index before the content, we can optimize this by firing an event after the index is received (which expresses the bundle's exact scope) but before the content arrives:

<link
  rel="webbundle"
  href="https://example.com/dir/subresources.wbn"
  onscopereceived="startUsingTheSubresources()"
/>

Naming

We might be able to use a link type as general as "bundle", especially if it also uses the MIME type of the bundle resource to determine how to process it.

We'll need to disambiguate between a bundle meant for preloading subresources and a bundle meant as an alternative form of the current page. The second can use <link rel="alternate" type="application/web-bundle">.

Acknowledgements

Thanks to https://github.com/yoavweiss/cache-digests-cuckoo and https://github.com/google/brotli for the software used to generate sample attribute values.