Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: use undici/fetch data url parser #54748

Merged
merged 8 commits into from
Sep 7, 2024

Conversation

KhafraDev
Copy link
Member

@KhafraDev KhafraDev commented Sep 3, 2024

Using the fetch parser, rather than a regex, should fix most of these edge cases.

It would probably be better to export the parser directly so we can use it for the sync parsing too. Just wanted to gauge how correct or welcome this change would be.

fetch should also be used for blob: urls, whenever support is added for them, as it handles edge cases regarding those as well. If someone ever brings back http/https imports, fetch should probably be used there as well. :)

Fixes #53775
Fixes #42890
Closes #51324

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added errors Issues and PRs related to JavaScript errors originated in Node.js core. esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Sep 3, 2024
} catch (e) {
assert.strictEqual(e.code, 'ERR_INVALID_URL');
}
await assert.rejects(import(plainESMURL), { code: 'ERR_UNKNOWN_MODULE_FORMAT' })
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data:invalid,null is a valid data url afaik

@aduh95
Copy link
Contributor

aduh95 commented Sep 3, 2024

One of the problem of this approach is that Undici is not at all written in a way to be robust against prototype mutation, which seems to be a deal breaker when it comes to loading modules.

@KhafraDev
Copy link
Member Author

that's true, maybe I could rip the data url parser from undici and implement it here? Could be useful for the node:util or something as well.

@RedYetiDev RedYetiDev added fetch Issues and PRs related to the Fetch API and removed errors Issues and PRs related to JavaScript errors originated in Node.js core. labels Sep 3, 2024
@KhafraDev
Copy link
Member Author

KhafraDev commented Sep 4, 2024

I took undici's data url parser and added primordials, it's entirely possible I missed things or misused them. It's also likely possible to replace the mimetype parsing with the built-in one.

@mcollina
Copy link
Member

mcollina commented Sep 4, 2024

@KhafraDev can you also port the tests for the tests for the data url parser?

Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to ignore these, but they might make things simpler and faster

lib/internal/data_url.js Outdated Show resolved Hide resolved
lib/internal/data_url.js Show resolved Hide resolved
lib/internal/data_url.js Outdated Show resolved Hide resolved
lib/internal/data_url.js Outdated Show resolved Hide resolved
lib/internal/data_url.js Outdated Show resolved Hide resolved
lib/internal/data_url.js Outdated Show resolved Hide resolved
@KhafraDev KhafraDev marked this pull request as ready for review September 4, 2024 16:41
@KhafraDev
Copy link
Member Author

I took the WPTs for fetching data urls, if there's a more complete dataset, or if there's one specifically for imports, let me know.

lib/internal/data_url.js Outdated Show resolved Hide resolved
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 91.16022% with 32 lines in your changes missing coverage. Please review.

Project coverage is 87.61%. Comparing base (5949e16) to head (86d6382).
Report is 344 commits behind head on main.

Files with missing lines Patch % Lines
lib/internal/data_url.js 91.76% 26 Missing and 3 partials ⚠️
lib/internal/modules/esm/load.js 70.00% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #54748    +/-   ##
========================================
  Coverage   87.60%   87.61%            
========================================
  Files         650      651     +1     
  Lines      182829   183289   +460     
  Branches    35379    35434    +55     
========================================
+ Hits       160173   160591   +418     
- Misses      15928    15950    +22     
- Partials     6728     6748    +20     
Files with missing lines Coverage Δ
lib/internal/modules/esm/load.js 92.54% <70.00%> (-0.42%) ⬇️
lib/internal/data_url.js 91.76% <91.76%> (ø)

... and 34 files with indirect coverage changes

return encoder;
}

const ASCII_WHITESPACE_REPLACE_REGEX = /[\u0009\u000A\u000C\u000D\u0020]/g // eslint-disable-line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be in this PR, but having a benchmark for parsing these would be good. Would like to revisit to see if there's a more efficient approach than using the regex.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@KhafraDev

This comment was marked as off-topic.

@KhafraDev

This comment was marked as off-topic.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@KhafraDev

This comment was marked as off-topic.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@aduh95

This comment was marked as off-topic.

@KhafraDev

This comment was marked as off-topic.

@nodejs-github-bot
Copy link
Collaborator

nodejs-github-bot commented Sep 6, 2024

@aduh95 aduh95 merged commit 6c85d40 into nodejs:main Sep 7, 2024
38 of 57 checks passed
@aduh95
Copy link
Contributor

aduh95 commented Sep 7, 2024

Landed in 6c85d40

aduh95 pushed a commit that referenced this pull request Sep 12, 2024
Fixes: #53775
PR-URL: #54748
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@RafaelGSS RafaelGSS mentioned this pull request Sep 16, 2024
@KhafraDev KhafraDev deleted the data-url-fetch-parser branch September 19, 2024 03:18
targos pushed a commit that referenced this pull request Sep 30, 2024
Fixes: #53775
PR-URL: #54748
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@targos targos added the backport-requested-v20.x PRs awaiting manual backport to the v20.x-staging branch. label Sep 30, 2024
@targos
Copy link
Member

targos commented Sep 30, 2024

I tried to include this in v20.x, but it broke HTTP imports (which were removed from main/v22.x)

See https://github.com/nodejs/node/actions/runs/11102823645/job/30843465228?pr=55170

@KhafraDev
Copy link
Member Author

The test that failed is:

it('data: URL can always import other data:', async () => {
        const data = new URL('data:text/javascript,');
        data.searchParams.set('body',
                              'import \'data:text/javascript,import \'data:\''
        );
        // doesn't throw
        const empty = await import(data.href);
        assert.ok(empty);
      });

Where empty is 'data:text/javascript,?body=import+%27data%3Atext%2Fjavascript%2Cimport+%27data%3A%27'. The search gets parsed as part of the body (as data: urls cannot have search params). The test is faulty, there were some I changed in this PR that were similarly incorrect.

louwers pushed a commit to louwers/node that referenced this pull request Nov 2, 2024
Fixes: nodejs#53775
PR-URL: nodejs#54748
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. backport-requested-v20.x PRs awaiting manual backport to the v20.x-staging branch. esm Issues and PRs related to the ECMAScript Modules implementation. fetch Issues and PRs related to the Fetch API needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data URLs truncated esm module import with data scheme has bug with ternary operator in object
10 participants