Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entity declarations with XML elements or other entities #260

Merged
merged 6 commits into from
Sep 26, 2023
Merged

Entity declarations with XML elements or other entities #260

merged 6 commits into from
Sep 26, 2023

Conversation

HeikoTheissen
Copy link
Contributor

Issue 1

Entity declarations that involve XML elements, like in

<!DOCTYPE A [
  <!ENTITY elem '<B/>'>
]>
<A>&elem;</A>

wrongly contribute a text node, which leads to

<A>&lt;B/&gt;</A>

whereas they should contribute XML elements and lead to

<A><B/></A>

Without the patch to lib/sax.js, the new test case test/entity-elem.js fails with

not ok 12 - should be equivalent
  ---
  diff: |
    --- expected
    +++ actual
    @@ -1,1 +1,1 @@
    -"2.1"
    +"2.&attr;<B ATTR=\"&attr;.3\"/>"

With the patch, it passes.

Issue 2

Entity declarations that involve other entities, like in

<!DOCTYPE A [
  <!ENTITY attr '1'>
  <!ENTITY text '2.&attr;'>
]>
<A>&text;</A>

are completely resolved, leading to

<A>2.&amp;attr;</A>

instead of

<A>2.1</A>

Without the patch to lib/sax.js, the new test case test/entity-recursive.js fails with

not ok 9 - should be equivalent
  ---
  diff: |
    --- expected
    +++ actual
    @@ -1,1 +1,1 @@
    -"2.1"
    +"2.&attr;"

With the patch, it passes.

@isaacs
Copy link
Owner

isaacs commented Sep 3, 2023

This is a great thing to support, and I'm frankly amazed at how small the code change is.

I'm wondering if it's a patch or breaking change, if anyone is depending on the incorrect behavior, since this library is so old and has been mostly untouched for so long. Also, I have a few ideas of some cases that might surface some edge cases, but I wouldn't be surprised if just works as it is. I'll poke at it tonight, thanks!

@HeikoTheissen
Copy link
Contributor Author

HeikoTheissen commented Sep 3, 2023

To avoid a breaking change, the new behavior could be made conditional on a setting during sax instantiation:

  • sax.parser({unparsedEntities: true}) leads to the new behavior
  • sax.parser() retains the current behavior

@isaacs isaacs merged commit 942ed6d into isaacs:main Sep 26, 2023
0 of 16 checks passed
@HeikoTheissen
Copy link
Contributor Author

Thanks for merging this and updating https://www.npmjs.com/package/sax !

Do you also plan to mention the new option unparsedEntities on the README.md?

@SethFalco
Copy link
Contributor

SethFalco commented Jan 22, 2024

Hey hey! Sorry to pester, just noting that this implementation actually has a bug that renders the new option unusable in many cases. In SVGO, it's preventing us from migrating from our (unmaintained) fork of sax back to upstream.

I've already opened a pull request that explains and resolves the issue, includes tests, and I also checked by parsing the conformance tests for XML to ensure no exceptions occur. Before, with the option enabled, the conformance test suite threw an exception.

Would really appreciate it if others could take a peek when they have time.

I've also fixed a few other bugs impacting us in separate pull requests btw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants