Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(common): xml2js -> fast-xml-parser #12331

Closed
wants to merge 20 commits into from

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Aug 30, 2024

replace xml2js with fast-xml-parser (much better support)

Fixes: #12208

@keymanapp-test-bot skip

- just the 'easy stuff' in this commit
- update numbers- skip all numbers
- reverted to fontsize being a string
- updated the kvks schema to match current xml
- assert.doesNotThrow just truncates the exception with no benefit
- assert early that there are no messages when checking for a load failure
@mcdurdin
Copy link
Member

@SabineSIL FYI

@darcywong00 darcywong00 modified the milestones: A18S9, A18S10 Aug 31, 2024
@srl295
Copy link
Member Author

srl295 commented Sep 9, 2024

Making some good progress, it looks like the PUA issue has to do with NCR's, either not being par or being skipped if there's an error

- difference in fast-xml-parser and xml2js handling here
- part of the move to fast-xml-parser

Fixes: #12208
mcdurdin added a commit that referenced this pull request Sep 11, 2024
The visual keyboard compiler was never finished in 17.0. This rewrites
it to:

1. Use the kmxplus data rather than reading from xml directly
2. Fill in `visualkeyboard.header.kbdname`
3. Support modifiers
4. Handle encoded characters like `\u{1234}`
5. Handle string variables like `${one}`*

Additional unit tests have been added to verify the behavior of the
visual keyboard compiler in more detail.

TODO-LDML: string variables appear to have a secondary bug -- they seem
to be returning the string 'undefined'. I have disabled the related
tests and will examine this separately, and enable those tests once
fixed.

TODO-LDML: we should probably add a compiler warning + unit test for
`<layers formId="us"><layer id="base">`, because this pattern does not
make sense: when using non-touch forms, the `<layer>` element should use
`modifiers` attribute, and correspondingly, `modifiers` attribute should
_not_ be used when `formId` is `touch`.

Other fixes:

1. The LDML XML reader was relying on its input being a Node.js `Buffer`
   even though it was declared `Uint8Array`, as it implicitly used
   `Buffer.toString()` to do text conversion. (`Buffer` subclasses from
   `Uint8Array`). This breaks when using `Uint8Array` directly and means
   we had an implicit dependency on Node.js. See also #12331.
2. XML errors were not captured in the LDML XML reader. See also #12331.
3. The unused and unfinished touch-layout-compiler.ts and
   keymanweb-compiler.ts have been removed along with corresponding unit
   tests and fixtures. These are replaced by Core implementations; see
   #12291.

Fixes: #12395
mcdurdin added a commit that referenced this pull request Sep 12, 2024
The visual keyboard compiler was never finished in 17.0. This rewrites
it to:

1. Use the kmxplus data rather than reading from xml directly
2. Fill in `visualkeyboard.header.kbdname`
3. Support modifiers
4. Handle encoded characters like `\u{1234}`
5. Handle string variables like `${one}`*

Additional unit tests have been added to verify the behavior of the
visual keyboard compiler in more detail.

* String variable tests will be enabled in next commit (which is a
cherry-pick of #12404).

Other fixes:

1. The LDML XML reader was relying on its input being a Node.js `Buffer`
   even though it was declared `Uint8Array`, as it implicitly used
   `Buffer.toString()` to do text conversion. (`Buffer` subclasses from
   `Uint8Array`). This breaks when using `Uint8Array` directly and means
   we had an implicit dependency on Node.js. See also #12331.
2. XML errors were not captured in the LDML XML reader. See also #12331.
3. The unused and unfinished touch-layout-compiler.ts and
   keymanweb-compiler.ts have been removed along with corresponding unit
   tests and fixtures. These will be replaced by Core implementations;
   see #12291.

Fixes: #12395
Cherry-pick-of: #12402
@github-actions github-actions bot added the web/ label Sep 12, 2024
turns out we didn't have a test for XML NCR in keyboards. Now we do.

Fixes: #12208
@srl295 srl295 self-assigned this Sep 12, 2024
@srl295
Copy link
Member Author

srl295 commented Sep 12, 2024

I'd like to refactor the common parsing, etc as a followon PR after we make sure this is actually working due to the dependency change.

@srl295 srl295 marked this pull request as ready for review September 12, 2024 20:30
Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started the review but have a significant change request which will, I hope, reduce the impact and scope. As it stands, I think the changes would have a number of negative cascading impacts on consumers of the data, and so I would prefer to have the data coming out of the parser match the current shape, so we don't have to change anything downstream from the parser.

@@ -2,7 +2,9 @@ import { MATCH_HEX_ESCAPE, CONTAINS_QUAD_ESCAPE, MATCH_QUAD_ESCAPE } from './con
export { MATCH_HEX_ESCAPE, CONTAINS_QUAD_ESCAPE, MATCH_QUAD_ESCAPE };

/**
* xml2js will not place single-entry objects into arrays. Easiest way to fix
* fast-xml-parser will not place single-entry objects into arrays.
* We could set a list of items expected to be arrays,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this declarative approach viable? It seems like it would be cleaner

// may or may not have extended details
if (o.code && o.line && o.msg) {
return m(this.ERROR_InvalidXML,
`Error reading XML at ${def(o.line)}:${def(o.col)}: ${def(o.code)} ${def(o.msg)}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`Error reading XML at ${def(o.line)}:${def(o.col)}: ${def(o.code)} ${def(o.msg)}`
`Error reading XML at ${def(o.line)}${o.col ? ':' + def(o.col) : ''}: ${def(o.code)} ${def(o.msg)}`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to get a little too 'formatted' for translators so we need to be careful

Comment on lines +42 to +44
"dependencies": {
"fast-xml-parser": "^4.4.1"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be in web-utils deps. web-utils is not part of Keyman Developer. It is a part of the KeymanWeb source base (see #12384 for its move to /web).

fast-xml-parser needs to be in package.json for:

  • developer/src/common/web/utils
  • developer/src/kmc-ldml
  • developer/src/kmc-package

data = r as KPJFile;
});
const parser = new XMLParser({
htmlEntities: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not expecting html entities in a .kpj file -- it is pure XML

Suggested change
htmlEntities: true,
htmlEntities: false,

const parser = new XMLParser({
htmlEntities: true,
ignoreAttributes: false, // We'd like attributes, please
attributeNamePrefix: '', // to avoid '@_' prefixes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to mix attributes and children, can we use a '$' prefix for attributes? (same goes for .kvks)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See later comment for bigger request that makes this suggestion null

ignoreAttributes: false, // We'd like attributes, please
attributeNamePrefix: '', // to avoid '@_' prefixes
numberParseOptions: {
skipLike: /(?:)/, // parse numbers as strings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it this somewhat awkward pattern leaves numbers as strings? Shame FXP doesn't have a numberParse: false option!

},
});
const raw = parser.parse(file.toString());
delete raw['?xml']; // XML prologue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does option ignoreDeclaration work?

Comment on lines -61 to +62
_: string;
$: { URL: string };
'#text': string;
URL: string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API change which I am not excited about -- I would prefer to keep the same interfaces where possible. Yes, the original interface does inherit the shape of xml2js, but it's what we have now. We don't know if there are other consumers outside the code base. Even internally, you've missed, for example:

setupInf += `${str.$?.Name}=${str.$?.Value}\n`;

(kpj.schema.json would also need updating if we continue with this change)

Does this work to match existing shape?

   attributesGroupName: '$',
   attributesNamePrefix: '',
   ignoreAttributes: false,
   textNodeName: '_',

If we do go this way, we don't need to update any schema.json files, nor any consumers of the XML-as-js, AFAICT.

@srl295
Copy link
Member Author

srl295 commented Sep 13, 2024

I've started the review but have a significant change request which will, I hope, reduce the impact and scope. As it stands, I think the changes would have a number of negative cascading impacts on consumers of the data, and so I would prefer to have the data coming out of the parser match the current shape, so we don't have to change anything downstream from the parser.

Some seem to be positive changes, but make some sense, basically xml2js: true mode.

Yeah I think I can do it. Not this sprint or at least not this day though

@mcdurdin
Copy link
Member

Some seem to be positive changes

Oh for sure, but negative in that they require lots of changes to consumers of the API, which is costly.

@darcywong00 darcywong00 modified the milestones: A18S10, A18S11 Sep 14, 2024
/**
* relationship between this package and the related package, "related" is default if not specified
*/
Relationship?: "deprecates";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be

Suggested change
Relationship?: "deprecates";
Relationship?: "deprecates"|"related";

if the default is "related"? (I didn't check if there are other possible values)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only "deprecates" is allowed:

<!-- "deprecates" is only valid value, if not present then means "related" -->
<xs:attribute name="Relationship" type="km-package-relationship" />

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. The comment in kps.xsd makes that much clearer.

Comment on lines +104 to +107
/**
* relationship between this package and the related package, "related" is default if not specified
*/
Relationship?: "deprecates";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
* relationship between this package and the related package, "related" is default if not specified
*/
Relationship?: "deprecates";
/**
* relationship between this package and the related package, if not specified it means "related"
*/
Relationship?: "deprecates";

/**
* relationship between this package and the related package, "related" is default if not specified
*/
Relationship?: "deprecates";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. The comment in kps.xsd makes that much clearer.

@srl295 srl295 closed this Sep 27, 2024
@mcdurdin
Copy link
Member

did you mean to close this?

@srl295
Copy link
Member Author

srl295 commented Sep 28, 2024

did you mean to close this?

Yes, in favor of one based on #12482 (I guess I could reopen this once sorted)

@mcdurdin
Copy link
Member

did you mean to close this?

Yes, in favor of one based on #12482 (I guess I could reopen this once sorted)

OK, np. Please make sure you document when you do actions like this because I was quite confused!

Shall I'll go ahead and delete the branch?

@srl295
Copy link
Member Author

srl295 commented Sep 28, 2024

Shall I'll go ahead and delete the branch?

Not yet because I am using it to work from!

I will either reopen it or open a new one

@srl295 srl295 deleted the chore/common/12208/fastxmlparser branch October 25, 2024 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

refactor(common): Consider moving from xml2js to fast-xml-parser
4 participants