Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(developer): escape bad markers 🙀 #10306

Merged
merged 6 commits into from
Jan 3, 2024

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Dec 29, 2023

  • escape markers in regex (\u0001 format) to prevent the n'th marker from having some other meaning!
  • with test case

#9121

@keymanapp-test-bot skip

- marker overrun! the 40th marker becomes a parenthesis, breaking regex

feat(core): ldml proper regex implementation 🙀🙀 #9121
@keymanapp-test-bot keymanapp-test-bot bot added this to the A17S28 milestone Dec 29, 2023
@srl295 srl295 changed the title feat(developer): escape bad markers feat(developer): escape bad markers 🙀 Dec 29, 2023
@srl295 srl295 force-pushed the feat/developer/9121-reject-bad-regex-epic-ldml branch from dbd06e9 to 1a845d1 Compare December 29, 2023 23:23
@srl295 srl295 requested a review from rc-swag as a code owner December 29, 2023 23:23
@github-actions github-actions bot added the core/ Keyman Core label Dec 29, 2023

<keys>
<!-- fourty markers. so that the next one would have its representation as '(' -->
<key id="1" output="\m{m01}\m{m02}\m{m03}\m{m04}\m{m05}\m{m06}\m{m07}\m{m08}\m{m09}\m{m10}\m{m11}\m{m12}\m{m13}\m{m14}\m{m15}\m{m16}\m{m17}\m{m18}\m{m19}\m{m20}\m{m21}\m{m22}\m{m23}\m{m24}\m{m25}\m{m26}\m{m27}\m{m28}\m{m29}\m{m30}\m{m31}\m{m32}\m{m33}\m{m34}\m{m35}\m{m36}\m{m37}\m{m38}\m{m39}\m{m40}"/>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙀

@srl295 srl295 marked this pull request as draft December 29, 2023 23:36
@mcdurdin mcdurdin modified the milestones: A17S28, A17S29 Dec 30, 2023
@srl295
Copy link
Member Author

srl295 commented Dec 30, 2023

interesting.

So,

  normalize_nfd_markers(patstr);

Doesn't work anymore, because it expects the sentinel pattern, not: U+FFFF U+0008 \uNNNN

probably scrambles it.

So I need a new kind of 'normalize_nfd_markers' (or a switch) that works w/ regex pattern space.

- escape markers to prevent the n'th marker from having some other meaning
- needed to remove/re-add the “≈≈[\u0001-\uD7FE]” format  (aka LDML_MARKER_ANY_INDEX)

#9121
@srl295 srl295 force-pushed the feat/developer/9121-reject-bad-regex-epic-ldml branch from 1a845d1 to 02ba64d Compare December 31, 2023 03:09
@srl295 srl295 marked this pull request as ready for review December 31, 2023 03:11
@srl295
Copy link
Member Author

srl295 commented Dec 31, 2023

the parse/format could be more optimized. or maybe use something from ICU.

common/web/types/src/ldml-keyboard/pattern-parser.ts Outdated Show resolved Hide resolved
core/src/ldml/ldml_transforms.cpp Outdated Show resolved Hide resolved
srl295 and others added 3 commits January 2, 2024 17:04
Co-authored-by: Marc Durdin <marc@durdin.net>
- move prepend_marker out of line
- replace bool for_regex with an enum
- cleanup another instance of \b -> \u0008

#9121
Base automatically changed from chore/core/ldml-test-robust-epic-ldml to master January 3, 2024 17:47
@srl295 srl295 merged commit 8b65010 into master Jan 3, 2024
25 checks passed
@srl295 srl295 deleted the feat/developer/9121-reject-bad-regex-epic-ldml branch January 3, 2024 17:47
@keyman-server
Copy link
Collaborator

Changes in this pull request will be available for download in Keyman version 17.0.238-alpha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants