[66_13] Reasonable herk->utf8 and utf8->herk
## Why

Try to solve the Cork encoding defects by introducing the Herk encoding with minimal changes.

Herk encoding is adopted in TMU serialization and deserialization. It is much better than `utf8->cork` and `cork->utf8`, because with those conversions two Unicode code points may map to the same Cork code.

This does bring a breaking change to the TMU format, which is why we need to bump the version. But it is not a big change.

## What

1. UTF8 from 00 to 1F should be encoded as <#0> to <#1F> in Herk encoding
2. UTF8 from A0 to FF should be encoded as <#A0> to <#FF> in Herk encoding if no Cork encoding is found
   + Will fix copy and paste of © https://symbl.cc/en/00A9-copyright-emoji/ when we use Herk encoding in copy and paste
4. Herk DF should be mapped to U+1E9E
5. Herk 17 should be mapped to U+200B
6. Herk 18 should be mapped to U+2080
7. Herk 1A should be mapped to U+0237
8. Herk 7F should be mapped to U+00AD
9. Bump to TMU 1.0.5

## How to test

### Unit tests on branch-1.2

Before

```
(utf8->herk (string #\null)) =>
; *** failed ***
; expected result: <#0>

(herk->utf8 (string #\x18)) => ▒
; *** failed ***
; expected result: ₀

(herk->utf8 (string #\x1a)) => ▒
; *** failed ***
; expected result: ȷ

(utf8->herk (string #\x10)) =>
; *** failed ***
; expected result: <#10>

(utf8->herk (utf8->string #u(194 160))) =>
; *** failed ***
; expected result: <#A0>

(herk->utf8 (string #\xdf)) => �
; *** failed ***
; expected result: ẞ

(utf8->herk (string #\xff)) => �
; *** failed ***
; expected result: <#FF>
```

Now TeXmacs/tests/66_13.scm should work fine!

### Test doc

Several test cases are listed in TeXmacs/tests/tmu/unicode_256.tmu

The bug lies in the TMU reader.
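
The encoding rules listed in the commit message can be illustrated with a small Python sketch. This is not the TeXmacs implementation: the names `utf8_to_herk` and `herk_to_utf8` are hypothetical, the table is only a subset of the mapping added by this commit, and a Herk string is modelled here as a Python `str` whose characters all have code points below 256. The fallback that escapes any unmapped code point as `<#HEX>` is an assumption generalized from rules 1 and 2, and the sketch does not handle a literal `<` in the input, which a real implementation would need to escape.

```python
import re

# Subset of the Herk -> Unicode table from this commit (Herk byte -> code point).
HERK_TO_UNICODE = {
    0x00: 0x0060,
    0x17: 0x200B,
    0x18: 0x2080,
    0x1A: 0x0237,
    0x60: 0x2018,
    0x7F: 0x00AD,
    0xDF: 0x1E9E,
}
# Printable ASCII maps to itself in the table, except 0x60.
for b in range(0x20, 0x7F):
    if b != 0x60:
        HERK_TO_UNICODE[b] = b

# Inverse lookup; valid because the entries above are one-to-one.
UNICODE_TO_HERK = {u: h for h, u in HERK_TO_UNICODE.items()}

# Matches an escape such as <#0>, <#A0> or <#1E9E>.
ESCAPE = re.compile(r"<#([0-9A-Fa-f]{1,6})>")


def utf8_to_herk(text: str) -> str:
    """Encode decoded text as a Herk string (sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in UNICODE_TO_HERK:
            out.append(chr(UNICODE_TO_HERK[cp]))
        else:
            # Assumed fallback: escape unmapped code points as <#HEX>.
            out.append(f"<#{cp:X}>")
    return "".join(out)


def herk_to_utf8(herk: str) -> str:
    """Decode a Herk string back to text (sketch)."""
    out = []
    i = 0
    while i < len(herk):
        m = ESCAPE.match(herk, i)
        if m:
            out.append(chr(int(m.group(1), 16)))
            i = m.end()
        else:
            b = ord(herk[i])
            # Bytes missing from this subset pass through unchanged.
            out.append(chr(HERK_TO_UNICODE.get(b, b)))
            i += 1
    return "".join(out)
```

Under these assumptions, `utf8_to_herk("\u00a9")` yields `<#A9>` and `herk_to_utf8` restores the copyright sign, which is the copy-and-paste case mentioned under item 2. Results for characters outside the subset will differ from the full table in the commit.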
Showing 6 changed files with 1,853 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,266 @@ | ||
;; Two-way conversions between Cork and Unicode | ||
| ||
;; (C) 2003 Felix Breuer, David Allouche | ||
;; 2024 Darcy Shen | ||
;; | ||
;; This software falls under the GNU general public license version 3 or later. | ||
;; It comes WITHOUT ANY WARRANTY WHATSOEVER. For details, see the file LICENSE | ||
;; in the root directory or <http://www.gnu.org/licenses/gpl-3.0.html>. | ||
| ||
| ||
("#00" "#60") | ||
("#01" "#B4") | ||
("#02" "#02C6") ; modifier letter circumflex accent | ||
("#03" "#02DC") ; small tilde | ||
("#04" "#A8") | ||
("#05" "#02DD") | ||
("#06" "#02DA") | ||
("#07" "#02C7") | ||
("#08" "#02D8") | ||
("#09" "#AF") | ||
("#0A" "#02D9") | ||
("#0B" "#B8") | ||
("#0C" "#02DB") | ||
("#0D" "#201A") | ||
("#0E" "#2039") | ||
("#0F" "#203A") | ||
("#10" "#201C") | ||
("#11" "#201D") | ||
("#12" "#201E") | ||
("#13" "#AB") | ||
("#14" "#BB") | ||
("#15" "#2013") | ||
("#16" "#2014") | ||
("#17" "#200B") | ||
("#18" "#2080") | ||
("#19" "#0131") | ||
("#1A" "#0237") | ||
("#1B" "#FB00") | ||
("#1C" "#FB01") | ||
("#1D" "#FB02") | ||
("#1E" "#FB03") | ||
("#1F" "#FB04") | ||
("#20" "#20") | ||
("#21" "#21") | ||
("#22" "#22") | ||
("#23" "#23") | ||
("#24" "#24") | ||
("#25" "#25") ; percent sign | ||
("#26" "#26") | ||
("#27" "#27") | ||
("#28" "#28") | ||
("#29" "#29") | ||
("#2A" "#2A") | ||
("#2B" "#2B") | ||
("#2C" "#2C") | ||
("#2D" "#2D") | ||
("#2E" "#2E") | ||
("#2F" "#2F") | ||
("#30" "#30") | ||
("#31" "#31") | ||
("#32" "#32") | ||
("#33" "#33") | ||
("#34" "#34") | ||
("#35" "#35") | ||
("#36" "#36") | ||
("#37" "#37") | ||
("#38" "#38") | ||
("#39" "#39") | ||
("#3A" "#3A") | ||
("#3B" "#3B") | ||
("#3C" "#3C") ; less than | ||
("#3D" "#3D") | ||
("#3E" "#3E") ; greater than | ||
("#3F" "#3F") | ||
("#40" "#40") | ||
("#41" "#41") | ||
("#42" "#42") | ||
("#43" "#43") | ||
("#44" "#44") | ||
("#45" "#45") | ||
("#46" "#46") | ||
("#47" "#47") | ||
("#48" "#48") | ||
("#49" "#49") | ||
("#4A" "#4A") | ||
("#4B" "#4B") | ||
("#4C" "#4C") | ||
("#4D" "#4D") | ||
("#4E" "#4E") | ||
("#4F" "#4F") | ||
("#50" "#50") | ||
("#51" "#51") | ||
("#52" "#52") | ||
("#53" "#53") | ||
("#54" "#54") | ||
("#55" "#55") | ||
("#56" "#56") | ||
("#57" "#57") | ||
("#58" "#58") | ||
("#59" "#59") | ||
("#5A" "#5A") | ||
("#5B" "#5B") | ||
("#5C" "#5C") | ||
("#5D" "#5D") | ||
("#5E" "#5E") | ||
("#5F" "#5F") | ||
("#60" "#2018") ; typographic backquote | ||
("#61" "#61") | ||
("#62" "#62") | ||
("#63" "#63") | ||
("#64" "#64") | ||
("#65" "#65") | ||
("#66" "#66") | ||
("#67" "#67") | ||
("#68" "#68") | ||
("#69" "#69") | ||
("#6A" "#6A") | ||
("#6B" "#6B") | ||
("#6C" "#6C") | ||
("#6D" "#6D") | ||
("#6E" "#6E") | ||
("#6F" "#6F") | ||
("#70" "#70") | ||
("#71" "#71") | ||
("#72" "#72") | ||
("#73" "#73") | ||
("#74" "#74") | ||
("#75" "#75") | ||
("#76" "#76") | ||
("#77" "#77") | ||
("#78" "#78") | ||
("#79" "#79") | ||
("#7A" "#7A") | ||
("#7B" "#7B") | ||
("#7C" "#7C") | ||
("#7D" "#7D") | ||
("#7E" "#7E") | ||
("#7F" "#00AD") | ||
("#80" "#0102") | ||
("#81" "#0104") | ||
("#82" "#0106") | ||
("#83" "#010C") | ||
("#84" "#010E") | ||
("#85" "#011A") | ||
("#86" "#0118") | ||
("#87" "#011E") | ||
("#88" "#0139") | ||
("#89" "#013D") | ||
("#8A" "#0141") | ||
("#8B" "#0143") | ||
("#8C" "#0147") | ||
("#8D" "#014A") | ||
("#8E" "#0150") | ||
("#8F" "#0154") | ||
("#90" "#0158") | ||
("#91" "#015A") | ||
("#92" "#0160") | ||
("#93" "#015E") | ||
("#94" "#0164") | ||
("#95" "#0162") | ||
("#96" "#0170") | ||
("#97" "#016E") | ||
("#98" "#0178") | ||
("#99" "#0179") | ||
("#9A" "#017D") | ||
("#9B" "#017B") | ||
("#9C" "#0132") | ||
("#9D" "#0130") | ||
("#9E" "#0111") | ||
("#9F" "#A7") | ||
("#A0" "#0103") | ||
("#A1" "#0105") | ||
("#A2" "#0107") | ||
("#A3" "#010D") | ||
("#A4" "#010F") | ||
("#A5" "#011B") | ||
("#A6" "#0119") | ||
("#A7" "#011F") | ||
("#A8" "#013A") | ||
("#A9" "#013E") | ||
("#AA" "#0142") | ||
("#AB" "#0144") | ||
("#AC" "#0148") | ||
("#AD" "#014B") | ||
("#AE" "#0151") | ||
("#AF" "#0155") | ||
("#B0" "#0159") | ||
("#B1" "#015B") | ||
("#B2" "#0161") | ||
("#B3" "#015F") | ||
("#B4" "#0165") | ||
("#B5" "#0163") | ||
("#B6" "#0171") | ||
("#B7" "#016F") | ||
("#B8" "#FF") | ||
("#B9" "#017A") | ||
("#BA" "#017E") | ||
("#BB" "#017C") | ||
("#BC" "#0133") | ||
("#BD" "#A1") | ||
("#BE" "#BF") | ||
("#BF" "#A3") | ||
("#C0" "#C0") | ||
("#C1" "#C1") | ||
("#C2" "#C2") | ||
("#C3" "#C3") | ||
("#C4" "#C4") | ||
("#C5" "#C5") | ||
("#C6" "#C6") | ||
("#C7" "#C7") | ||
("#C8" "#C8") | ||
("#C9" "#C9") | ||
("#CA" "#CA") | ||
("#CB" "#CB") | ||
("#CC" "#CC") | ||
("#CD" "#CD") | ||
("#CE" "#CE") | ||
("#CF" "#CF") | ||
("#D0" "#D0") | ||
("#D1" "#D1") | ||
("#D2" "#D2") | ||
("#D3" "#D3") | ||
("#D4" "#D4") | ||
("#D5" "#D5") | ||
("#D6" "#D6") | ||
("#D7" "#0152") | ||
("#D8" "#D8") | ||
("#D9" "#D9") | ||
("#DA" "#DA") | ||
("#DB" "#DB") | ||
("#DC" "#DC") | ||
("#DD" "#DD") | ||
("#DE" "#DE") | ||
("#DF" "#1E9E") | ||
("#E0" "#E0") | ||
("#E1" "#E1") | ||
("#E2" "#E2") | ||
("#E3" "#E3") | ||
("#E4" "#E4") | ||
("#E5" "#E5") | ||
("#E6" "#E6") | ||
("#E7" "#E7") | ||
("#E8" "#E8") | ||
("#E9" "#E9") | ||
("#EA" "#EA") | ||
("#EB" "#EB") | ||
("#EC" "#EC") | ||
("#ED" "#ED") | ||
("#EE" "#EE") | ||
("#EF" "#EF") | ||
("#F0" "#F0") | ||
("#F1" "#F1") | ||
("#F2" "#F2") | ||
("#F3" "#F3") | ||
("#F4" "#F4") | ||
("#F5" "#F5") | ||
("#F6" "#F6") | ||
("#F7" "#0153") | ||
("#F8" "#F8") | ||
("#F9" "#F9") | ||
("#FA" "#FA") | ||
("#FB" "#FB") | ||
("#FC" "#FC") | ||
("#FD" "#FD") | ||
("#FE" "#FE") | ||
("#FF" "#DF") | ||
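
Since the motivation for Herk is that no two Unicode code points should share one code, a table in this format can be checked for collisions mechanically. The following Python sketch is one way to do that. The helper names `parse_rows` and `find_collisions` are hypothetical, and `SAMPLE` contains only a few rows copied from the table above rather than the whole file.

```python
import re
from collections import defaultdict

# Matches one table row such as ("#DF" "#1E9E"); trailing comments are ignored.
ROW = re.compile(r'\("#([0-9A-Fa-f]+)"\s+"#([0-9A-Fa-f]+)"\)')


def parse_rows(text):
    """Return a list of (herk_code, unicode_code_point) pairs."""
    return [(int(h, 16), int(u, 16)) for h, u in ROW.findall(text)]


def find_collisions(pairs):
    """Return {unicode_code_point: [herk codes]} for targets used more than once."""
    by_unicode = defaultdict(list)
    for herk, uni in pairs:
        by_unicode[uni].append(herk)
    return {u: hs for u, hs in by_unicode.items() if len(hs) > 1}


# A few rows copied from the table above (not the complete file).
SAMPLE = '''
("#17" "#200B")
("#18" "#2080")
("#1A" "#0237")
("#7F" "#00AD")
("#DF" "#1E9E")
("#FF" "#DF")
'''

pairs = parse_rows(SAMPLE)
collisions = find_collisions(pairs)
```

For the sample rows `collisions` is empty. Appending a deliberately conflicting pair such as `(0x00, 0xDF)` makes `find_collisions` report U+00DF with both Herk codes, which is the kind of ambiguity the commit message attributes to the Cork conversions.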