Unum Development Log
2018 July 3
Began development of Unum 3.0, incorporating the Unicode 11.0.0
database. The Unicode 11.0.0 announcement is:
http://blog.unicode.org/2018/06/announcing-unicode-standard-version-110.html
Downloaded the master code point database from:
https://www.unicode.org/Public/UCD/latest/ucdxml/ucd.all.flat.zip
Extracted the code point table with:
perl cpnames.pl >cpnames.txt
Copied the code point table to the parent directory.
Extracted the blocks table with:
perl blocks.pl >blocks.txt
The blocks table is included in the template program,
not a separate database. The new blocks in 11.0.0 are:
[0x1C90, 0x1CBF = 'Georgian Extended'],
[0x10D00, 0x10D3F = 'Hanifi Rohingya'],
[0x10F00, 0x10F2F = 'Old Sogdian'],
[0x10F30, 0x10F6F = 'Sogdian'],
[0x11800, 0x1184F = 'Dogra'],
[0x11D60, 0x11DAF = 'Gunjala Gondi'],
[0x11EE0, 0x11EFF = 'Makasar'],
[0x16E40, 0x16E9F = 'Medefaidrin'],
[0x1D2E0, 0x1D2FF = 'Mayan Numerals'],
[0x1EC70, 0x1ECBF = 'Indic Siyaq Numbers'],
[0x1FA00, 0x1FA6F = 'Chess Symbols'],
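As a rough illustration of how a table of [first, last, name] ranges like the one above can map a code point to its block, here is a minimal shell sketch. The function name block_of and the pipe-separated rows are assumptions made for the example. They are not the format of blocks.txt or the code in unum_t.pl, and only three of the new rows are included.

```shell
#!/bin/bash
# block_of CODEPOINT
# Prints the name of the block containing CODEPOINT (hexadecimal,
# e.g. 0x1FA00) and returns 0, or returns 1 if no row matches.
# Hypothetical helper: the table below is a three-row excerpt.
block_of() {
    local cp first last name
    if [[ ! $1 =~ ^0[xX][0-9A-Fa-f]+$ ]]; then
        echo "expected a hexadecimal code point such as 0x1FA00" >&2
        return 2
    fi
    cp=$(( $1 ))
    # Linear scan: the first row whose range contains cp wins.
    while IFS='|' read -r first last name; do
        if (( cp >= first && cp <= last )); then
            echo "$name"
            return 0
        fi
    done <<'EOF'
0x1C90|0x1CBF|Georgian Extended
0x10D00|0x10D3F|Hanifi Rohingya
0x1FA00|0x1FA6F|Chess Symbols
EOF
    return 1
}

block_of 0x1FA00    # prints "Chess Symbols"
```

A linear scan is enough for a sketch. A sorted table of a few hundred rows could also be searched by bisection.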
Changed the version number in the template program, unum_t.pl,
to "Version 3.0, July 2018".
Integrated the blocks.txt table into unum_t.pl, replacing the
previous definition of @UNICODE_BLOCKS.
Built the release versions with:
make all
Ran a:
make check
and the only differences reported were the new additions
to the block table. Updated the expected text to include
them with:
cp -p check_rcv.tmp check_exp.txt
and now make check runs with no errors.
The HTML5 named character references table:
https://www.w3.org/TR/html5/entities.json
has not changed, although the pointy-heads have shuffled
the order of some of the items in the table. If you sort
the table from last September and the one from today and
compare them, they are identical. Since we don't care
about the order of the entries, there is no reason to
remake our named character entity tables.
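The sort-and-compare check described above can be sketched as a small shell function. The name same_when_sorted and the file names in the usage comment are hypothetical. The sketch assumes each copy of the table has one entry per line, and it strips trailing commas because the entry that happens to come last carries none.

```shell
#!/bin/bash
# same_when_sorted OLD NEW
# Returns 0 when OLD and NEW contain the same lines in any order,
# non-zero otherwise.  Hypothetical helper, not part of the Unum build.
same_when_sorted() {
    local a b status
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    # Drop trailing commas, then sort bytewise so the result does not
    # depend on the current locale.
    sed 's/,[[:space:]]*$//' "$1" | LC_ALL=C sort >"$a"
    sed 's/,[[:space:]]*$//' "$2" | LC_ALL=C sort >"$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Usage (file names are placeholders):
#   same_when_sorted entities_old.json entities_new.json && echo unchanged
```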
2020 January 4
Began development using the Unicode 12.1.0 database
downloaded from:
https://www.unicode.org/Public/12.1.0/ucdxml/ucd.all.flat.zip
Created a Makefile to rebuild from a new ucd.all.flat.xml
database.
New blocks in Unicode 12.1.0 are:
[0x10FE0, 0x10FFF = 'Elymaic'],
[0x119A0, 0x119FF = 'Nandinagari'],
[0x11FC0, 0x11FFF = 'Tamil Supplement'],
[0x13430, 0x1343F = 'Egyptian Hieroglyph Format Controls'],
[0x1B130, 0x1B16F = 'Small Kana Extension'],
[0x1E100, 0x1E14F = 'Nyiakeng Puachue Hmong'],
[0x1E2C0, 0x1E2FF = 'Wancho'],
[0x1ED00, 0x1ED4F = 'Ottoman Siyaq Numbers'],
[0x1FA70, 0x1FAFF = 'Symbols and Pictographs Extended-A'],
The HTML5 named character references table:
https://www.w3.org/TR/html5/entities.json
has not changed, although the pointy-heads have shuffled
the order of some of the items in the table. If you sort
the table from the last time and the one from today and
compare them, they are identical. Since we don't care
about the order of the entries, there is no reason to
remake our named character entity tables.
Changed the version number in the template program, unum_t.pl,
to "Version 3.1 (12.1.0), January 2020". Henceforth, the
version number will indicate the Unicode standard which the
database represents.
Integrated the 12.1.0 blocks.txt table into unum_t.pl, replacing
the previous definition of @UNICODE_BLOCKS.
2020 January 5
Added the ability to specify a character as a UTF-8 byte
sequence, coded as a number. For example:
unum utf8=0xE298A2
specifies the U+2622 RADIOACTIVE SIGN as a three-byte UTF-8
sequence. The UTF-8 parser, in function decode_UTF8(), is very
picky and diagnoses encoding errors such as extraneous bytes
after the UTF-8 sequence, continuation as first byte, malformed
continuation bytes, bytes forbidden in UTF-8, and "overlong"
encoding of values in more bytes than required.
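To make the kinds of errors listed above concrete, here is a minimal shell sketch of a strict decoder for a UTF-8 sequence given as a hexadecimal number. It is not the code of decode_UTF8() in Unum. The name utf8_to_codepoint, the diagnostic wording, and the final scalar-value check are assumptions added for the example.

```shell
#!/bin/bash
# utf8_to_codepoint NUMBER
# Treats NUMBER (e.g. 0xE298A2) as the bytes of one UTF-8 sequence,
# most significant byte first, and prints the code point as U+XXXX.
# Prints a diagnostic on stderr and returns 1 on malformed input.
utf8_to_codepoint() {
    local hex b i len cp min
    local -a bytes=()
    if [[ ! $1 =~ ^0[xX]([0-9A-Fa-f]+)$ ]]; then
        echo "expected a hexadecimal number such as 0xE298A2" >&2
        return 1
    fi
    hex=${BASH_REMATCH[1]}
    if (( ${#hex} % 2 )); then hex="0$hex"; fi    # pad to whole bytes
    for (( i = 0; i < ${#hex}; i += 2 )); do
        bytes+=( $(( 16#${hex:i:2} )) )
    done

    # The first byte fixes the sequence length and the smallest code
    # point that legitimately needs that many bytes.
    b=${bytes[0]}
    if (( b < 0x80 )); then
        len=1; cp=$b; min=0
    elif (( b < 0xC0 )); then
        echo "continuation byte used as first byte" >&2; return 1
    elif (( b < 0xC2 || b > 0xF4 )); then
        echo "byte forbidden in UTF-8" >&2; return 1
    elif (( b < 0xE0 )); then
        len=2; cp=$(( b & 0x1F )); min=$(( 0x80 ))
    elif (( b < 0xF0 )); then
        len=3; cp=$(( b & 0x0F )); min=$(( 0x800 ))
    else
        len=4; cp=$(( b & 0x07 )); min=$(( 0x10000 ))
    fi
    if (( ${#bytes[@]} < len )); then
        echo "sequence is shorter than its first byte requires" >&2; return 1
    fi
    if (( ${#bytes[@]} > len )); then
        echo "extraneous bytes after the sequence" >&2; return 1
    fi
    for (( i = 1; i < len; i++ )); do
        b=${bytes[i]}
        if (( (b & 0xC0) != 0x80 )); then
            echo "malformed continuation byte" >&2; return 1
        fi
        cp=$(( (cp << 6) | (b & 0x3F) ))
    done
    if (( cp < min )); then
        echo "overlong encoding" >&2; return 1
    fi
    # Extra check, not listed in the log: surrogates and values past U+10FFFF.
    if (( cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) )); then
        echo "not a Unicode scalar value" >&2; return 1
    fi
    printf 'U+%04X\n' "$cp"
}

utf8_to_codepoint 0xE298A2    # prints U+2622
```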
Added an option (our first!), --utf8, which adds display of the
UTF-8 encoding of characters as a hexadecimal byte sequence.
The option only affects arguments specified after it on the
command line.
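For reference, the byte sequence such an option displays can be derived from the code point with a few shifts and masks. The sketch below is an assumption-laden illustration: the name utf8_hex_bytes and the space-separated output format are mine, not necessarily what Unum prints, and surrogate code points are not rejected.

```shell
#!/bin/bash
# utf8_hex_bytes CODEPOINT
# Prints the UTF-8 encoding of CODEPOINT (hexadecimal, e.g. 0x2622)
# as space-separated hexadecimal bytes.  Hypothetical helper.
utf8_hex_bytes() {
    local cp
    if [[ ! $1 =~ ^0[xX][0-9A-Fa-f]{1,6}$ ]]; then
        echo "expected a hexadecimal code point such as 0x2622" >&2
        return 1
    fi
    cp=$(( $1 ))
    if (( cp < 0x80 )); then
        printf '%02X\n' "$cp"
    elif (( cp < 0x800 )); then
        printf '%02X %02X\n' \
            $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif (( cp < 0x10000 )); then
        printf '%02X %02X %02X\n' \
            $(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    elif (( cp <= 0x10FFFF )); then
        printf '%02X %02X %02X %02X\n' \
            $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    else
        echo "code point is beyond U+10FFFF" >&2
        return 1
    fi
}

utf8_hex_bytes 0x2622    # prints E2 98 A2
```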
2020 January 7
Added a test to verify that the first byte of a UTF-8 code is
not a continuation byte, which may appear only in the subsequent
bytes of a multi-byte code.
2020 May 15
Began development of version 3.2, which will incorporate the
Unicode 13.0 standard:
https://www.unicode.org/Public/13.0.0/ucdxml/ucd.all.flat.zip
Downloaded into a WORK13.0 directory and rebuilt blocks.txt and
cpnames.txt with "make all". Copied cpnames.txt into the parent
build directory:
cp -p cpnames.txt ..
The HTML5 named character reference table has not changed since
the version we incorporate, although as back in January, the
order of the table posted on the W3C site is different, but
sorts into an identical file. Since the order is irrelevant
to our processing of the table, for our purposes it is unchanged.
Changed the version number in the template program, unum_t.pl,
to "Version 3.2 (13.0.0), May 2020". The version number
indicates the Unicode standard which the database represents.
New blocks in Unicode 13.0.0 are:
[0x10E80, 0x10EBF = 'Yezidi'],
[0x10FB0, 0x10FDF = 'Chorasmian'],
[0x11900, 0x1195F = 'Dives Akuru'],
[0x11FB0, 0x11FBF = 'Lisu Supplement'],
[0x18B00, 0x18CFF = 'Khitan Small Script'],
[0x18D00, 0x18D8F = 'Tangut Supplement'],
[0x1FB00, 0x1FBFF = 'Symbols for Legacy Computing'],
[0x30000, 0x3134F = 'CJK Unified Ideographs Extension G'],
Integrated the 13.0.0 blocks.txt table into unum_t.pl, replacing
the previous definition of @UNICODE_BLOCKS.
Updated the pre-dimensioning of the @UNICODE_BLOCKS and
%UNICODE_NAMES array and hash to reflect the number of items in
13.0. This allows Perl to pre-allocate the array index and hash
buckets for these structures in advance rather than having to
repeatedly expand them as they are initialised.
Rebuilt with:
make clean
make all
Ran a "make check". The only discrepancies reported were, as
expected, the new Unicode blocks in the listing of all blocks.
After verifying that the discrepancies corresponded to the
blocks added in 13.0, I updated the check expected results:
cp -p check_rcv.tmp check_exp.txt
to include the new blocks and verified that make check now runs
with no errors.
Version 3.2.
2021 May 11
Added a --nent option which causes HTML character entities to
always be displayed as decimal numbers, for example &#8212;,
even if a named character reference exists. This makes it
easier to generate HTML compatible with older versions of HTML,
which support fewer named characters.
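A decimal character reference is simply the code point written in base 10 between "&#" and ";". A minimal shell sketch, with decimal_entity as a hypothetical name rather than anything in Unum:

```shell
#!/bin/bash
# decimal_entity CODEPOINT
# Prints the HTML decimal character reference for CODEPOINT
# (hexadecimal, e.g. 0x2014).  Hypothetical helper.
decimal_entity() {
    if [[ ! $1 =~ ^0[xX][0-9A-Fa-f]{1,6}$ ]]; then
        echo "expected a hexadecimal code point such as 0x2014" >&2
        return 1
    fi
    printf '&#%d;\n' "$(( $1 ))"
}

decimal_entity 0x2014    # prints &#8212;
```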
Version 3.3.
2021 September 20
Brought source code management under Git control, importing version
3.3 as the base. Created a .gitignore file to exclude all
derivative files which are removed by a "make clean".
Changed the version number to 3.4-14.0.0. From now on, each Unum
release will have the full version number of the Unicode character
database it incorporates appended to the program version number after
a hyphen.
Updated .gitignore to exclude the Unicode Character Database XML file
downloaded from the Unicode Consortium. It is enormous (195 MB) and
can be re-downloaded if there's a need to rebuild from it.
Began incorporation of the Unicode 14.0.0 standard:
https://www.unicode.org/versions/Unicode14.0.0/
https://www.unicode.org/Public/14.0.0/ucdxml/ucd.all.flat.zip
Downloaded into a WORK14.0 directory and rebuilt blocks.txt and
cpnames.txt with "make all".
Replaced the previous practice of copying cpnames.txt to the
parent build directory with creating a symbolic link in the
parent directory to the version to be built:
ln -s WORK14.0/cpnames.txt cpnames.txt
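When the link has to be moved to a later release, removing the old one and creating the new one can be folded into a single step with "ln -sf". The sketch below assumes it is run from the parent build directory. The function name link_cpnames is hypothetical.

```shell
#!/bin/bash
# link_cpnames WORKDIR
# Points ./cpnames.txt at WORKDIR/cpnames.txt, replacing any link
# left over from a previous release.  Hypothetical helper.
link_cpnames() {
    local target="$1/cpnames.txt"
    if [[ ! -f $target ]]; then
        echo "$target does not exist" >&2
        return 1
    fi
    # -s makes a symbolic link; -f replaces an existing cpnames.txt.
    ln -sf "$target" cpnames.txt
}

# Usage, from the parent build directory:
#   link_cpnames WORK14.0
```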
The HTML5 named character reference table:
https://html.spec.whatwg.org/entities.json
has not changed since the version we incorporate, although as before,
the order of the table posted on the WHATWG site is different, but sorts
into an identical file. Since the order is irrelevant to our
processing of the table, for our purposes it is unchanged.
Changed the version number in the template program, unum_t.pl,
to "Version 3.4-14.0.0, September 2021". The version number
indicates the Unicode standard which the database represents.
New and modified blocks in Unicode 14.0.0 are:
[0x0870, 0x089F = 'Arabic Extended-B'],
[0x10570, 0x105BF = 'Vithkuqi'],
[0x10780, 0x107BF = 'Latin Extended-F'],
[0x10F70, 0x10FAF = 'Old Uyghur'],
[0x11700, 0x1174F = 'Ahom'],
[0x11AB0, 0x11ABF = 'Unified Canadian Aboriginal Syllabics Extended-A'],
[0x12F90, 0x12FFF = 'Cypro-Minoan'],
[0x16A70, 0x16ACF = 'Tangsa'],
[0x18D00, 0x18D7F = 'Tangut Supplement'],
[0x1AFF0, 0x1AFFF = 'Kana Extended-B'],
[0x1CF00, 0x1CFCF = 'Znamenny Musical Notation'],
[0x1DF00, 0x1DFFF = 'Latin Extended-G'],
[0x1E290, 0x1E2BF = 'Toto'],
[0x1E7E0, 0x1E7FF = 'Ethiopic Extended-B'],
Integrated the 14.0.0 blocks.txt table into unum_t.pl, replacing
the previous definition of @UNICODE_BLOCKS.
Updated the pre-dimensioning of the @UNICODE_BLOCKS and
%UNICODE_NAMES array and hash to reflect the number of items in
14.0.0. This allows Perl to pre-allocate the array index and hash
buckets for these structures in advance rather than having to
repeatedly expand them as they are initialised.
Fixed an extra "fi" in the "check" target of the Makefile which caused
the comparison of expected and received to fail.
Rebuilt with:
make clean
make all
Ran a "make check". The only discrepancies reported were, as
expected, the new Unicode blocks in the listing of all blocks.
After verifying that the discrepancies corresponded to the
blocks added in 14.0.0, I updated the check expected results:
cp -p check_rcv.tmp check_exp.txt
to include the new blocks and verified that make check now runs
with no errors.
Rebuilt distribution with:
make dist
Version 3.4-14.0.0.
Added CONTRIBUTING.md, LICENSE.md, and README.md to support GitHub.
Logged on to github.com.
Created a new repository:
unum
with access URLs:
HTTPS: https://github.com/Fourmilab/unum.git
SSH: git@github.com:Fourmilab/unum.git
Linked the local repository to the GitHub archive:
git remote add origin git@github.com:Fourmilab/unum.git
Committed the *.md files in the repository root and the
marketplace/images files to which they link.
Confirmed that my local "git sync" command works with the remote
repository.
The documents in the repository root now work properly.
2022 September 20
Changed the version number to 3.5-15.0.0.
Began incorporation of the Unicode 15.0.0 standard:
https://www.unicode.org/versions/Unicode15.0.0/
https://www.unicode.org/Public/15.0.0/ucdxml/ucd.all.flat.zip
Downloaded into a WORK15.0 subdirectory and rebuilt blocks.txt and
cpnames.txt with "make all".
Replaced the symbolic link to the previous (14.0.0) code point name
database in the parent directory with one to the new version.
rm cpnames.txt
ln -s WORK15.0/cpnames.txt cpnames.txt
The HTML5 named character reference table:
https://html.spec.whatwg.org/entities.json
has not changed since the version we incorporate, although as before,
the order of the table posted on the WHATWG site is different, but sorts
into an identical file. Since the order is irrelevant to our
processing of the table, for our purposes it is unchanged.
Changed the version number in the template program, unum_t.pl,
to "Version 3.5-15.0.0, September 2022". The version number
indicates the Unicode standard which the database represents.
New and modified blocks in Unicode 15.0.0 are:
[0x10EC0, 0x10EFF => 'Arabic Extended-C'],
[0x11B00, 0x11B5F => 'Devanagari Extended-A'],
[0x11F00, 0x11F5F => 'Kawi'],
[0x13430, 0x1345F => 'Egyptian Hieroglyph Format Controls'],
[0x1D2C0, 0x1D2DF => 'Kaktovik Numerals'],
[0x1E030, 0x1E08F => 'Cyrillic Extended-D'],
[0x1E4D0, 0x1E4FF => 'Nag Mundari'],
[0x31350, 0x323AF => 'CJK Unified Ideographs Extension H'],
Integrated the 15.0.0/blocks.txt table into unum_t.pl, replacing
the previous definition of @UNICODE_BLOCKS.
Updated the pre-dimensioning of the @UNICODE_BLOCKS and
%UNICODE_NAMES array and hash to reflect the number of items in
15.0.0. This allows Perl to pre-allocate the array index and hash
buckets for these structures in advance rather than having to
repeatedly expand them as they are initialised. You get the
numbers for these with:
fgrep '=>' WORK15.0/blocks.txt | wc -l # $#UNICODE_BLOCKS
wc -l WORK15.0/cpnames.txt # keys %UNICODE_NAMES
Rebuilt with:
make clean
make all
Ran a "make check". The only discrepancies reported were, as
expected, the new Unicode blocks in the listing of all blocks.
After verifying that the discrepancies corresponded to the
blocks added in 15.0.0, I updated the check expected results:
cp -p check_rcv.tmp check_exp.txt
to include the new blocks and verified that make check now runs
with no errors.
Rebuilt distribution with:
make dist
Version 3.5-15.0.0.
There was an obsolete version of the development log in
WORK14.0/log.txt. Deleted it, and kept it from creeping into WORK15.0.
2023 October 20
Changed the version number to 3.6-15.1.0.
Began work on the Unicode 15.1.0 update:
https://www.unicode.org/versions/Unicode15.1.0/
https://www.unicode.org/Public/15.1.0/ucdxml/ucd.all.flat.zip
Downloaded into a WORK15.1 subdirectory and rebuilt blocks.txt and
cpnames.txt with "make all".
Replaced the symbolic link to the previous (15.0.0) code point name
database in the parent directory with one to the new version.
rm cpnames.txt
ln -s WORK15.1/cpnames.txt cpnames.txt
The HTML5 named character reference table:
https://html.spec.whatwg.org/entities.json
has not changed since the version we incorporate, although as before,
the order of the table posted on the WHATWG site is different, but sorts
into an identical file. Since the order is irrelevant to our
processing of the table, for our purposes it is unchanged.
Changed the version number in the template program, unum_t.pl,
to "Version 3.6-15.1.0, October 2023". The version number
indicates the Unicode standard which the database represents.
There is only one new block in Unicode 15.1.0:
[0x2EBF0, 0x2EE5F => 'CJK Unified Ideographs Extension I']
Integrated the 15.1.0/blocks.txt table into unum_t.pl, replacing
the previous definition of @UNICODE_BLOCKS.
Updated the pre-dimensioning of the @UNICODE_BLOCKS and
%UNICODE_NAMES array and hash to reflect the number of items in
15.1.0. This allows Perl to pre-allocate the array index and hash
buckets for these structures in advance rather than having to
repeatedly expand them as they are initialised. You get the
numbers for these with:
fgrep '=>' WORK15.1/blocks.txt | wc -l # $#UNICODE_BLOCKS
wc -l WORK15.1/cpnames.txt # keys %UNICODE_NAMES
Rebuilt with:
make clean
make all
Ran a "make check". The only discrepancies reported were, as expected,
the new Unicode block in the listing of all blocks and changes in the
provisional properties used by the Unicode Han database (6 were added
and 7 removed). After verifying that the discrepancies corresponded to
the block added in 15.1.0 and documented Han changes, I updated the
check expected results:
cp -p check_rcv.tmp check_exp.txt
to include the changes and verified that make check now runs with no
errors.
Rebuilt distribution with:
make dist
Version 3.6-15.1.0.