forked from arp242/uni
-
Notifications
You must be signed in to change notification settings - Fork 0
/
.readme.gotxt
442 lines (282 loc) · 14.2 KB
/
.readme.gotxt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
`uni` queries the Unicode database from the commandline. It supports Unicode
14.0 (September 2021) and has good support for emojis.
There are four commands: `identify` codepoints in a string, `search` for
codepoints, `print` codepoints by class, block, or range, and `emoji` to find
emojis.
There are binaries on the [releases][release] page, and [packages][pkg] for a
number of platforms. You can also [run it in your browser][uni-wasm].
Compile from source with:
$ git clone https://github.com/arp242/uni
$ cd uni
$ go build
which will give you a `uni` binary.
README index:
- [Integrations](#integrations)
- [Usage](#usage)
- [Identify](#identify)
- [Search](#search)
- [Print](#identify)
- [Emoji](#emoji)
- [JSON](#json)
- [ChangeLog](#changelog)
- [Development](#development)
- [Alternatives](#alternatives)
[uni-wasm]: https://arp242.github.io/uni-wasm/
[release]: https://github.com/arp242/uni/releases
[pkg]: https://repology.org/project/uni/versions
Integrations
------------
- [dmenu][dmenu], [rofi][rofi], and [fzf][fzf] script at
[`dmenu-uni`](/dmenu-uni). See the top of the script for some options you may
want to frob with.
- For a Vim command see [`uni.vim`](/uni.vim); just copy/paste it in your vimrc.
[dmenu]: http://tools.suckless.org/dmenu
[rofi]: https://github.com/davatorium/rofi
[fzf]: https://github.com/junegunn/fzf
Usage
-----
*Note: the alignment is slightly off for some entries due to the way GitHub
renders wide characters; in terminals it should be aligned correctly.*
### Identify
Identify characters in a string, as a kind of a unicode-aware `hexdump`:
{{example "identify" "€"}}
`i` is a shortcut for `identify`:
{{example "i" "h€ý"}}
It reads from stdin:
$ head -c2 README.markdown | uni i
cpoint dec utf-8 html name (cat)
'[' U+005B 91 5b [ LEFT SQUARE BRACKET (Open_Punctuation)
'!' U+0021 33 21 ! EXCLAMATION MARK (Other_Punctuation)
You can use `-compact` (or `-c`) to suppress the header, and `-format` (or `-f`)
to control the output format, for example you may want to generate a codepoint
to X11 keysym mapping:
{{example "i" "-c" "-f" "0x%(hex): \"%(keysym)\", // %(name)" "h€ý"}}
See `uni help` for more details on the `-format` flag; this flag can also be
added to other commands.
### Search
Search description:
{{example "search" "euro"}}
The `s` command is a shortcut for `search`. Multiple words are matched
individually:
{{example "s" "globe" "earth"}}
Use shell quoting for more literal matches:
{{trim 3 (example "s" "rightwards" "black" "arrow")}}
{{example "s" "rightwards black arrow"}}
Add `-or` or `-o` to combine the search terms with "OR" instead of "AND":
{{example "s" "-o" "globe" "milky"}}
### Print
Print specific codepoints or groups of codepoints:
{{example "print" "U+2042"}}
Print a custom range; `U+2042`, `U2042`, and `2042` are all identical:
{{example "print" "2042..2044"}}
You can also use hex, octal, and binary numbers: `0x2024`, `0o20102`, or
`0b10000001000010`.
General category:
{{trim 3 (example "p" "Po")}}
Blocks:
{{trim 5 (example "p" "arrows" "box drawing")}}
Print as table, and with a shorter name:
{{example "p" "-as" "table" "box"}}
Or more compact table:
{{example "p" "-as" "table" "box" "-compact"}}
### Emoji
The `emoji` command (shortcut: `e`) is is the real reason I wrote this:
{{example "e" "cry"}}
By default both the name and CLDR data are searched; the CLDR data is a list of
keywords for an emoji; prefix with `name:` or `n:` to search on the name only:
{{trim 3 (example "e" "smile")}}
{{example "e" "name:smile"}}
As you can see, the CLDR is pretty useful, as "smile" only gives one result as
most emojis use "smiling".
Prefix with `group:` to search by group:
{{example "e" "group:hands"}}
Group and search can be combined, and `group:` can be abbreviated to `g:`:
{{example "e" "g:cat-face" "grin"}}
Like with `search`, use `-or` to OR the parameters together instead of AND:
{{example "e" "-or" "g:face-glasses" "g:face-hat"}}
Apply skin tone modifiers with `-tone`:
{{example "e" "-tone" "dark" "g:hands"}}
The "heart hands" may not show as it's very recent. The handshake emoji supports
setting individual skin tones per hand since Unicode 14, but this isn't
supported, mostly because I can't really really think a good CLI interface for
setting this without breaking compatibility (there are some other emojis too,
like "holding hands" and "kissing" where you can set both the gender and skin
tone of both sides individually). Maybe for uni v3 someday.
The default is to display only the gender-neutral "person", but this can be
changed with the `-gender` option:
{{example "e" "-gender" "man" "g:person-gesture"}}
Both `-tone` and `-gender` accept multiple values. `-gender women,man` will
display both the female and male variants, and `-tone light,dark` will display
both a light and dark skin tone; use `all` to display all skin tones or genders:
{{example "e" "-tone" "light,dark" "-gender" "f,m" "shrug"}}
Like `print` and `identify`, you can use `-format`:
{{example "e" "g:cat-face" "-c" "-format" "%(name): %(emoji)"}}
See `uni help` for more details on the `-format` flag.
### JSON
With `-as json` or `-as j` you can output the data as JSON:
{{example "i" "-as" "json" "h€ý"}}
All the columns listed in `-f` will be included; you can use `-f all` to include
all columns:
{{example "i" "-as" "json" "-f" "all" "h€ý"}}
This also works for the `emoji` command:
{{example "e" "-as" "json" "-f" "all" "kissing cat"}}
All values are always a string, even numerical values. This makes things a bit
easier/consistent as JSON doesn't support hex literals and such. Use `jq` or
some other tool if you want to process the data further.
ChangeLog
---------
### 2.5.1 (2022-05-09)
- Fix build on Go 1.17 and earlier.
### 2.5.0 (2022-05-03)
- Add support for properties; they can be displayed with `%(props)` in
`-format`, and selected in `print` (e.g. `uni print dash`).
- Add `uni list` command, to list categories, blocks, and properties.
- Allow explicitly selecting a block, category, or property in `print` with
`block:name` (`b:name`), `category:name` (`cat:name`, `c:name`), or
`property:name` (`prop:name`, `p:name`).
Also print an error if a string without prefix matched more than one group
(i.e. `uni p dash` matches both the property `Dash` and category
`Dash_Punctuation`).
- Add table layout with `-as table`. Also change `-json`/`-j` to `-as json` or
`-as j`. The `-json` flag is still accepted as an alias for compatibility.
- Change `-q`/`-quiet` to `-c`/`-compact`; `-as json` will print as minified if
given, and `-as table` will include less padding. `-q` is still accepted as an
alias for compatibility.
- Don't use the Go stdlib `unicode` package; since this is a Unicode 13 database
and some operations would fail on codepoints added in Unicode 14 due to the
mismatch.
### v2.4.0 (2021-12-20)
- Update import path to `zgo.at/uni/v2`.
- Add `oct` and `bin` flags for `-f` to print a codepoint as octal or binary.
- Add `f` format flag to change the fill character with alignment; e.g.
`%(bin r:auto f:0)` will print zeros on the left.
- Allow using just `o123` for an octal number (instead of `0o123`). We can't do
this for binary and decimal numbers (since `b` and `d` are valid
hexidecimals), but no reason not to do it for `o`.
### v2.3.0 (2021-10-05)
- Update to Unicode 14.0.
- UTF-16 and JSON are printed as lower case, just like UTF-8 was. Upper-case is
used only for codepoints (i.e. U+00AC).
- `uni print` can now print from UTF-8 byte sequence; for example to print the €
sign:
uni p utf8:e282ac
uni p 'utf8:e2 82 ac'
uni p 'utf8:0xe2 0x82 0xac'
Bytes can optionally be separated by any combination of `0x`, `-`, `_`, or spaces.
### v2.2.1 (2021-06-15)
- You can now use `uni p 0d40` to get U+28 by decimal.
`uni print 40` interprets the `40` as hex instead of decimal, and there was no
way to get a codepoint by decimal number. Since codepoints are much more more
common than decimals, leaving off the `U+` and `U` is a useful shortcut I'd
like to keep. AFAIK there isn't really a standard(-ish) was to explicitly
indicate a number is a decimal, so this is probably the closest.
### v2.2.0 (2021-06-05)
- Make proper use of the `/v2` import path so that `go get` and `go install`
work. (#26)
- Don't panic if `-f` doesn't contain any formatting characters.
### v2.1.0 (2021-03-30)
- Can now output as JSON with `-j` or `-json`.
- `-format all` is a special value to include all columns uni knows about. This
is useful especially in combination with `-json`.
- Add `%(block)`, `%(plane)`, `%(width)`, `%(utf16be)`, `%(utf16le)`, and
`%(json) to `-f`.
- Refactor the arp242.net/uni/unidata package to be more useful for other use
cases. This isn't really relevant for `uni` users as such, but if you want to
get information about codepoints or emojis then this package is a nice
addition to the standard library's `unicode` package.
### v2.0.0 (2021-01-03)
This changes some flags, semantics, and defaults in **incompatible** ways, hence
the bump to 2.0. If you use the `dmenu-uni` script with dmenu or fzf, then
you'll need to update that to.
- Remove the `-group` flag in favour of `group:name` syntax; this is more
flexible and will allow adding more query syntax later.
uni emoji -group groupname,othergroup Old syntax
uni emoji -group groupname,othergroup smile Old syntax
uni emoji -or group:groupname group:othergroup New syntax
uni emoji -or group:groupname group:othergroup smile New syntax
uni emoji -or g:groupname g:othergroup Can use shorter g: instead of group:
- Default for `-gender` is now `person` instead of `all`; including all genders
by default isn't all that useful, and the gender-neutral "person" should be a
fine default for most, just as the skin colour-neutral "yellow" is probably a
fine default for most.
- Add new `-or`/`-o` flag. The default for `search` and `emoji` is to show
everything where all query parameters match ("AND"); with this flag it shows
everything where at least one parameter matches ("OR").
- Add new `-format`/`-f` flag to control which columns to output and column
width. You can now also print X11 keysyms and Vim digraphs. See `uni help` for
details.
- Include CLDR data for emojis, which is searched by default if you use `uni e
<something>`. You can use `uni e name:x` to search for the name specifically.
- Show a short terse help when using just `uni`, and a more detailed help on
`uni help`. I hate it when programs print 5 pages of text to my terminal when
I didn't ask for it.
- Update Unicode data to 13.1.
- Add option to output to `$PAGER` with `-p` or `-pager`. This isn't done
automatically (I don't really like it when programs throw me in a pager), but
you can define a shell alias (`alias uni='uni -p'`) if you want it by default
since flags can be both before or after the command.
### v1.1.1 (2020-05-31)
- Fix tests of v1.1.0, requested by a packager. No changes other than this.
### v1.1.0 (2020-03-17)
- Update Unicode data from 12.1 to 13.0.
- `print` command supports codepoints as hex (`0xff`), octal (`0o42`), and
binary (`0b1001`).
- A few very small bugfixes.
### v1.0.0 (2019-12-12)
- Initial release
Development
-----------
Re-generate the Unicode data with `go generate unidata`. Files are cached in
`unidata/.cache`, so clear that if you want to update the files from remote.
This requires zsh and GNU awk (gawk).
Alternatives
------------
### CLI/TUI
- https://github.com/philpennock/character
More or less similar to uni, but very different CLI, and has some additional
features. Seems pretty good.
- https://github.com/sindresorhus/emoj
Doesn't support emojis sequences (e.g. MAN SHRUGGING is PERSON SHRUGGING +
MAN, FIREFIGHTER is PERSON + FIRE TRUCK, etc); quite slow for a CLI program
(`emoj smiling` takes 1.8s on my system, sometimes a lot longer), search
results are pretty bad (`shrug` returns unamused face, thinking face, eyes,
confused face, neutral face, tears of joy, and expressionless face ... but not
the shrugging emoji), not a fan of npm (has 1862 dependencies).
- https://github.com/Fingel/tuimoji
Grouping could be better, doesn't support emojis sequences, only interactive
TUI, feels kinda slow-ish especially when searching.
- https://github.com/pemistahl/chr
Only deals with codepoints, not emojis.
### GUI
- gnome-characters
Uses Gnome interface/window decorations and won't work well with other WMs,
doesn't deal with emoji sequences, I don't like the grouping/ordering it uses,
requires two clicks to copy a character.
- gucharmap
Doesn't display emojis, just unicode blocks.
- KCharSelect
Many KDE-specific dependencies (106M). Didn't try it.
- https://github.com/Mange/rofi-emoji and https://github.com/fdw/rofimoji
Both are pretty similar to the dmenu/rofi integration of uni with some minor
differences, and both seem to work well with no major issues.
- gtk3 emoji picker (Ctrl+; or Ctrl+. in gtk 3.93 or newer)
Only works in GTK, doesn't work with `GTK_IM_MODULE=xim` (needed for compose
key), for some reasons the emojis look ugly, doesn't display emojis sequences,
doesn't have a tooltip or other text description about what the emoji actually
is, the variation selector doesn't seem to work (never displays skin tone?),
doesn't work in Firefox.
This is so broken on my system that it seems that I'm missing something for
this to work or something?
- https://github.com/rugk/awesome-emoji-picker
Only works in Firefox; takes a tad too long to open; doesn't support skin
tones.
### Didn't investigate (yet)
Some alternatives people have suggested that I haven't looked at; make an issue
or email me if you know of any others.
- https://github.com/cassidyjames/ideogram
- https://github.com/OzymandiasTheGreat/emoji-keyboard
- https://github.com/salty-horse/ibus-uniemoji
- https://fcitx-im.org/wiki/Unicode
- http://kassiopeia.juls.savba.sk/~garabik/software/unicode/ and https://github.com/garabik/unicode (same?)
- https://billposer.org/Software/unidesc.html
- https://github.com/NoraCodes/charpicker (rofi)