.\"                                      Hey, EMACS: -*- nroff -*-
.TH UNICODE 1 "2003-01-31"
.SH NAME
unicode \- command line Unicode database query tool
.SH SYNOPSIS
.B unicode
.RI [ options ] 
string
.SH DESCRIPTION
This manual page documents the
.B unicode
command.
.PP
\fBunicode\fP is a command line Unicode database query tool.

.SH OPTIONS
.TP
.BI \-h 
.BI \-\-help 

Show help and exit.

.TP
.BI \-x
.BI \-\-hexadecimal

Assume 
.I string
to be a hexadecimal number 

.TP
.BI \-d
.BI \-\-decimal

Assume 
.I string
to be a decimal number 

.TP
.BI \-o
.BI \-\-octal

Assume 
.I string
to be an octal number 

.TP
.BI \-b
.BI \-\-binary

Assume 
.I string
to be a binary number 

.TP
.BI \-r
.BI \-\-regexp

Assume 
.I string
to be a regular expression

.TP
.BI \-s
.BI \-\-string

Assume 
.I string
to be a sequence of characters

.TP
.BI \-a
.BI \-\-auto

Try to guess the type of
.I string
as one of the above (default).

.TP
.BI \-mMAXCOUNT
.BI \-\-max=MAXCOUNT

Maximum number of codepoints to display (default: 20); use 0 for unlimited.

.TP
.BI \-iIOCHARSET
.BI \-\-io=IOCHARSET

I/O character set. For best results, run \fBunicode\fP on a UTF-8
capable terminal and specify IOCHARSET to be UTF-8. \fBunicode\fP
tries to guess this value from your locale, so with a properly set up
locale, you should not need to specify it.

.TP
.BI \-\-fcp=CHARSET
.BI \-\-fromcp=CHARSET

Convert numerical arguments from this encoding, default: no conversion.
Multibyte encodings are supported. This is ignored for non-numerical
arguments.


.TP
.BI \-cADDCHARSET
.BI \-\-charset\-add=ADDCHARSET

Show the hexadecimal representation of displayed characters in this additional charset.

.TP
.BI \-CUSE_COLOUR
.BI \-\-colour=USE_COLOUR

USE_COLOUR is one of
.IR on ,
.IR off ,
or
.IR auto .

.B \-\-colour=on
will use ANSI colour codes to colourise the output

.B \-\-colour=off
won't use colours.

.B \-\-colour=auto 
will test if standard output is a tty, and use colours only when it is.

.BI \-\-color
is a synonym of
.BI \-\-colour

.TP
.BI \-v
.BI \-\-verbose

Be more verbose about displayed characters, e.g. display Unihan information, if available.

.TP
.BI \-w
.BI \-\-wikipedia

Spawn a browser pointing to the English Wikipedia entry about the character.

.TP
.BI \-\-wt
.BI \-\-wiktionary

Spawn a browser pointing to the English Wiktionary entry about the character.

.TP
.BI \-\-brief

Display character information in brief format

.TP
.BI \-\-format=fmt

Use your own format for character information display. See the README for details.


.TP
.BI \-\-list

List (approximately) all known encodings.

.SH USAGE

\fBunicode\fP tries to guess the type of an argument. In particular,
if an argument looks like a valid hexadecimal representation of a
Unicode codepoint, it will be treated as such. Using

\fBunicode\fP face

will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE,
and it will not search for 'face' in character descriptions \- for the latter,
use:

\fBunicode\fP -r face


For example, you can use any of the following to display information
about  U+00E1 LATIN SMALL LETTER A WITH ACUTE (\('a):

\fBunicode\fP 00E1

\fBunicode\fP U+00E1

\fBunicode\fP \('a

\fBunicode\fP 'latin small letter a with acute'
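.PP
The hexadecimal guess described above can be pictured with a short Python
sketch. It is only an illustration: the helper name as_codepoint and the
pattern it uses are assumptions made for this example and are not taken
from the \fBunicode\fP source, whose actual heuristics may differ.
.PP
.nf
```python
import re
import unicodedata

# Hypothetical pattern: an optional "U+" prefix, then 4 to 6 hex digits.
HEX_CODEPOINT = re.compile(r"(?:U[+])?([0-9A-Fa-f]{4,6})")


def as_codepoint(arg):
    """Return the code point if arg looks like hexadecimal, else None."""
    match = HEX_CODEPOINT.fullmatch(arg)
    if match is None:
        return None
    value = int(match.group(1), 16)
    # Anything above U+10FFFF is not a Unicode code point.
    return value if value <= 0x10FFFF else None


for arg in ("face", "00E1", "U+00E1", "latin small letter a"):
    cp = as_codepoint(arg)
    if cp is None:
        print(arg, "-> not hexadecimal, would be searched as text")
    else:
        print(arg, "-> U+%04X" % cp, unicodedata.name(chr(cp), "(unnamed)"))
```
.fi
.PP
In this sketch the word "face" is read as U+FACE because every letter in
it is a valid hexadecimal digit, which is why a text search for it needs
the regular expression option.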


You can specify a range of characters as arguments; \fBunicode\fP will
show these characters in a tabular format, aligned to 256-codepoint boundaries.
Use two dots ".." to indicate the range, e.g.

\fBunicode\fP 0450..0520

will display the whole Cyrillic, Armenian and Hebrew blocks (characters from U+0400 to U+05FF).

\fBunicode\fP 0400.. 

will display just characters from U+0400 up to U+04FF
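.PP
The alignment in the two range examples above can be reproduced with a
little bit arithmetic. The following Python sketch is an assumption about
how such rounding could be done (the function name aligned_range is
hypothetical); it is not the program's actual code.
.PP
.nf
```python
def aligned_range(first, last):
    """Widen [first, last] to whole blocks of 256 code points."""
    start = first & ~0xFF   # clear the low 8 bits: round down
    end = last | 0xFF       # set the low 8 bits: end of that block
    return start, end


# The range 0450..0520 from the example above.
print("U+%04X..U+%04X" % aligned_range(0x0450, 0x0520))

# An open-ended range such as 0400.. is modelled here as a single block.
print("U+%04X..U+%04X" % aligned_range(0x0400, 0x0400))
```
.fi
.PP
Under these assumptions the first call yields U+0400..U+05FF and the
second yields U+0400..U+04FF, matching the ranges quoted above.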

Use --fromcp to query codepoints from other encodings:

\fBunicode\fP --fromcp cp1250 -d 200

Multibyte encodings are supported:

\fBunicode\fP --fromcp big5 -x aff3

and multi-char strings are supported, too:

\fBunicode\fP --fromcp utf-8 -x c599c3adc5a5
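.PP
What the conversion examples above amount to can be sketched in Python
with the standard codecs. This is an illustration under stated
assumptions, not the program's implementation: the helper name
from_codepage is hypothetical, and the number is assumed to be read as
big-endian bytes with no leading zero bytes.
.PP
.nf
```python
def from_codepage(number, charset):
    """Decode an integer, taken as big-endian bytes, from charset."""
    length = max(1, (number.bit_length() + 7) // 8)
    return number.to_bytes(length, "big").decode(charset)


def codepoints(text):
    """List the code points of text in U+XXXX notation."""
    return ["U+%04X" % ord(char) for char in text]


# Decimal 200 is the single byte 0xC8 in cp1250.
print(codepoints(from_codepage(200, "cp1250")))

# Six UTF-8 bytes that decode to a three-character string.
print(codepoints(from_codepage(0xC599C3ADC5A5, "utf-8")))
```
.fi
.PP
The first call gives U+010C and the second gives U+0159, U+00ED and
U+0165, so a single numerical argument may stand for several characters
when the encoding is multibyte.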

.SH BUGS
Tabular format does not deal well with full-width, combining, control
and RTL characters.

.SH SEE ALSO
ascii(1)


.SH AUTHOR
Radovan Garab\('ik <garabik @ kassiopeia.juls.savba.sk>