The Unicode Character Database (UCD) Tools is a set of Python tools and a C library with a C++ API binding. The Python tools are designed to support extracting and processing data from the text-based UCD source files, while the C library is designed to provide easy access to this information within a C or C++ program.
The project uses and supports the following sources of Unicode codepoint data:
- Unicode Character Database 11.0.0
- Unicode Emoji 11.0 (UTR #51)
- ConScript Unicode Registry
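As an illustration of the text-based format these sources use, here is a minimal Python sketch that parses one line in the semicolon-delimited layout of `UnicodeData.txt`. It is not part of ucd-tools: the `CodePoint` type and the `parse_unicode_data_line` function are hypothetical names chosen for this example, and only the first three fields (code point, name, general category) are read.

```python
from typing import NamedTuple, Optional


class CodePoint(NamedTuple):
    """Hypothetical record type; not a ucd-tools API."""
    value: int      # code point as an integer
    name: str       # character name
    category: str   # general category, e.g. "Lu"


def parse_unicode_data_line(line: str) -> Optional[CodePoint]:
    """Parse one UnicodeData.txt-style line.

    Returns None for blank lines, comment-only lines, and lines with
    fewer than three fields.
    """
    # Some UCD files use '#' comments; drop anything after the marker.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split(";")
    if len(fields) < 3:
        return None
    # Field 0 is the code point in hexadecimal.
    return CodePoint(int(fields[0], 16), fields[1].strip(), fields[2].strip())


# The UnicodeData.txt entry for U+0041.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_unicode_data_line(sample)
print(record)
```

The sketch ignores the remaining fields and the `First`/`Last` range entries that `UnicodeData.txt` uses for large blocks, so it is a starting point for exploration rather than a complete reader.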
In order to build ucd-tools, you need:
- a functional autotools system (`make`, `autoconf`, `automake` and `libtool`);
- a functional C and C++ compiler.
NOTE: The C++ compiler is used to build the test for the C++ API.
To build the documentation, you need:
- the `doxygen` program to build the API documentation;
- the `dot` program from the graphviz package to generate graphs in the API documentation.
Core Dependencies:
Dependency | Install |
---|---|
autotools | sudo apt-get install make autoconf automake libtool |
C and C++ compiler | sudo apt-get install gcc g++ |
Documentation Dependencies:
Dependency | Install |
---|---|
doxygen | sudo apt-get install doxygen |
graphviz | sudo apt-get install graphviz |
UCD Tools supports the standard GNU autotools build system. The source code does not contain the generated `configure` files, so to build it you need to run:

    ./autogen.sh
    ./configure --prefix=/usr
    make

The tests can be run by using:

    make check

The program can be installed using:

    sudo make install

The documentation can be built using:

    make html

To regenerate the source files from the UCD data when a new version of Unicode is released, you need to run:

    ./configure --prefix=/usr --with-unicode-version=VERSION
    make ucd-update

where `VERSION` is the Unicode version (e.g. `6.3.0`).
Additionally, you can use the `UCD_FLAGS` option to control how the data is generated. The following flags are supported:
Flag | Description |
---|---|
--with-csur | Add ConScript Unicode Registry data. |
Report bugs to the ucd-tools issues page on GitHub.
UCD Tools is released under the GPL version 3 or later license.
The UCD data files in `data/ucd` are downloaded from the UCD website and are licensed under the Unicode Terms of Use. These data files are used in their unmodified form. They have the following copyright notice:
Copyright © 1991-2014 Unicode, Inc. All rights reserved.
The files in `data/csur` are based on the information from the ConScript Unicode Registry maintained by John Cowan and Michael Everson.