Data (English)
Files located in the folder data:
- sections.txt: Unicode table sections
- sets.txt: symbol sets
- entities.txt: mnemonics (e.g. &copy;)
- types.txt: section types (alphabet, abugida)
- languages.txt: section languages
- countries.txt: section countries
- specs.txt: control characters (e.g. \n)
These files are only for common data (language independent). All names and descriptions are located in localisation files.
For example, the file sections.txt:
# Sections params
[greek-coptic]
diap : 0370:03FF
type : alphabet
languages : greek, coptic
countries : greece
[cyrillic]
diap : 0400:04FF
type : alphabet
languages : russian, ukrainian, bulgarian
countries : russia, ukraine, bulgaria, serbia, macedonia, moldova
Lines beginning with # are comments and are ignored. Empty lines are ignored as well.
The example defines two objects: the Greek alphabet (greek-coptic) and Cyrillic (cyrillic).
Each section description begins with the section key (e.g. cyrillic), which is written in square brackets.
A list of characteristics follows, in the form characteristic : value.
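The format described above is simple enough to read with a few lines of code. The following is a minimal sketch in Python, not the project's own loader: the function name parse_data_file and the choice to return a dict of dicts are assumptions made for illustration. Values are kept as raw strings.

```python
SAMPLE = """\
# Sections params
[greek-coptic]
diap : 0370:03FF
type : alphabet
languages : greek, coptic
countries : greece

[cyrillic]
diap : 0400:04FF
type : alphabet
languages : russian, ukrainian, bulgarian
"""


def parse_data_file(text):
    """Parse '[key]' headers and 'name : value' lines into {key: {name: value}}."""
    objects = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and empty lines are ignored
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            objects[current] = {}
            continue
        if current is None:
            raise ValueError(f"characteristic before any key: {line!r}")
        # Split on the first colon only, so 'diap : 0370:03FF' keeps its value intact.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'name : value', got: {line!r}")
        objects[current][name.strip()] = value.strip()
    return objects


sections = parse_data_file(SAMPLE)
```

Splitting on the first colon only matters here because range values such as 0370:03FF contain a colon themselves.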
The key of the object has several purposes:
- To link to this object from other files (for example, localisation files).
- To use it as part of a URL. For example: http://unicode-table.com/en/sections/cyrillic/
The key should be unique and consist of lowercase Latin characters, numerals or hyphens.
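The key rule can be checked mechanically. The sketch below is an illustration only: the names KEY_PATTERN, is_valid_key and find_duplicate_keys are hypothetical, and the regular expression is one reading of "lowercase Latin characters, numerals or hyphens".

```python
import re

# Assumption: "Latin characters" means ASCII a-z, "numerals" means 0-9.
KEY_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_key(key):
    """Return True if the key uses only lowercase letters, digits and hyphens."""
    return KEY_PATTERN.fullmatch(key) is not None


def find_duplicate_keys(keys):
    """Return the keys that occur more than once, in order of first repetition."""
    seen = set()
    duplicates = []
    for key in keys:
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

For example, is_valid_key("greek-coptic") is accepted, while "Cyrillic" and "greek_coptic" are rejected.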
The list of arguments depends on the file. Arguments can be mandatory or optional. The value can be a string or a list of comma-separated values (e.g. russian, ukrainian, bulgarian).
Please note that we use keys instead of names of countries, because names can differ between languages. The keys are defined in the files languages.txt and countries.txt.
Arguments:
- diap: the diapason (range) of the values (e.g. 0370:03FF). The diapasons of different sections should not intersect.
- type: the section type (e.g. alphabet or abugida). Corresponds to the types in types.txt. Not required.
- languages: a list of languages that use the symbols in this section. Corresponds to the languages in languages.txt. Not required.
- countries: a list of countries that use the symbols in this section. Corresponds to the countries in countries.txt. Not required.
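The rule that diapasons should not intersect can be verified before committing a change. The following sketch assumes that both ends of a diap value are hexadecimal code points and that the range is inclusive, which matches the examples (0370:03FF followed by 0400:04FF). The function names parse_diap and find_overlaps are hypothetical.

```python
def parse_diap(value):
    """Turn '0370:03FF' into a (start, end) pair of integers."""
    start_text, end_text = value.split(":")
    start, end = int(start_text, 16), int(end_text, 16)
    if start > end:
        raise ValueError(f"range start is after range end: {value!r}")
    return start, end


def find_overlaps(diaps):
    """Given {section_key: diap_string}, return (earlier_key, later_key) pairs
    where a section starts inside the furthest-reaching earlier section."""
    ranges = sorted((parse_diap(value), key) for key, value in diaps.items())
    overlaps = []
    widest_end, widest_key = None, None
    for (start, end), key in ranges:
        if widest_end is not None and start <= widest_end:
            overlaps.append((widest_key, key))
        if widest_end is None or end > widest_end:
            widest_end, widest_key = end, key
    return overlaps
```

An empty result means no section starts inside another one. With the two sections from the example above, find_overlaps returns an empty list.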
The file sets.txt is used for the set pages (http://unicode-table.com/sets/).
Arguments:
- set: a list of characters in this set
Example:
[set-abcdef]
set : a, b, c, d, e, f
In types.txt no arguments are defined at the moment, so just specify the list of keys:
[abjad]
[abugida]
[alphabet]
Similarly to types.txt, the keys in languages.txt have no arguments.
Arguments in countries.txt:
- map: the coordinates of this country. Format: x:y (e.g. 110:75)
The file entities.txt lists mnemonics. For example, &copy; is the copyright sign.
The file has a simple format:
copy : 169
ordf : 170
laquo : 171
not : 172
First comes the sequence name (without & and ;), then the decimal code of the character.
At the moment these are used in searches: http://unicode-table.com/en/search/?q=%26copy%3B
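To illustrate how such a table could serve a search query, here is a small sketch. It is not the site's search code: parse_entities and resolve_mnemonic are hypothetical names, and skipping comment lines in this file is an assumption carried over from the sections.txt rules.

```python
ENTITIES_TEXT = """\
copy : 169
ordf : 170
laquo : 171
not : 172
"""


def parse_entities(text):
    """Read 'name : decimal code' lines into {name: code}."""
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: same comment rules as sections.txt
        name, _, code = line.partition(":")
        table[name.strip()] = int(code)
    return table


def resolve_mnemonic(query, table):
    """Return the character for a query such as '&copy;' or 'copy', else None."""
    name = query.strip()
    if name.startswith("&"):
        name = name[1:]
    if name.endswith(";"):
        name = name[:-1]
    code = table.get(name)
    return chr(code) if code is not None else None


entities = parse_entities(ENTITIES_TEXT)
```

With this table, resolve_mnemonic("&copy;", entities) yields the copyright sign (code 169), and an unknown name yields None.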
The file specs.txt lists control characters such as \n, \t, etc.
The file format is similar to entities.txt:
0: 0
a: 7
b: 8
t: 9
n: 10
v: 11
f: 12
r: 13
First comes the sequence of characters without the backslash, then the decimal code of the character. This is also used for searching.
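One way to sanity-check entries in this table is to compare them with the escape sequences that a programming language already defines. The sketch below uses Python's own escape handling as the reference. This is an assumption about what the entries are meant to represent, not a rule stated by the project, and the names parse_specs and python_escape_code are hypothetical.

```python
SPECS_TEXT = """\
0: 0
a: 7
b: 8
t: 9
n: 10
v: 11
f: 12
r: 13
"""


def parse_specs(text):
    """Read 'name: decimal code' lines into {name: code}."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            name, _, code = line.partition(":")
            table[name.strip()] = int(code)
    return table


def python_escape_code(name):
    """Code point that Python assigns to backslash + name, e.g. 'n' gives 10."""
    return ord(("\\" + name).encode("ascii").decode("unicode_escape"))


specs = parse_specs(SPECS_TEXT)
# Entries whose code differs from Python's interpretation of the same escape.
mismatches = {name: code for name, code in specs.items()
              if python_escape_code(name) != code}
```

For the eight entries shown above, mismatches comes out empty.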
Please note that you can only refer to existing objects.
For example, if you want cyrillic to refer to lang-unknown:
[cyrillic]
diap : 0400:04FF
type : alphabet
languages : russian, ukrainian, bulgarian, lang-unknown
You have to create lang-unknown in languages.txt and translate it into as many languages as possible in the localisation files (at least into English).
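A check for dangling references could look like the sketch below. It assumes the sections have already been read into a dict of dicts with raw string values, and that the known keys from types.txt, languages.txt and countries.txt are available as sets. The names split_list and find_missing_references are hypothetical.

```python
def split_list(value):
    """Split 'russian, ukrainian' into ['russian', 'ukrainian']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def find_missing_references(sections, known_types, known_languages, known_countries):
    """Return (section_key, argument, missing_key) for every unknown reference."""
    known = {
        "type": known_types,
        "languages": known_languages,
        "countries": known_countries,
    }
    missing = []
    for section_key, arguments in sections.items():
        for argument, allowed in known.items():
            # Optional arguments that are absent contribute nothing.
            for ref in split_list(arguments.get(argument, "")):
                if ref not in allowed:
                    missing.append((section_key, argument, ref))
    return missing


example = {
    "cyrillic": {
        "diap": "0400:04FF",
        "type": "alphabet",
        "languages": "russian, ukrainian, bulgarian, lang-unknown",
    }
}
problems = find_missing_references(
    example,
    known_types={"alphabet"},
    known_languages={"russian", "ukrainian", "bulgarian"},
    known_countries=set(),
)
```

Here problems contains a single entry pointing at lang-unknown, which disappears once that key is added to the set of known languages.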