Jonas Gierer edited this page Jun 22, 2014 · 3 revisions

Search on the website

We use a search engine called Sphinx for indexing our database content.

The following character data is searchable:

  • title (both in the current language and in English)
  • section
  • decimal and hexadecimal codes
  • character sets to which the character belongs
  • other data
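To illustrate how the decimal and hexadecimal codes of a character relate to each other, here is a minimal Python sketch. The helper name `code_forms` is hypothetical and not part of the project. It only shows the two values a user might type into the search box, not how the index is built.

```python
from typing import Tuple


def code_forms(char: str) -> Tuple[str, str]:
    """Return the decimal and hexadecimal code of a single character."""
    code_point = ord(char)
    # Code points are conventionally written with at least four hex digits.
    return str(code_point), f"{code_point:04X}"


decimal, hexadecimal = code_forms("©")
print(decimal, hexadecimal)  # 169 00A9
```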

There are additional files that you can use to customize your search:

entities.txt and specs.txt

These two files are described in the documentation on the main data files. With their help, you can search by HTML entities (e.g. &beta;) and control characters (e.g. \t).
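As a rough illustration of what such queries refer to, the Python sketch below resolves an HTML entity and a control-character escape to their code points. It uses only the standard library and does not read entities.txt or specs.txt, whose format is not shown on this page. The helper `hex_code` is a hypothetical name.

```python
import html

# "&beta;" is the HTML entity for the Greek small letter beta.
beta = html.unescape("&beta;")

# "\t" is the escape sequence for the horizontal tab control character.
tab = "\t"


def hex_code(char: str) -> str:
    """Format a character's code point as at least four hex digits."""
    return f"{ord(char):04X}"


print(hex_code(beta))  # 03B2
print(hex_code(tab))   # 0009
```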

Synonyms of characters

In the localisation files (section "The names of characters") you can specify synonyms for each character:

00A9 : Copyright : (с)
2122 : Trade mark sign : tm
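The two lines above suggest a "code : name : synonyms" layout. A minimal Python sketch for reading such lines could look as follows. The names `CharacterName` and `parse_names` are hypothetical, and the sketch assumes that fields are separated by " : " and that the third field is optional. How several synonyms are separated within that field is not shown here, so it is kept as raw text.

```python
from typing import Dict, NamedTuple


class CharacterName(NamedTuple):
    name: str
    synonyms: str  # raw synonym text; its inner separator is unknown here


def parse_names(text: str) -> Dict[str, CharacterName]:
    """Map hexadecimal codes to names and synonyms (assumed layout)."""
    entries: Dict[str, CharacterName] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two separators only, so the synonym text stays intact.
        parts = [part.strip() for part in line.split(" : ", 2)]
        if len(parts) < 2:
            continue  # skip lines that do not match the assumed layout
        synonyms = parts[2] if len(parts) > 2 else ""
        entries[parts[0].upper()] = CharacterName(parts[1], synonyms)
    return entries


sample = "00A9 : Copyright : (с)\n2122 : Trade mark sign : tm\n"
names = parse_names(sample)
print(names["2122"].synonyms)  # tm
```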

Word forms and lists of stop words

In the localisation folder there is a folder called morph with the files wordforms.txt and stopwords.txt.

The format of these files is described in the Sphinx documentation.

For example, loc/ru/morph/wordforms.txt:

phone > telephon
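To show what a line like this means, here is a simplified Python sketch that reads "source > destination" pairs and applies them to a query word. It is not how Sphinx processes the file internally. The function names `parse_wordforms` and `normalize` are hypothetical, and only the ">" separator seen in the example above is handled.

```python
from typing import Dict


def parse_wordforms(text: str) -> Dict[str, str]:
    """Read 'source > destination' lines into a lookup table."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        source, separator, destination = line.partition(">")
        if not separator:
            continue  # no ">" on this line, so it is not a mapping
        source, destination = source.strip(), destination.strip()
        if source and destination:
            mapping[source] = destination
    return mapping


def normalize(word: str, wordforms: Dict[str, str]) -> str:
    """Replace a word by its mapped form, or return it unchanged."""
    return wordforms.get(word, word)


wordforms = parse_wordforms("phone > telephon\n")
print(normalize("phone", wordforms))  # telephon
print(normalize("arrow", wordforms))  # arrow
```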