Harri Pitkänen edited this page Jun 18, 2024 · 8 revisions

Building libvoikko and voikko-fi for use in JavaScript applications

Libvoikko can be compiled to pure JavaScript, and the result runs well in both browser and Node environments. See this interactive demo.

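Clone the repository and configure the build for a library that supports dictionary format 5 (Finnish VFST). The commands below assume that the Emscripten SDK is installed under ~/emscripten/emsdk.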

git clone https://github.com/voikko/corevoikko
cd corevoikko/libvoikko
source ~/emscripten/emsdk/emsdk_env.sh
./autogen.sh
js/configure.sh

If you want to embed the dictionary or use a preload file, you first need to fetch or build the dictionaries you need. Here we just download the latest snapshot of the standard Finnish dictionary.

wget https://www.puimula.org/htp/testing/voikko-snapshot-v5/dict.zip
unzip dict.zip
rm dict.zip

Select how you want to load the dictionaries

You need to select one of the three options below to build the library.

Preload mode is efficient in a web environment, but it cannot be used with Node:

js/build.sh preload

Embed mode works with Node but is slow to load in a web environment:

js/build.sh embed

You can also choose not to load any dictionaries by default. This is useful if you want to build the dictionaries separately or use the native file system directly (possible when you use Node). Consult the Emscripten documentation on how to load files into the virtual file system before the library is initialized:

js/build.sh plain
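
As a rough summary of the trade-offs described above, the choice of build mode can be written as a small decision function. This is only an illustrative sketch: `chooseBuildMode` and its option names are hypothetical and are not part of libvoikko or its build scripts. The returned strings are the js/build.sh arguments listed above.

```javascript
// Hypothetical helper (not part of libvoikko): picks the js/build.sh argument
// from the trade-offs described in this section.
function chooseBuildMode({ target, loadDictionariesYourself = false }) {
  // "plain" bundles no dictionary; you load the files yourself.
  if (loadDictionariesYourself) return "plain";
  // Preload is efficient in the browser but cannot be used with Node.
  if (target === "browser") return "preload";
  // Embed works with Node (it is slow to load in a browser).
  if (target === "node") return "embed";
  throw new Error(`Unknown target: ${target}`);
}

console.log(chooseBuildMode({ target: "browser" })); // preload
console.log(chooseBuildMode({ target: "node" })); // embed
console.log(chooseBuildMode({ target: "node", loadDictionariesYourself: true })); // plain
```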

Testing the library and dictionary with Node

$ node
> const Libvoikko = await require('./js/libvoikko.js')()
> let v = Libvoikko.init("fi")
> v.analyze("alusta")
[ { BASEFORM: 'alustaa',
    CLASS: 'teonsana',
    FSTOUTPUT: '[Lt][Xp]alustaa[X]alusta[Tk][Ap][P2][Ny][Eb]',
    MOOD: 'imperative',
    NEGATIVE: 'both',
    NUMBER: 'singular',
    PERSON: '2',
    STRUCTURE: '=pppppp',
    TENSE: 'present_simple',
    WORDBASES: '+alustaa(alustaa)' },
  { BASEFORM: 'alku',
    CLASS: 'nimisana',
    FSTOUTPUT: '[Ln][Xp]alku[X]alu[Sela][Ny]sta',
    NUMBER: 'singular',
    SIJAMUOTO: 'sisaeronto',
    STRUCTURE: '=pppppp',
    WORDBASES: '+alku(alku)' },
  { BASEFORM: 'alus',
    CLASS: 'nimisana',
    FSTOUTPUT: '[Ln][Xp]alus[X]alu[Sp][Ny]sta',
    NUMBER: 'singular',
    SIJAMUOTO: 'osanto',
    STRUCTURE: '=pppppp',
    WORDBASES: '+alus(alus)' },
  { BASEFORM: 'alunen',
    CLASS: 'nimisana',
    FSTOUTPUT: '[Ln][Xp]alunen[X]alu[Sp][Ny]sta',
    NUMBER: 'singular',
    SIJAMUOTO: 'osanto',
    STRUCTURE: '=pppppp',
    WORDBASES: '+alunen(alunen)' },
  { BASEFORM: 'alusta',
    CLASS: 'nimisana',
    FSTOUTPUT: '[Ln][Xp]alusta[X]alust[Sn][Ny]a',
    NUMBER: 'singular',
    SIJAMUOTO: 'nimento',
    STRUCTURE: '=pppppp',
    WORDBASES: '+alusta(alusta)' } ]
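
The REPL session above relies on top-level await, which is not available in a CommonJS script file. A minimal sketch of the same steps as reusable functions follows. It assumes only what the session shows: that require('./js/libvoikko.js') returns a factory whose promise resolves to an object with init, and that analyze returns an array of objects with BASEFORM and CLASS fields. The helper names (`analyzeWord`, `uniqueBaseForms`, `groupByClass`) are hypothetical, and the sample array is an abbreviated copy of the output above.

```javascript
// Hypothetical helpers; only init() and analyze() come from the session above.

// loadLibvoikko is the factory returned by require('./js/libvoikko.js').
async function analyzeWord(loadLibvoikko, word, language = "fi") {
  const Libvoikko = await loadLibvoikko();
  const v = Libvoikko.init(language);
  return v.analyze(word);
}

// Distinct BASEFORM values, in order of first appearance.
function uniqueBaseForms(analyses) {
  return [...new Set(analyses.map((a) => a.BASEFORM))];
}

// Base forms grouped by the CLASS field (word class).
function groupByClass(analyses) {
  const groups = {};
  for (const a of analyses) {
    if (!groups[a.CLASS]) groups[a.CLASS] = [];
    groups[a.CLASS].push(a.BASEFORM);
  }
  return groups;
}

// Abbreviated copy of the analyze("alusta") output shown above.
const sample = [
  { BASEFORM: "alustaa", CLASS: "teonsana" },
  { BASEFORM: "alku", CLASS: "nimisana" },
  { BASEFORM: "alus", CLASS: "nimisana" },
  { BASEFORM: "alunen", CLASS: "nimisana" },
  { BASEFORM: "alusta", CLASS: "nimisana" },
];

console.log(uniqueBaseForms(sample));
console.log(groupByClass(sample));
```

With a built library in place, `analyzeWord(require('./js/libvoikko.js'), 'alusta').then(console.log)` should print the same list as the REPL session. Passing the factory in as an argument keeps the helpers usable with any of the three build modes.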