Keyvi, short for "Key value index", is a key-value store (KVS) optimized for size and lookup speed. Its use of shared memory makes it scalable and resilient. The biggest difference from other stores is the underlying data structure, which is based on finite state machines. Storage is very space efficient and fast, and by design the structure makes various kinds of approximate matching, be it fuzzy string matching or geo, highly efficient. The immutable FST (finite state transducer) data structure can be used stand-alone for static datasets. If you need online writes, you can use keyvi index, a near-realtime index. The index can be used as an embedded key-value store, e.g. if your application already has a network stack. An out-of-the-box network-enabled store is available with keyvi-server.
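To illustrate why an automaton-based layout makes fuzzy matching cheap, here is a minimal, hypothetical Python sketch. It is not keyvi's API: keys are stored in a plain trie rather than a minimized FST, and the names `TrieNode`, `build_trie` and `fuzzy_lookup` are invented for this example. A Levenshtein-bounded traversal walks the automaton once, computes one row of the edit-distance table per transition, and prunes any branch whose row minimum already exceeds the allowed distance.

```python
from typing import Any, Dict, List, Tuple


class TrieNode:
    """Hypothetical stand-in for an automaton state (a trie, not a minimized FST)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_key = False
        self.value: Any = None


def build_trie(items: Dict[str, Any]) -> TrieNode:
    root = TrieNode()
    for key, value in items.items():
        node = root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True
        node.value = value
    return root


def fuzzy_lookup(root: TrieNode, query: str, max_distance: int) -> List[Tuple[str, int, Any]]:
    """Return (key, distance, value) for every key within max_distance edits of query."""
    results: List[Tuple[str, int, Any]] = []

    def walk(node: TrieNode, prefix: str, prev_row: List[int]) -> None:
        # prev_row[i] = edit distance between `prefix` and query[:i]
        if node.is_key and prev_row[-1] <= max_distance:
            results.append((prefix, prev_row[-1], node.value))
        for ch, child in node.children.items():
            row = [prev_row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                row.append(min(row[i - 1] + 1,        # insertion
                               prev_row[i] + 1,       # deletion
                               prev_row[i - 1] + cost))  # substitution / match
            # Row minima never decrease along a path, so this branch can be pruned.
            if min(row) <= max_distance:
                walk(child, prefix + ch, row)

    walk(root, "", list(range(len(query) + 1)))
    return sorted(results, key=lambda r: (r[1], r[0]))


root = build_trie({"apple": 1, "apply": 2, "ample": 3, "banana": 4})
print(fuzzy_lookup(root, "aple", 1))  # [('ample', 1, 3), ('apple', 1, 1)]
```

Because shared prefixes are visited only once, the work is roughly bounded by the number of automaton states reachable within the distance limit rather than by the total number of keys.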
Precompiled binary wheels for OS X and Linux are available on PyPI. To install, use:
```
pip install keyvi
```
The core is a header-only C++ library that can be used stand-alone. For more information, check the Readme file in the keyvi subfolder.
The Python extension can also be compiled stand-alone; check the Readme file in the python subfolder for more information.
Howtos:
- Compiling Dictionaries/Indexes
- Python version of keyvi
  - Crashcourse
  - API docs
  - Using Python keyvi with EMR (mrjob or pyspark)
If you would like to dig deep into the fundamentals, keyvi is inspired by the following two papers:
- Sparse array (see "Storing a Sparse Table", Robert E. Tarjan et al.: http://infolab.stanford.edu/pub/cstr/reports/cs/tr/78/683/CS-TR-78-683.pdf)
- Incremental construction, meaning minimization is done on the fly (see "Incremental Construction of Minimal Acyclic Finite-State Automata", J. Daciuk et al.: http://www.mitpressjournals.org/doi/pdf/10.1162/089120100561601)
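The sparse-array idea from the Tarjan paper can be illustrated with a small, hypothetical Python sketch of row displacement. The rows of a mostly empty table (for an automaton, one row of transitions per state) are overlaid in a single array. Each row is shifted by an offset chosen so that its occupied cells do not collide with earlier rows, and a parallel "owner" array records which row each cell belongs to. The first-fit offset search and the `PackedTable` name are assumptions made for this illustration; this is not keyvi's actual layout.

```python
from typing import Dict, List, Optional


class PackedTable:
    """Hypothetical row-displacement packing of a sparse 2D table."""

    def __init__(self, rows: List[Dict[int, int]]) -> None:
        self.offsets: List[int] = []              # per-row displacement
        self.owner: List[Optional[int]] = []      # which row owns each slot ("check" array)
        self.values: List[Optional[int]] = []
        for r, row in enumerate(rows):
            offset = self._first_fit(row)
            self.offsets.append(offset)
            for col, value in row.items():
                slot = offset + col
                self._grow(slot + 1)
                self.owner[slot] = r
                self.values[slot] = value

    def _grow(self, size: int) -> None:
        while len(self.owner) < size:
            self.owner.append(None)
            self.values.append(None)

    def _first_fit(self, row: Dict[int, int]) -> int:
        # Smallest offset at which none of the row's occupied columns hits a used slot.
        offset = 0
        while any(offset + c < len(self.owner) and self.owner[offset + c] is not None
                  for c in row):
            offset += 1
        return offset

    def get(self, r: int, c: int) -> Optional[int]:
        # Constant-time lookup: compute the slot, then verify the row owns it.
        slot = self.offsets[r] + c
        if slot < len(self.owner) and self.owner[slot] == r:
            return self.values[slot]
        return None


table = PackedTable([{0: 10, 3: 13}, {1: 21, 2: 22}, {0: 30}])
print(table.offsets)                      # [0, 0, 4]
print(len(table.values))                  # 5 slots instead of 3 * 4 = 12
print(table.get(2, 0), table.get(1, 0))   # 30 None
```

Lookups stay constant time, while the cost of searching for non-colliding offsets is paid once at build time, which fits a structure that is compiled once and then only read.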
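The incremental approach from the Daciuk et al. paper can be sketched in Python as follows. This is a hypothetical, simplified illustration, not keyvi's C++ implementation: it builds a plain acceptor without values, and the class and method names are invented for this example. Keys must arrive in sorted order; whenever a new key diverges from the previous one, the part of the previous key's path below the common prefix can no longer change, so each of those states is either replaced by an equivalent state from a register or registered as a new representative.

```python
from typing import Dict, List, Optional, Tuple


class State:
    """One automaton state: a final flag plus labelled outgoing transitions."""

    def __init__(self) -> None:
        self.final = False
        self.edges: Dict[str, "State"] = {}

    def signature(self) -> Tuple:
        # Equivalent states agree on finality and point to the same registered targets.
        return (self.final, tuple(sorted((ch, id(t)) for ch, t in self.edges.items())))


class MinimalAutomatonBuilder:
    """Hypothetical sketch of incremental minimal-automaton construction for sorted keys."""

    def __init__(self) -> None:
        self.root = State()
        self.register: Dict[Tuple, State] = {}
        # (parent, label, child) along the previous key's path, not yet minimized.
        self.unchecked: List[Tuple[State, str, State]] = []
        self.previous: Optional[str] = None
        self.finished = False

    def add(self, word: str) -> None:
        if self.finished:
            raise RuntimeError("builder already finished")
        if self.previous is not None and word <= self.previous:
            raise ValueError("keys must be added in strictly increasing order")
        prefix = 0
        for a, b in zip(word, self.previous or ""):
            if a != b:
                break
            prefix += 1
        self._minimize(prefix)  # everything below the common prefix is now final
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[prefix:]:
            child = State()
            node.edges[ch] = child
            self.unchecked.append((node, ch, child))
            node = child
        node.final = True
        self.previous = word

    def finish(self) -> None:
        self._minimize(0)
        self.finished = True

    def _minimize(self, down_to: int) -> None:
        # Bottom-up: merge each state into a registered equivalent, or register it.
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            sig = child.signature()
            if sig in self.register:
                parent.edges[ch] = self.register[sig]
            else:
                self.register[sig] = child

    def contains(self, word: str) -> bool:
        node: Optional[State] = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def state_count(self) -> int:
        seen, stack = set(), [self.root]
        while stack:
            state = stack.pop()
            if id(state) not in seen:
                seen.add(id(state))
                stack.extend(state.edges.values())
        return len(seen)


builder = MinimalAutomatonBuilder()
for key in ["tap", "taps", "top", "tops"]:
    builder.add(key)
builder.finish()
print(builder.contains("tops"), builder.contains("to"))  # True False
print(builder.state_count())  # 5
```

A plain trie for the same four keys needs 8 states; because the suffixes after "ta" and "to" are identical, minimization shares them and only 5 states remain. Minimizing on the fly means the unminimized trie never has to be held in memory as a whole.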
keyvi is licensed under the Apache License 2.0 ("ALv2"); see the license for details. All 3rd-party libraries ship with their own licenses. Except for Boost, Snappy and zlib, all 3rd-party code can be found exclusively in the 3rdparty folder. The following licenses are used for the 3rd-party code (last updated: 0.5.0, provided without warranty):
Dependency | License |
---|---|
Boost | Boost Software License |
moodycamel::ConcurrentQueue | Simplified BSD License |
md5 | RSA MD5 License |
msgpack-c | Boost Software License |
RapidJSON | MIT License |
Snappy | BSD |
Zlib | Zlib License |
The Python version ships with the same 3rd-party dependencies as the C++ code and additionally depends on:
Dependency | License |
---|---|
msgpack (for python) | Apache License, Version 2.0 |
Thanks go to:
- Cliqz for sponsoring and open-sourcing keyvi