LASER is a library to calculate and use multilingual sentence embeddings. It requires the following dependencies:
- Python 3.6
- PyTorch 1.0
- NumPy, tested with 1.15.4
- Cython, needed by Python wrapper of FastBPE, tested with 0.29.6
- Faiss, for fast similarity search and bitext mining
- transliterate 1.10.2, only used for Greek (pip install transliterate)
- jieba 0.39, Chinese segmenter (pip install jieba)
- mecab 0.996, Japanese segmenter
- tokenization scripts from the Moses toolkit (installed automatically)
- FastBPE, fast C++ implementation of byte-pair encoding (installed automatically)
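As a rough sanity check, the dependency list above can be turned into a small bash script that reports what is visible in the current environment. This is a minimal sketch, not part of LASER: the Python import names (torch, numpy, Cython, faiss, transliterate, jieba) and the assumption that mecab is a command on PATH are ours. It only checks presence, not the tested versions, and it skips Moses and FastBPE because those are installed automatically.

```bash
#!/usr/bin/env bash
# Minimal sketch (not part of LASER): report which listed dependencies are
# present. Assumed: the Python import names below, and mecab as a PATH command.

# check_python_module NAME: succeeds if "import NAME" works under python3
check_python_module() {
    python3 -c "import $1" >/dev/null 2>&1
}

# check_laser_deps: prints one "ok"/"missing" line per dependency and
# returns the number of missing ones
check_laser_deps() {
    local missing=0 mod
    for mod in torch numpy Cython faiss transliterate jieba; do
        if check_python_module "$mod"; then
            echo "ok       $mod"
        else
            echo "missing  $mod"
            missing=$((missing + 1))
        fi
    done
    # mecab is treated as an external binary rather than a Python module
    if command -v mecab >/dev/null 2>&1; then
        echo "ok       mecab"
    else
        echo "missing  mecab"
        missing=$((missing + 1))
    fi
    return "$missing"
}
```

For example, `check_laser_deps || echo "some dependencies are missing"`. Keep in mind that transliterate, jieba and mecab are only needed for Greek, Chinese and Japanese respectively, so a "missing" line for them is not necessarily a problem.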
To install LASER:
- set the environment variable 'LASER' to the root of the installation, e.g.
  export LASER="${HOME}/projects/laser"
- download the encoders from Amazon S3 with
  bash ./install_models.sh
- download the third-party tools with
  bash ./install_external_tools.sh
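The installation steps can be sketched as a pair of bash functions. laser_check_root and laser_install are hypothetical helpers, not part of LASER; the sketch assumes both install scripts sit in the root of the checkout and are meant to be run from there, as the relative ./ paths suggest.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers, not part of LASER). Assumes both
# install scripts live in $LASER and are run from that directory.

# laser_check_root: fail unless $LASER names a directory that contains
# both install scripts
laser_check_root() {
    if [ -z "${LASER:-}" ]; then
        echo "LASER is not set" >&2
        return 1
    fi
    local script
    for script in install_models.sh install_external_tools.sh; do
        if [ ! -f "$LASER/$script" ]; then
            echo "missing $LASER/$script" >&2
            return 1
        fi
    done
}

# laser_install: run the encoder download, then the third-party tools
# download, inside a subshell so the caller's working directory is unchanged
laser_install() {
    laser_check_root || return 1
    (
        cd "$LASER" || exit 1
        bash ./install_models.sh &&
        bash ./install_external_tools.sh
    )
}
```

Typical use is `export LASER="${HOME}/projects/laser"` followed by `laser_install`. To keep LASER set in new shells, the export line can also go into a startup file such as ~/.bashrc.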