Labser is a library to calculate and use multilingual sentence embeddings.

Dependencies

  • Python 3.6
  • PyTorch 1.0
  • NumPy, tested with 1.15.4
  • Cython, needed by the Python wrapper of FastBPE, tested with 0.29.6
  • Faiss, for fast similarity search and bitext mining
  • transliterate 1.10.2, only used for Greek (pip install transliterate)
  • jieba 0.39, Chinese segmenter (pip install jieba)
  • mecab 0.996, Japanese segmenter
  • tokenization from the Moses encoder (installed automatically)
  • FastBPE, fast C++ implementation of byte-pair encoding (installed automatically)
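
Before installing, it can help to see which of the Python packages listed above are already available. The following is a minimal sketch, not part of the library: the helper `missing_dependencies` is hypothetical, and the import names in `MODULES` are the usual ones for these packages, assumed here rather than taken from this README. mecab, the Moses scripts and FastBPE are left out because they are external tools.

```python
import importlib.util

# Display name -> import name. The import names are assumptions based on
# how these packages are commonly imported, not on this README.
MODULES = {
    "PyTorch": "torch",
    "NumPy": "numpy",
    "Cython": "Cython",
    "Faiss": "faiss",
    "transliterate": "transliterate",
    "jieba": "jieba",
}


def missing_dependencies(modules=MODULES):
    """Return the sorted display names of packages that cannot be located."""
    return sorted(
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only tells whether a module can be found; it does not compare installed versions against the ones listed above.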

Installation

  • set the environment variable 'LASER' to the root of the installation, e.g. export LASER="${HOME}/projects/laser"
  • download the encoders from Amazon S3 by running bash ./install_models.sh
  • download the third-party software by running bash ./install_external_tools.sh
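
The steps above depend on the 'LASER' variable being set correctly, so a quick sanity check may save some debugging. This is a rough sketch under one assumption: that both install scripts sit directly in the root of the installation, which the `./` prefix suggests but the README does not state. The helper `check_laser_root` is hypothetical and not part of the library.

```python
import os
from pathlib import Path

# Assumed to live in the installation root (see the note above).
SCRIPTS = ("install_models.sh", "install_external_tools.sh")


def check_laser_root(environ=os.environ):
    """Return a list of problems found; an empty list means the setup looks fine."""
    root = environ.get("LASER")
    if not root:
        return ["environment variable LASER is not set"]
    root_path = Path(root)
    if not root_path.is_dir():
        return [f"LASER points to a missing directory: {root}"]
    # Report every install script that is absent from the root directory.
    return [
        f"missing script: {name}"
        for name in SCRIPTS
        if not (root_path / name).is_file()
    ]


if __name__ == "__main__":
    for problem in check_laser_root():
        print(problem)
```

Passing a plain dict instead of `os.environ` makes it easy to try the check against a different directory without changing the shell environment.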