Code for creating and mining embeddings using LASER and LaBSE.

vigneshmj1997/Labser

Labser is a library to calculate and use multilingual sentence embeddings.

Dependencies

  • Python 3.6
  • PyTorch 1.0
  • NumPy, tested with 1.15.4
  • Cython, needed by Python wrapper of FastBPE, tested with 0.29.6
  • Faiss, for fast similarity search and bitext mining
  • transliterate 1.10.2, only used for Greek (pip install transliterate)
  • jieba 0.39, Chinese segmenter (pip install jieba)
  • mecab 0.996, Japanese segmenter
  • tokenization from the Moses encoder (installed automatically)
  • FastBPE, fast C++ implementation of byte-pair encoding (installed automatically)
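Before running anything, it can help to check which of the Python-level dependencies are importable. The following is a minimal sketch, not part of Labser itself: the helper name missing_modules is hypothetical, and the import names in ASSUMED_MODULES are assumptions about how the packages above are exposed in Python (mecab, the Moses scripts, and FastBPE are left out because they are installed as external tools).

```python
import importlib.util

# Assumed import names for the Python packages listed above.
# These are illustrative; adjust them to match your environment.
ASSUMED_MODULES = ["torch", "numpy", "Cython", "faiss", "transliterate", "jieba"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    not_found = missing_modules(ASSUMED_MODULES)
    if not_found:
        print("Missing modules:", ", ".join(not_found))
    else:
        print("All listed modules are importable.")
```

This only checks that the modules can be located; it does not verify the versions listed above.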

Installation

  • Set the environment variable LASER to the root of the installation, e.g. export LASER="${HOME}/projects/laser"
  • Download the encoders from Amazon S3 by running bash ./install_models.sh
  • Download the third-party software by running bash ./install_external_tools.sh
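Since the install steps rely on the LASER environment variable, a small check can catch a missing or mistyped path early. This is a sketch under stated assumptions: the function name resolve_laser_root is hypothetical and not part of Labser, and it only confirms that the variable is set and points to an existing directory.

```python
import os
from pathlib import Path


def resolve_laser_root(env=None):
    """Return the LASER root as a Path, or raise if it is unset or not a directory.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    value = env.get("LASER")
    if not value:
        raise RuntimeError(
            'LASER is not set; e.g. export LASER="${HOME}/projects/laser"'
        )
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"LASER points to a missing directory: {root}")
    return root


if __name__ == "__main__":
    try:
        print("LASER root:", resolve_laser_root())
    except RuntimeError as err:
        print("Configuration problem:", err)
```

Running this before the two install scripts gives a clear message when the variable has not been exported in the current shell.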
