MTS Home Assignment (2020)

Position: Middle NLP Engineer.

Word Sense Induction assignment
The task is based on: link

Data

Data is stored in the data folder.
Because the data is added as a submodule (URL), download it by executing:

git submodule init
git submodule update
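
The two commands above can also be combined into a single step. The following is a minimal sketch, not part of the repository: `update_submodules` is a hypothetical helper name, and it assumes a git version that supports the `-C` option.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): initialise and fetch
# all submodules of the repository at the given path (default: current dir).
update_submodules() {
    repo_dir="${1:-.}"
    # --init registers the submodules listed in .gitmodules;
    # --recursive also handles submodules nested inside submodules.
    git -C "$repo_dir" submodule update --init --recursive || return 1
    # List the checked-out commit of every submodule. A leading '-' in this
    # output marks a submodule that is still uninitialised.
    git -C "$repo_dir" submodule status --recursive
}
```

Calling `update_submodules .` from the repository root should fetch both the data and the modules submodules, since they are registered in the same repository.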

Train data: main/active-dict
Test data: additional/active-rutenten
Baseline: script, main/active-dict, additional/active-rutenten

Modules

This folder contains the third-party libraries, repositories, and weights used in the experiments, namely:

  • bertwsi - Word Sense Induction with BERT
  • spacy-ru - Russian language models for spaCy (used in bertwsi)
  • simple_elmo - simple library to work with pre-trained ELMo models in TensorFlow

Since the modules are git repositories, they are stored as submodules.
To download them, execute:

git submodule init
git submodule update

To install the AdaGram model, execute:

pip install git+https://github.com/lopuhin/python-adagram.git

The ruscorpora_mean_hs.model.bin.gz word2vec weights (from RusVectores) are stored using git lfs.
To download them, install git lfs and execute:

git lfs fetch
git lfs checkout
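
If git lfs was not installed when the repository was cloned, the weights file is only a small text pointer, and loading it as a model will fail. The sketch below shows one way to check for that case. `is_lfs_pointer` is a hypothetical helper name, and the file path in the usage comment assumes the current directory is the modules folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): succeed if the file is
# still a Git LFS pointer rather than the real content.
is_lfs_pointer() {
    # Pointer files are short text files whose first line names the LFS spec.
    # Only the first bytes are read, so large binary files are handled cheaply.
    head -c 200 "$1" 2>/dev/null | head -n 1 \
        | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Usage (the path is an assumption about the checkout layout):
# if is_lfs_pointer ruscorpora_mean_hs.model.bin.gz; then
#     git lfs pull    # performs fetch and checkout in one step
# fi
```

`git lfs pull` is used in the usage comment as a shorthand for the `git lfs fetch` and `git lfs checkout` pair.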

To download the ruwikiruscorpora_lemmas_elmo_1024_2019 ELMo weights (from RusVectores), execute:

./download_elmo_weights.sh

Note: the script should be executed from ml_interviews/mts/modules.
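
One way to satisfy the working-directory requirement without leaving the current directory is to run the script in a subshell. This is a minimal sketch: `run_from` is a hypothetical helper name, and the usage comment assumes the current directory is the one that contains the ml_interviews checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run a command from a
# given directory without changing the caller's working directory.
run_from() {
    dir="$1"
    shift
    # The parentheses start a subshell, so the `cd` does not leak out.
    # If the directory does not exist, the command is not run at all.
    ( cd "$dir" && "$@" )
}

# Usage (the starting directory is an assumption):
# run_from ml_interviews/mts/modules ./download_elmo_weights.sh
```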

Solutions

All solutions are stored in the solutions folder:

All predictions are stored in the predictions folder: