The code for Incorporating Chinese Characters of Words for Lexical Sememe Prediction (ACL 2018) [1]
The Python version used for each Python file is set explicitly in the shell scripts.
- Python 2.7 (for running the main code)
- Python 3 (for converting the pickle version of the files dumped by SPWE and SPSE; required only by CSP.sh)
- Numpy > 1.0
- To manage your dependency environment, we strongly encourage you to install Anaconda.
- Prepare a file that contains pre-trained Chinese word embeddings (in Google Word2Vec format). We recommend a vocabulary of at least 200,000 words and at least 200 dimensions. Embeddings trained on a large corpus (20GB or more is recommended) give much better results with this program.
- Rename the word embedding file to embedding_200.txt and put it in the repository root directory.
mv path/to/file/your_word_vec.txt ./embedding_200.txt
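As a quick sanity check on the word embedding file, the sketch below reads its header and compares it with the sizes recommended above. It assumes the file is in the word2vec text format, whose first line is `<vocab_size> <dim>`, and that it is UTF-8 encoded. The function names `read_word2vec_header` and `check_embedding_file` are hypothetical helpers and are not part of this repository.

```python
import io


def read_word2vec_header(path):
    """Return (vocab_size, dim) from the first line of a word2vec text file.

    Assumes UTF-8 text with a '<vocab_size> <dim>' header line.
    """
    with io.open(path, "r", encoding="utf-8") as f:
        fields = f.readline().split()
    if len(fields) != 2:
        raise ValueError("expected a '<vocab_size> <dim>' header line")
    return int(fields[0]), int(fields[1])


def check_embedding_file(path, min_words=200000, min_dim=200):
    """Return a list of warnings; an empty list means the file meets the recommendations."""
    vocab_size, dim = read_word2vec_header(path)
    problems = []
    if vocab_size < min_words:
        problems.append("only %d words (at least %d recommended)" % (vocab_size, min_words))
    if dim < min_dim:
        problems.append("only %d dimensions (at least %d recommended)" % (dim, min_dim))
    return problems
```

Calling `check_embedding_file("embedding_200.txt")` from the repository root returns any warnings as a list of strings. The thresholds are recommendations only, so a non-empty list does not by itself mean the program will fail.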
- Prepare a file that contains pre-trained Chinese character embeddings (in CWE format; see paper [2] and its code). We recommend at least 200 dimensions. Embeddings trained on a large corpus (20GB or more is recommended) give much better results with this program.
- Rename the character embedding file to char_embedding_200.txt and put it in the repository root directory.
mv path/to/file/your_character_embedding_file.txt ./char_embedding_200.txt
- Run data_generator.sh. The program will automatically generate the evaluation data set and the other data files required during training.
./data_generator.sh
- Run SPWCF.sh or SPCSE.sh. The corresponding model will be automatically trained and evaluated.
./SPWCF.sh
./SPCSE.sh
- Our model uses SPWE and SPSE as components; see paper [3] and its code for details. Use SPWE and SPSE to produce the model files model_SPWE and model_SPSE, then move them to the root directory of this repository.
mv path/to/file/model_SPWE ./
mv path/to/file/model_SPSE ./
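The requirements above mention that Python 3 is used to change the pickle version of the files dumped by SPWE and SPSE. The exact conversion performed by CSP.sh is not shown here, so the following is only a minimal sketch of one common approach: load the object in Python 3 and dump it again with pickle protocol 2, the highest protocol that Python 2.7 can read. The function name `redump_pickle` and the output file name in the usage note are hypothetical.

```python
import pickle


def redump_pickle(src_path, dst_path, protocol=2):
    """Load a pickle file and write it back with the given protocol.

    Protocol 2 is the highest pickle protocol supported by Python 2.7.
    """
    with open(src_path, "rb") as src:
        obj = pickle.load(src)
    with open(dst_path, "wb") as dst:
        pickle.dump(obj, dst, protocol=protocol)
```

For example, `redump_pickle("model_SPWE", "model_SPWE_py2")` would write a protocol 2 copy. Whether Python 2.7 can then load the result also depends on the types stored in the file, which this sketch does not check.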
- Run CSP.sh. The corresponding model will be automatically trained and evaluated.
./CSP.sh
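Before running CSP.sh, it can help to confirm that the files named in the steps above are present in the repository root. The sketch below is one way to check this; the helper `missing_files` is hypothetical and not part of this repository, and the file list is taken only from the names given in this README.

```python
import os

# File names taken from the steps in this README.
REQUIRED_FILES = [
    "embedding_200.txt",
    "char_embedding_200.txt",
    "model_SPWE",
    "model_SPSE",
]


def missing_files(root=".", names=REQUIRED_FILES):
    """Return the names from `names` that do not exist under `root`."""
    return [n for n in names if not os.path.exists(os.path.join(root, n))]
```

Running `missing_files()` from the repository root returns an empty list when all four files are in place. It does not cover the data files produced by data_generator.sh, whose names are not listed here.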
[1] Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, and Leyu Lin. 2018. Incorporating Chinese Characters of Words for Lexical Sememe Prediction. In Proceedings of ACL.
[2] Xinxiong Chen, Lei Xu, Zhiyuan Liu, Maosong Sun, and Huan-Bo Luan. 2015. Joint Learning of Character and Word Embeddings. In Proceedings of IJCAI.
[3] Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, and Maosong Sun. 2017. Lexical Sememe Prediction via Word Embeddings and Matrix Factorization. In Proceedings of IJCAI.