Char-BLSTM-CRF-for-Japanese

An implementation of a character-based bidirectional LSTM-CRF for Japanese.

This library performs named entity recognition (NER) for Japanese. It reproduces the model of S. Misawa et al., 2017 in PyTorch.

Requirements

Installation

pip install git+https://github.com/s14t284/TorchCRF#egg=TorchCRF
pip install git+https://github.com/Andolab/miNER#egg=miNER
pip install git+https://github.com/s14t284/Char-BLSTM-CRF-for-Japanese#egg=deepjapaner

Usage

Sample code is provided. See train_sample.py, predict_sample.py, or exec_sample.ipynb.

Class parameter description

  • Experiment(optimizer, wordemb_path, charemb_path, train_path, test_path, dropout_rate, epoch_size, batch_size, hidden_size, learning_rate, clip_grad_num, save_path)
| parameter | description |
|---|---|
| optimizer | PyTorch optimizer class (torch.optim.*), for example torch.optim.Adam or torch.optim.SGD |
| wordemb_path | file path of the word embedding (.txt) |
| charemb_path | file path of the character embedding (.txt) |
| train_path | file path of the training dataset |
| test_path | file path of the test dataset |
| dev_path | file path of the development dataset |
| epoch_size | number of epochs used to train the neural network |
| batch_size | batch size used to train the neural network |
| hidden_size | hidden layer size of the bidirectional LSTM |
| dropout_rate | dropout rate (0 <= rate < 1). [0.0] |
| learning_rate | learning rate. [1e-3] |
| clip_grad_num | gradient clipping value. [5.0] |
| save_path | path where the model is saved (.pth) |

| method | description |
|---|---|
| run(label, target, measured_value, patience) | Runs a named entity recognition experiment. Pass the name of a named entity label as "label" and an int as "patience"; both are used for early stopping. |
  • ModelAPI(model_path, train_path, wordemb_path, charemb_path, hidden_size)
| parameter | description |
|---|---|
| model_path | file path of the trained model (.pth) |
| train_path | file path of the dataset used for training |
| wordemb_path | file path of the word embedding used for training |
| charemb_path | file path of the character embedding used for training |
| hidden_size | size of the hidden layer |

| method | description |
|---|---|
| predict(sentence) | Predicts named entity labels for a sentence. Pass a Japanese sentence as the "sentence" argument. |
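
The run method of Experiment takes a "patience" value for early stopping, but this README does not spell out the stopping rule. The following is a generic sketch of patience-based early stopping, not the library's own code. It assumes that a per-epoch score is monitored, that higher is better, and that training stops after "patience" consecutive epochs without a new best score. The function name and the sample scores are made up for illustration.

```python
def early_stopping_epoch(scores, patience):
    """Return (best_epoch, stop_epoch) as 0-based epoch indices.

    `scores` holds one evaluation score per epoch (assumed: higher is better).
    Training stops once `patience` consecutive epochs fail to beat the best
    score seen so far. If that never happens, the last epoch is the stop epoch.
    """
    best_score = float("-inf")
    best_epoch = -1
    bad_epochs = 0  # consecutive epochs without a new best score
    for epoch, score in enumerate(scores):
        if score > best_score:
            best_score, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores) - 1


# Hypothetical per-epoch scores for one named entity label.
scores = [0.61, 0.70, 0.74, 0.73, 0.74, 0.72, 0.80]
best, stopped = early_stopping_epoch(scores, patience=3)
print(best, stopped)  # prints "2 5"
```

With patience=3 the run stops at epoch 5 and never reaches the better score at epoch 6. A larger patience trades longer training for a lower chance of stopping on a temporary plateau.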

Reference

License

MIT