An Implementation of Character-based Bidirectional LSTM-CRF for Japanese.
This library performs named entity recognition (NER) for Japanese and reproduces the model of S Misawa et al., 2017 in PyTorch.
pip install git+https://github.com/s14t284/TorchCRF#egg=TorchCRF
pip install git+https://github.com/Andolab/miNER#egg=miNER
pip install git+https://github.com/s14t284/Char-BLSTM-CRF-for-Japanese#egg=deepjapaner
Sample code is provided. See train_sample.py, predict_sample.py, or exec_sample.ipynb.
- Experiment(optimizer, wordemb_path, charemb_path, train_path, test_path, dropout_rate, epoch_size, batch_size, hidden_size, learning_rate, clip_grad_num, save_path)
parameter | description |
---|---|
optimizer | PyTorch optimizer class to use (torch.optim.*), for example torch.optim.Adam or torch.optim.SGD |
wordemb_path | file path of word embedding (.txt) |
charemb_path | file path of char embedding (.txt) |
train_path | file path of the training dataset |
test_path | file path of the test dataset |
dev_path | file path of the development dataset |
epoch_size | number of epochs used to train the neural network. |
batch_size | batch size used to train the neural network. |
hidden_size | hidden layer size of Bidirectional LSTM. |
dropout_rate | dropout rate (0 <= rate < 1). [0.0] |
learning_rate | learning rate. [1e-3] |
clip_grad_num | threshold used for gradient clipping. [5.0] |
save_path | model save path (.pth) |
method | description |
---|---|
run(label, target, measured_value, patience) | Runs a named entity recognition experiment. Pass the name of a named entity label as "label" and an int value as "patience"; both are used for early stopping. |
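
The "patience" argument above controls early stopping. As a rough illustration of the idea only, here is a minimal sketch in plain Python. It is not this library's implementation: the `EarlyStopping` class and `epochs_run` helper are hypothetical names, and the sketch assumes the monitored quantity is a validation score for the chosen label where higher is better.

```python
class EarlyStopping:
    """Patience-based early stopping on a score where higher is better.

    Hypothetical helper for illustration; not part of this library.
    """

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best_score = float("-inf")
        self.bad_epochs = 0  # consecutive epochs without improvement

    def step(self, score: float) -> bool:
        """Record one epoch's score; return True when training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(scores, patience):
    """Return how many epochs are executed before early stopping triggers."""
    stopper = EarlyStopping(patience)
    for epoch, score in enumerate(scores, start=1):
        if stopper.step(score):
            return epoch
    return len(scores)


if __name__ == "__main__":
    # Made-up validation scores, one per epoch.
    print(epochs_run([0.5, 0.6, 0.58, 0.59, 0.7], patience=2))
```

With a patience of 2, training in this sketch stops after the second consecutive epoch that fails to beat the best score so far, so a larger patience tolerates longer plateaus at the cost of more training time.
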
- ModelAPI(model_path, train_path, wordemb_path, charemb_path, hidden_size)
parameter | description |
---|---|
model_path | trained model file path (.pth) |
train_path | file path of the dataset used for training |
wordemb_path | path of the word embedding used for training |
charemb_path | path of the char embedding used for training |
hidden_size | size of hidden layer |
method | description |
---|---|
predict(sentence) | Predicts named entity labels for a sentence. Pass a Japanese sentence as the "sentence" argument. |
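
The return format of predict is not documented here. As a hedged sketch, assuming a character-based tagger yields one BIO tag per character, the tags could be grouped into entity spans as follows. The `bio_to_spans` function and the `PER` / `LOC` label names are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(chars: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Group character-level BIO tags into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration; not part of this library.
    """
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")

    spans: List[Tuple[str, str]] = []
    current_text = ""
    current_type: Optional[str] = None

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity begins; close the one that was open, if any.
            if current_type is not None:
                spans.append((current_text, current_type))
            current_text, current_type = ch, tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the currently open entity.
            current_text += ch
        else:
            # "O" or a stray I- tag: close any open entity and skip the character.
            if current_type is not None:
                spans.append((current_text, current_type))
            current_text, current_type = "", None

    if current_type is not None:
        spans.append((current_text, current_type))
    return spans


if __name__ == "__main__":
    sentence = "山田太郎は東京に行った"
    # Assumed tags, written by hand for this example (not model output).
    tags = ["B-PER", "I-PER", "I-PER", "I-PER", "O",
            "B-LOC", "I-LOC", "O", "O", "O", "O"]
    print(bio_to_spans(sentence, tags))
```

Tagging at the character level avoids depending on word segmentation boundaries, which is why the grouping step works on individual characters rather than tokens.
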
- S Misawa, M Taniguchi, et al. Character-based Bidirectional LSTM-CRF with words and characters for Japanese Named Entity Recognition (ACL2017)
MIT