An Implementation of Character-based Bidirectional LSTM CRF for Japanese

rikeda71/Char-BLSTM-CRF-for-Japanese

Char-BLSTM-CRF-for-Japanese

An implementation of a character-based bidirectional LSTM CRF for Japanese.

This library performs named entity recognition (NER) for Japanese text and reproduces the model of S. Misawa et al., 2017 in PyTorch.
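To illustrate what "character-based" means for NER, the sketch below assigns one label to every character of a sentence instead of one label per word. It assumes a BIO label scheme and the entity types `PER` and `LOC`. Both are illustrative choices, and the helper `char_bio_labels` is hypothetical, not part of this library.

```python
from typing import List, Tuple


def char_bio_labels(sentence: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one BIO label to each character of `sentence`.

    `entities` holds (start, end, type) character spans, with `end` exclusive.
    Hypothetical helper for illustration; the library's own label format may differ.
    """
    labels = ["O"] * len(sentence)
    for start, end, ent_type in entities:
        labels[start] = f"B-{ent_type}"      # first character of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{ent_type}"      # remaining characters of the entity
    return labels


sentence = "田中は東京に住む"
entities = [(0, 2, "PER"), (3, 5, "LOC")]    # 田中 = person, 東京 = location
labels = char_bio_labels(sentence, entities)

for ch, label in zip(sentence, labels):
    print(ch, label)
```

Labeling characters rather than words avoids errors caused by word segmentation boundaries that do not line up with entity boundaries, which is a common concern in Japanese text.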

Requirements

Installation

pip install git+https://github.com/s14t284/TorchCRF#egg=TorchCRF
pip install git+https://github.com/Andolab/miNER#egg=miNER
pip install git+https://github.com/s14t284/Char-BLSTM-CRF-for-Japanese#egg=deepjapaner

Usage

Sample code is provided. See train_sample.py, predict_sample.py, or exec_sample.ipynb.

Class parameter description

  • Experiment(optimizer, wordemb_path, charemb_path, train_path, test_path, dropout_rate, epoch_size, batch_size, hidden_size, learning_rate, clip_grad_num, save_path)

| parameter | description |
| --- | --- |
| optimizer | PyTorch optimizer (torch.optim.*), for example torch.optim.Adam or torch.optim.SGD |
| wordemb_path | file path of the word embedding (.txt) |
| charemb_path | file path of the character embedding (.txt) |
| train_path | file path of the training dataset |
| test_path | file path of the test dataset |
| dev_path | file path of the development dataset |
| epoch_size | number of epochs used to train the neural network |
| batch_size | batch size used to train the neural network |
| hidden_size | hidden layer size of the bidirectional LSTM |
| dropout_rate | dropout rate (0 <= rate < 1). [0.0] |
| learning_rate | learning rate. [1e-3] |
| clip_grad_num | gradient clipping threshold. [5.0] |
| save_path | path where the model is saved (.pth) |

| method | description |
| --- | --- |
| run(label, target, measured_value, patience) | Runs a named entity recognition experiment. Pass the name of a named entity label as "label" and an integer as "patience"; both are used for early stopping. |

  • ModelAPI(model_path, train_path, wordemb_path, charemb_path, hidden_size)

| parameter | description |
| --- | --- |
| model_path | file path of the trained model (.pth) |
| train_path | file path of the dataset used for training |
| wordemb_path | path of the word embedding used for training |
| charemb_path | path of the character embedding used for training |
| hidden_size | size of the hidden layer |

| method | description |
| --- | --- |
| predict(sentence) | Predicts named entity labels for a sentence. Pass a Japanese sentence as the "sentence" argument. |
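The "patience" argument of run controls early stopping. As a rough sketch of the general technique (not this library's actual code), training stops once the monitored score has failed to improve for `patience` consecutive epochs. The function name `early_stopping_epoch` and the score values below are made up for illustration.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(scores: Iterable[float], patience: int) -> Tuple[int, float]:
    """Return (best_epoch, best_score) under patience-based early stopping.

    `scores` yields one evaluation score per epoch (higher is better).
    Iteration stops after `patience` consecutive epochs without improvement.
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0   # reset the counter on improvement
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # give up: no recent improvement
    return best_epoch, best_score


# Hypothetical per-epoch scores for one entity label.
dev_scores = [0.61, 0.70, 0.74, 0.73, 0.74, 0.72, 0.80]
best_epoch, best_score = early_stopping_epoch(dev_scores, patience=3)
print(best_epoch, best_score)
```

With `patience=3`, the loop stops after epoch 5 and never reaches the final score of 0.80. A larger patience trades longer training for a lower risk of stopping too early.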

Reference

License

MIT
