Skip to content

thanhlt998/mrc-ner

Repository files navigation

A Unified MRC Framework for Named Entity Recognition

The repository contains the code of the recent research advances in Shannon.AI.

A Unified MRC Framework for Named Entity Recognition
Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu and Jiwei Li
In ACL 2020. paper
If you find this repo helpful, please cite the following:

@article{li2019unified,
  title={A Unified MRC Framework for Named Entity Recognition},
  author={Li, Xiaoya and Feng, Jingrong and Meng, Yuxian and Han, Qinghong and Wu, Fei and Li, Jiwei},
  journal={arXiv preprint arXiv:1910.11476},
  year={2019}
}

For any question, please feel free to post Github issues.

Install Requirements

pip install -r requirements.txt

We build our project on pytorch-lightning. If you want to know more about the arguments used in our training scripts, please refer to pytorch-lightning documentation.

Prepare Datasets

You can download our preprocessed MRC-NER datasets or write your own preprocess scripts. We provide ner2mrc/mrsa2mrc.py for reference.

Prepare Models

For English Datasets, we use BERT-Large

For Chinese Datasets, we use RoBERTa-wwm-ext-large

Train

The main training procedure is in trainer.py

Examples to start training are in scripts/reproduce.

Note that you may need to change DATA_DIR, BERT_DIR, OUTPUT_DIR to your own dataset path, bert model path and log path, respectively.

Evaluate

trainer.py will automatically evaluate on dev set every val_check_interval epochs, and save the topk checkpoints to default_root_dir.

To evaluate them, use evaluate.py