Final Project, NNIA (Winter Semester 2020/21), Saarland University
Create the environment from the environment.yml file:
$ conda env create -f environment.yml
Activate the project environment:
$ conda activate meng_peilu_siyu
For the preprocessing step, first concatenate the data into a single .conll file, if needed. For example, if there are several files with the suffix .gold_conll under data/ontonotes-4.0, run
$ cat data/ontonotes-4.0/*.gold_conll > data/ontonotes.conll
Then use data_prep.py to preprocess the data. The script takes two arguments, a single input file and an output directory. For example, run
$ python data_prep.py -i data/ontonotes.conll -o data
The script writes two files to the output directory: data.tsv, containing only the relevant information (word position, word, and POS tag), and data.info, containing basic information about the data.
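For reference, a minimal sketch of reading data.tsv, assuming tab-separated columns in the order given above (word position, word, POS tag) and no header row; the exact layout is defined by data_prep.py:

import pandas as pd

# Assumed layout: tab-separated, no header, columns = position, word, POS tag.
df = pd.read_csv("data/data.tsv", sep="\t", header=None,
                 names=["position", "word", "pos"], quoting=3)
print(df.head())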
To use our training scripts, the data also needs to be split. Run
$ python data_prep.py -i data/ontonotes.conll -o data/ontonotes_splits/ --split
to split the data into train, dev, and test sets. Note that the output path should not be altered: the relative paths to the train, dev, and test sets are currently hard-coded in our training and evaluation scripts.
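For intuition, a row-level sketch of what --split produces, assuming an 80/10/10 train/dev/test split; the actual ratios, output file names, and (presumably sentence-level) split logic are defined in data_prep.py:

import random

random.seed(0)
with open("data/data.tsv") as f:
    rows = f.readlines()
random.shuffle(rows)  # illustration only; the real script should split on sentence boundaries
n = len(rows)
splits = {"train": rows[:int(0.8 * n)],
          "dev": rows[int(0.8 * n):int(0.9 * n)],
          "test": rows[int(0.9 * n):]}
for name, part in splits.items():
    with open(f"data/ontonotes_splits/{name}.tsv", "w") as out:
        out.write("".join(part))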
To train and evaluate a BERT + Linear model, run
$ python bert_linear_pos.py --epochs 10 --batch_size 32 --lr 5e-5 --dropout 0.25
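For orientation, a minimal sketch of a BERT + Linear tagger matching these flags; the encoder checkpoint (bert-base-uncased here) and other details are assumptions, and the real model lives in bert_linear_pos.py:

import torch.nn as nn
from transformers import BertModel

class BertLinearTagger(nn.Module):
    # Hypothetical reconstruction: BERT encoder + dropout + token-level linear head.
    def __init__(self, num_tags, dropout=0.25, checkpoint="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(checkpoint)  # checkpoint is an assumption
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.classifier(self.dropout(hidden))  # (batch, seq_len, num_tags)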
To train a BERT + BiLSTM (single-layer) model, run
$ python bert_lstm_pos.py --epochs 10 --batch_size 32 --hdim 128 --layers 1 --bi True --dropout 0.25 --lr 5e-5
To train a BERT + BiLSTM (two-layer) model, run
$ python bert_lstm_pos.py --epochs 10 --batch_size 32 --hdim 128 --layers 2 --bi True --dropout 0.25 --lr 5e-5
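Analogously, a sketch of the BERT + BiLSTM variant, wired to the --hdim, --layers, --bi, and --dropout flags above; again the checkpoint and exact wiring are assumptions, with the real model in bert_lstm_pos.py:

import torch.nn as nn
from transformers import BertModel

class BertLSTMTagger(nn.Module):
    # Hypothetical reconstruction: BERT encoder feeding a (bi)LSTM, then a linear head.
    def __init__(self, num_tags, hdim=128, layers=1, bi=True, dropout=0.25,
                 checkpoint="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(checkpoint)  # checkpoint is an assumption
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hdim, num_layers=layers,
                            bidirectional=bi, batch_first=True,
                            dropout=dropout if layers > 1 else 0.0)  # LSTM dropout only acts between layers
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hdim * (2 if bi else 1), num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(hidden)
        return self.classifier(self.dropout(lstm_out))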
Authors, in alphabetical order: