Skip to content

Word Alignment on a Diachronic Corpus (CL Final Project, Winter 20/21)

License

Notifications You must be signed in to change notification settings

siyutao/diachronic-word-alignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Extending IBM Word Alignment Model 1 with Edit Distance

Results on a Diachronic Parallel Corpus (CL Final Project, Winter 20/21)

Siyu Tao

To recreate the experiment results, use code in the Jupyter Notebook notebook.ipynb. Preprocessed data needed is already included in this repo/submission, so the preprocessing steps can be skipped if so desired.

All .py code are my own, score-alingment script is from jhu-mt-hw.

File Structure

Notebook:

└── notebook.ipynb
notebook.ipynb

Python scripts:

./bin/
├── ed_model.py
├── model1.py
├── preprocessing.py
└── score-alignments

Data for evaluation incl. gold sets:

./data/eval
├── de-de
│   └── de_1781_Rosalino-de_1871_Elberfelder
│       ├── text.a
│       ├── text.e
│       └── text.f
├── de-en
│   └── de_1871_Elberfelder-en_1890_Darby
│       ├── text.a
│       ├── text.e
│       └── text.f
└── en-en
    └── en_1611_KJV-en_1890_Darby
        ├── text.a
        ├── text.e
        └── text.f

Experiment output incl. summary:

./output
├── de-de_ed1.a
├── de-de_ed1.log
├── de-de_ed2.a
├── de-de_ed2.log
├── de-de_ibm1.a
├── de-de_ibm1.log
├── de-en_ed1.a
├── de-en_ed1.log
├── de-en_ed2.a
├── de-en_ed2.log
├── de-en_ibm1.a
├── de-en_ibm1.log
├── en-en_ed1.a
├── en-en_ed1.log
├── en-en_ed2.a
├── en-en_ed2.log
├── en-en_ibm1.a
├── en-en_ibm1.log
└── summary.info

About

Word Alignment on a Diachronic Corpus (CL Final Project, Winter 20/21)

Resources

License

Stars

Watchers

Forks

Packages

No packages published