Implementation of the alignment models IBM 1 and IBM 2 for the UvA course NLP2. Parameter estimation is performed using EM for the regular formulation of IBM 1 and 2, and variational inference for a Bayesian formulation of IBM 1. Joint work with Fije van Overeem and Tim van Elsloo.
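For reference, below is a minimal sketch of the EM procedure for IBM Model 1. It is not the implementation in this repository; the function name `train_ibm1`, the `<NULL>` token, and the corpus format (a list of tokenised French/English sentence pairs) are illustrative assumptions.

```python
from collections import defaultdict

def train_ibm1(corpus, num_epochs=10):
    """EM training for IBM Model 1 (sketch).

    `corpus` is a list of (french, english) sentence pairs, each a list of tokens.
    A NULL token is prepended to every English sentence so French words can
    remain unaligned.
    """
    # Initialise t(f|e) uniformly over the French vocabulary.
    french_vocab = {f for french, _ in corpus for f in french}
    t = defaultdict(lambda: 1.0 / len(french_vocab))

    for _ in range(num_epochs):
        counts = defaultdict(float)   # expected counts c(f, e)
        totals = defaultdict(float)   # expected counts c(e)

        # E-step: collect expected alignment counts under the current t(f|e).
        for french, english in corpus:
            english = ['<NULL>'] + english
            for f in french:
                norm = sum(t[(f, e)] for e in english)
                for e in english:
                    delta = t[(f, e)] / norm
                    counts[(f, e)] += delta
                    totals[e] += delta

        # M-step: re-estimate the translation probabilities.
        for (f, e), c in counts.items():
            t[(f, e)] = c / totals[e]

    return t
```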
See the project description for more details, and the final report for our findings.
See how the alignments change over 10 epochs of training. The width of each line is proportional to its probability, and we start with uniform alignment probabilities:
These predictions are from an IBM Model 2 trained for 10 epochs, where each epoch is a full pass over the 250k-sentence dataset. The model predicts a perfect alignment at epoch 4. After this, it unfortunately starts to wrongly align *le* to *has* instead of the correct *the*.
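The alignments drawn here are the most probable (Viterbi) alignments under the estimated parameters. A sketch of how they can be read off for IBM Model 2, assuming `t` holds translation probabilities and `q` holds distortion probabilities keyed as shown (both the names and the keying are illustrative assumptions):

```python
def viterbi_alignment(french, english, t, q):
    """Most likely alignment under IBM Model 2 (sketch).

    For each French position j, pick the English position i that maximises
    q(i | j, l, m) * t(f_j | e_i). Position 0 is the NULL word.
    """
    english = ['<NULL>'] + english
    l, m = len(english), len(french)
    alignment = []
    for j, f in enumerate(french, start=1):
        best_i = max(range(l), key=lambda i: q[(i, j, l, m)] * t[(f, english[i])])
        alignment.append((j, best_i))
    return alignment
```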
The code to produce these drawings can be found in `util`. It was taken and adapted from a notebook in this repository.
```
pip install tabulate
pip install progressbar2
```