Part-of-Speech Tagging with Dynamic Algorithms

Implementing three part-of-speech tagging algorithms—Eager, Viterbi, and Individually Most Probable Tags—and comparing their accuracy across English, Korean, and Swedish.

Purpose

This project was developed as an individual assignment for the “Language and Computation” module at the University of St Andrews. The three algorithms were trained on data from the Universal Dependencies Treebank. Python, together with the conllu package and NLTK, was used to process the data and train the algorithms.

Aims

Develop and implement three algorithms of varying complexity: Eager, Viterbi, and Individually Most Probable Tags.
Train the algorithms on corpora from three distinct languages: English, Swedish, and Korean.
Evaluate the part-of-speech tagging accuracy for each language using unseen test sets.
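The three algorithms named above differ in how they pick tags from the same model. Eager commits to one tag per word from left to right. Viterbi finds the single most probable tag sequence. Individually Most Probable Tags uses forward-backward to choose the best tag at each position on its own. Below is a minimal sketch, assuming a bigram hidden Markov model with start and end symbols, which fits the README's mention of emission and transition probabilities. The tag set, the made-up probabilities, the `FLOOR` constant (a crude stand-in for smoothing), and all function names are hypothetical and are not taken from p1.py.

```python
import math

# Hypothetical toy HMM (made-up numbers, not trained on UD data).
# trans[prev][tag] = P(tag | prev); emit[tag][word] = P(word | tag).
START, END = "<s>", "</s>"
TAGS = ["DET", "NOUN", "VERB"]
trans = {
    START: {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
    "DET": {"DET": 0.01, "NOUN": 0.9, "VERB": 0.09},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.5, END: 0.2},
    "VERB": {"DET": 0.4, "NOUN": 0.2, "VERB": 0.1, END: 0.3},
}
emit = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.5, "cat": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.6, "runs": 0.4},
}
FLOOR = 1e-12  # hypothetical stand-in for real smoothing of unseen events


def lp(table, a, b):
    """Log of table[a][b], floored so unseen events never hit log(0)."""
    return math.log(max(table.get(a, {}).get(b, 0.0), FLOOR))


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def eager(words):
    """Greedy: choose each tag using only the previously chosen tag."""
    tags, prev = [], START
    for w in words:
        prev = max(TAGS, key=lambda t: lp(trans, prev, t) + lp(emit, t, w))
        tags.append(prev)
    return tags


def viterbi(words):
    """Most probable whole tag sequence, via dynamic programming and backpointers."""
    v = [{t: lp(trans, START, t) + lp(emit, t, words[0]) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in TAGS:
            p = max(TAGS, key=lambda q: v[i - 1][q] + lp(trans, q, t))
            v[i][t] = v[i - 1][p] + lp(trans, p, t) + lp(emit, t, words[i])
            back[i][t] = p
    # Include the transition into the end-of-sentence symbol.
    tags = [max(TAGS, key=lambda t: v[-1][t] + lp(trans, t, END))]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])  # tag at position i - 1
    return tags[::-1]


def most_probable_tags(words):
    """Forward-backward: at each position pick argmax_t P(tag_i = t | words)."""
    n = len(words)
    alpha = [{t: lp(trans, START, t) + lp(emit, t, words[0]) for t in TAGS}]
    for i in range(1, n):
        alpha.append({t: logsumexp([alpha[i - 1][q] + lp(trans, q, t) for q in TAGS])
                      + lp(emit, t, words[i]) for t in TAGS})
    beta = [None] * n
    beta[-1] = {t: lp(trans, t, END) for t in TAGS}
    for i in range(n - 2, -1, -1):
        beta[i] = {t: logsumexp([lp(trans, t, q) + lp(emit, q, words[i + 1]) + beta[i + 1][q]
                                 for q in TAGS]) for t in TAGS}
    return [max(TAGS, key=lambda t: alpha[i][t] + beta[i][t]) for i in range(n)]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


sentence = ["the", "dog", "barks"]
for tagger in (eager, viterbi, most_probable_tags):
    print(tagger.__name__, tagger(sentence))
```

On this toy sentence all three agree. In general, Eager can be misled by an early choice that Viterbi would revise, and the per-position choices of forward-backward need not form the single most probable sequence. Working in log space keeps long sentences from underflowing.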

Usage

1. Install dependencies

pip install conllu
pip install nltk

2. Run script

python3 p1.py

Technologies Used

CoNLL-U (the conllu Python package): Used to parse the UD Treebank corpora and split them into training and test sets.
NLTK: Employed to compute emission and transition probabilities essential for training the models.
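The two preparation steps above are reading UD data into sentences of (word, tag) pairs and then estimating transition and emission probabilities. Below is a minimal sketch of both. So that it runs without extra installs, it swaps in a hand-rolled CoNLL-U reader and plain unsmoothed relative-frequency (maximum-likelihood) counts from `collections.Counter`, in place of the conllu package and NLTK. The sample sentences, `read_conllu`, and `estimate` are hypothetical. Reading the UPOS column (the fourth field) is an assumption about which tag set the project uses.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"

# Hypothetical CoNLL-U sample: 10 tab-separated columns per token line,
# "#" comment lines, a blank line after each sentence, and one multiword
# token range ("2-3") of the kind found in UD English.
SAMPLE = """# text = The dog barks.
1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\tNN\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\tVBZ\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_

# text = I don't know
1\tI\tI\tPRON\tPRP\t_\t4\tnsubj\t_\t_
2-3\tdon't\t_\t_\t_\t_\t_\t_\t_\t_
2\tdo\tdo\tAUX\tVBP\t_\t4\taux\t_\t_
3\tn't\tnot\tPART\tRB\t_\t4\tadvmod\t_\t_
4\tknow\tknow\tVERB\tVB\t_\t0\troot\t_\t_
"""


def read_conllu(text):
    """Return a list of sentences, each a list of (FORM, UPOS) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip ranges like "2-3" and empty nodes like "8.1"
            continue
        current.append((cols[1], cols[3]))
    if current:
        sentences.append(current)
    return sentences


def estimate(sentences):
    """Unsmoothed estimates: trans[prev][tag] = P(tag | prev), emit[tag][word] = P(word | tag)."""
    trans_counts, emit_counts = defaultdict(Counter), defaultdict(Counter)
    for sentence in sentences:
        prev = START
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
        trans_counts[prev][END] += 1  # close the sentence

    def normalise(counts):
        return {a: {b: c / sum(row.values()) for b, c in row.items()}
                for a, row in counts.items()}

    return normalise(trans_counts), normalise(emit_counts)


sentences = read_conllu(SAMPLE)
trans, emit = estimate(sentences)
print(trans[START], emit["VERB"])
```

In the project itself these steps go through conllu and NLTK. NLTK's `ConditionalFreqDist` over (previous tag, tag) and (tag, word) pairs is one natural way to hold the same counts. A smoothed estimator such as `WittenBellProbDist` would replace the plain division here so that unseen words do not get zero probability. This README does not say which estimator p1.py uses.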

Acknowledgements

Universal Dependencies Treebank: Provided the multilingual data used for training and testing the models.
