Parts of Speech Tagging with Dynamic Algorithms

Implementing three part-of-speech tagging algorithms—Eager, Viterbi, and Individually Most Probable Tags—and comparing their accuracy across English, Korean, and Swedish.

Purpose

This project was developed as an individual assignment for the “Language and Computation” coursework at the University of St Andrews. The three algorithms were trained on data from the Universal Dependencies treebanks. Python, together with the CoNLL-U package and NLTK, was used to process the data and train the algorithms.

Aims

  • Develop and implement three algorithms of varying complexity: Eager, Viterbi, and Individually Most Probable Tags.
  • Train the algorithms on corpora from three distinct languages: English, Swedish, and Korean.
  • Evaluate the part-of-speech tagging accuracy for each language using unseen test sets.
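
The three decoders and the accuracy measure named above can be sketched on a toy bigram HMM. This is a minimal illustration, not the code in p1.py: the probability tables, the function names, and the three-word sentence are invented for the example, there is no smoothing, and the end-of-sentence transition is left out for brevity.

```python
from math import log

# Toy bigram HMM. Every number below is invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START = "<s>"
trans = {
    "<s>":  {"DET": 0.60, "NOUN": 0.30, "VERB": 0.10},
    "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}
emit = {
    "DET":  {"the": 0.90, "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.60, "barks": 0.35},
    "VERB": {"the": 0.05, "dog": 0.15, "barks": 0.80},
}

def score(prev, tag, word):
    """Log-probability of moving from prev to tag and emitting word."""
    return log(trans[prev][tag]) + log(emit[tag][word])

def eager(words):
    """Greedy: commit to the locally best tag at each position."""
    tags, prev = [], START
    for w in words:
        prev = max(TAGS, key=lambda t: score(prev, t, w))
        tags.append(prev)
    return tags

def viterbi(words):
    """Dynamic programming: the single most probable tag sequence."""
    if not words:
        return []
    # best[t] = (log-probability of the best path ending in t, that path)
    best = {t: (score(START, t, words[0]), [t]) for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            p = max(TAGS, key=lambda q: best[q][0] + score(q, t, w))
            new[t] = (best[p][0] + score(p, t, w), best[p][1] + [t])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]

def most_probable_tags(words):
    """Forward-backward: the most probable tag at each position separately."""
    if not words:
        return []
    fwd = [{t: trans[START][t] * emit[t][words[0]] for t in TAGS}]
    for w in words[1:]:
        prev = fwd[-1]
        fwd.append({t: emit[t][w] * sum(prev[q] * trans[q][t] for q in TAGS)
                    for t in TAGS})
    bwd = [{t: 1.0 for t in TAGS}]
    for w in reversed(words[1:]):
        nxt = bwd[0]
        bwd.insert(0, {q: sum(trans[q][t] * emit[t][w] * nxt[t] for t in TAGS)
                       for q in TAGS})
    return [max(TAGS, key=lambda t: f[t] * b[t]) for f, b in zip(fwd, bwd)]

def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

sentence = ["the", "dog", "barks"]
gold = ["DET", "NOUN", "VERB"]
for tagger in (eager, viterbi, most_probable_tags):
    print(tagger.__name__, tagger(sentence), accuracy(tagger(sentence), gold))
```

On a sentence this easy all three taggers agree. They can diverge on ambiguous input, because the eager tagger never revisits a choice, Viterbi optimises the whole sequence, and the third tagger optimises each position on its own.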

Usage

1. Install dependencies

pip install conllu
pip install nltk

2. Run script

python3 p1.py

Technologies Used

  • CoNLL-U: Utilized for parsing and organizing corpora from the UD Treebank into training and testing sets.
  • NLTK: Employed to compute emission and transition probabilities essential for training the models.
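
To make the two quantities concrete, here is a dependency-free sketch of what "emission and transition probabilities" means for CoNLL-U data. It deliberately does not use the conllu package or NLTK, which the project relies on for these steps. The two-sentence sample, the function names, the `<s>`/`</s>` boundary markers, and the unsmoothed relative-frequency estimate are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Two invented sentences in CoNLL-U layout: ten tab-separated columns,
# with FORM in column 2 and UPOS in column 4.
SAMPLE = (
    "# text = The dog barks\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = A cat sleeps\n"
    "1\tA\ta\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
)

def read_sentences(text):
    """Return a list of sentences, each a list of (lowercased form, UPOS)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():            # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):  # skip comment lines
            cols = line.split("\t")
            if cols[0].isdigit():       # skip ranges (1-2) and empty nodes (1.1)
                current.append((cols[1].lower(), cols[3]))
    if current:
        sentences.append(current)
    return sentences

def normalise(counts):
    """Turn nested counts into relative frequencies (no smoothing)."""
    return {ctx: {x: c / sum(ctr.values()) for x, c in ctr.items()}
            for ctx, ctr in counts.items()}

def estimate(sentences):
    """Return (transition, emission) probability tables."""
    trans_counts = defaultdict(Counter)  # previous tag -> next tag counts
    emit_counts = defaultdict(Counter)   # tag -> word counts
    for sent in sentences:
        prev = "<s>"
        for word, tag in sent:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
        trans_counts[prev]["</s>"] += 1
    return normalise(trans_counts), normalise(emit_counts)

transition, emission = estimate(read_sentences(SAMPLE))
print(transition["DET"])
print(emission["NOUN"])
```

Unsmoothed estimates like these assign zero probability to any word or tag pair not seen in training, so a real tagger needs some form of smoothing before it is run on unseen test sets.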

Acknowledgements