Implementing three part-of-speech tagging algorithms—Eager, Viterbi, and Individually Most Probable Tags—and comparing their accuracy across English, Korean, and Swedish.

emma-horton/PartsOfSpeech

Parts of Speech Tagging with Dynamic Algorithms

Purpose

This project was developed as an individual assignment for the “Language and Computation” module at the University of St Andrews. The three algorithms were trained on data from the Universal Dependencies treebanks. Python, together with the CoNLL-U package and NLTK, was used to process the data and train the algorithms.
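
The project reads the treebanks with the CoNLL-U package. As a dependency-free illustration of what that step extracts, the minimal reader below (a sketch, not the project's code) pulls (FORM, UPOS) pairs out of CoNLL-U text. It assumes the universal UPOS tags in column 4 are the tagset, and it skips comment lines, multiword-token ranges such as `1-2`, and empty nodes such as `8.1`. `read_conllu` and `SAMPLE` are hypothetical names.

```python
def read_conllu(text):
    """Return sentences as lists of (FORM, UPOS) pairs from CoNLL-U text."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence-level comments (sent_id, text)
            continue
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue                      # multiword-token ranges and empty nodes
        current.append((cols[1], cols[3]))  # column 2 = FORM, column 4 = UPOS
    if current:                           # input may not end with a blank line
        sentences.append(current)
    return sentences


# Hypothetical one-sentence sample in CoNLL-U format.
SAMPLE = (
    "# sent_id = 1\n"
    "# text = The dog runs.\n"
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_\n"
    "2\tdog\tdog\tNOUN\tNN\t_\t3\tnsubj\t_\t_\n"
    "3\truns\trun\tVERB\tVBZ\t_\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
)

print(read_conllu(SAMPLE))
# [[('The', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB'), ('.', 'PUNCT')]]
```

Each treebank ships separate train and test files, so the same reader can be applied to both to obtain the training and unseen test sets.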

Aims

  • Develop and implement three algorithms of varying complexity: Eager, Viterbi, and Individually Most Probable Tags.
  • Train the algorithms on corpora from three distinct languages: English, Swedish, and Korean.
  • Evaluate the part-of-speech tagging accuracy for each language using unseen test sets.
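
To make the difference between the three algorithms concrete, here is a minimal sketch over a hypothetical three-tag hidden Markov model. The `TAGS`, `TRANS` and `EMIT` tables, the `<s>`/`</s>` boundary markers and all function names are invented for illustration and are not taken from the project. Eager commits to the best tag for each word given only the previously chosen tag; Viterbi finds the single most probable tag sequence; the individually most probable tags decoder uses forward-backward to pick, at each position, the tag with the highest marginal probability.

```python
# Hypothetical toy HMM; "<s>" and "</s>" mark sentence start and end.
TAGS = ["DET", "NOUN", "VERB"]
TRANS = {  # TRANS[prev][tag] = P(tag | prev); missing entries are 0
    "<s>": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.5, "</s>": 0.2},
    "VERB": {"DET": 0.4, "NOUN": 0.2, "VERB": 0.1, "</s>": 0.3},
}
EMIT = {  # EMIT[tag][word] = P(word | tag); missing entries are 0
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.5, "runs": 0.1},
    "VERB": {"runs": 0.6, "sees": 0.3, "dog": 0.1},
}


def p_trans(prev, tag):
    return TRANS.get(prev, {}).get(tag, 0.0)


def p_emit(tag, word):
    return EMIT.get(tag, {}).get(word, 0.0)


def eager(words):
    """Left to right, keep the best tag given only the previous choice."""
    tags, prev = [], "<s>"
    for w in words:
        best = max(TAGS, key=lambda t: p_trans(prev, t) * p_emit(t, w))
        tags.append(best)
        prev = best
    return tags


def viterbi(words):
    """Most probable tag sequence via dynamic programming with backpointers."""
    v = [{t: p_trans("<s>", t) * p_emit(t, words[0]) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        col, bp = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda s: v[i - 1][s] * p_trans(s, t))
            col[t] = v[i - 1][prev] * p_trans(prev, t) * p_emit(t, words[i])
            bp[t] = prev
        v.append(col)
        back.append(bp)
    last = max(TAGS, key=lambda t: v[-1][t] * p_trans(t, "</s>"))
    path = [last]
    for i in range(len(words) - 1, 0, -1):  # follow backpointers to the start
        path.append(back[i][path[-1]])
    return path[::-1]


def most_probable_tags(words):
    """Per-position argmax of forward * backward (marginal tag probability)."""
    n = len(words)
    fwd = [{t: p_trans("<s>", t) * p_emit(t, words[0]) for t in TAGS}]
    for i in range(1, n):
        fwd.append({t: sum(fwd[i - 1][s] * p_trans(s, t) for s in TAGS)
                    * p_emit(t, words[i]) for t in TAGS})
    bwd = [None] * n
    bwd[-1] = {t: p_trans(t, "</s>") for t in TAGS}
    for i in range(n - 2, -1, -1):
        bwd[i] = {t: sum(p_trans(t, u) * p_emit(u, words[i + 1]) * bwd[i + 1][u]
                         for u in TAGS) for t in TAGS}
    return [max(TAGS, key=lambda t: fwd[i][t] * bwd[i][t]) for i in range(n)]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    sentence = ["the", "dog", "runs"]
    for decode in (eager, viterbi, most_probable_tags):
        print(decode.__name__, decode(sentence))  # each: ['DET', 'NOUN', 'VERB']
```

On this toy model all three agree. On real data they can diverge: eager decisions are never revised, and the per-position best tags need not form the most probable sequence. Real sentences would also need log probabilities or scaling to avoid floating-point underflow.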

Usage

1. Install dependencies

pip install conllu
pip install nltk

2. Run script

python3 p1.py

Technologies Used

  • CoNLL-U: Utilized for parsing and organizing corpora from the UD Treebank into training and testing sets.
  • NLTK: Employed to compute emission and transition probabilities essential for training the models.
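
The project delegates this step to NLTK. As a library-free sketch of what is being estimated, the hypothetical `estimate` function below counts tag bigrams (including assumed `<s>` and `</s>` boundary markers) and word/tag pairs, then applies add-k smoothing so that unseen transitions and words do not receive zero probability. The smoothing method is an assumption: the README does not say which estimator the project uses, and NLTK offers several (for example Witten-Bell).

```python
from collections import Counter, defaultdict


def estimate(tagged_sentences, k=1.0):
    """Return (tags, p_trans, p_emit) estimated from lists of (word, tag) pairs."""
    trans_counts = defaultdict(Counter)  # trans_counts[prev][tag]
    emit_counts = defaultdict(Counter)   # emit_counts[tag][word]
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
        trans_counts[prev]["</s>"] += 1

    tags = sorted(emit_counts)
    vocab = {w for counts in emit_counts.values() for w in counts}
    n_next = len(tags) + 1    # every tag plus the end marker
    n_words = len(vocab) + 1  # observed words plus one shared unknown-word slot

    def p_trans(prev, tag):
        """Add-k smoothed P(tag | prev)."""
        counts = trans_counts.get(prev, Counter())
        return (counts[tag] + k) / (sum(counts.values()) + k * n_next)

    def p_emit(tag, word):
        """Add-k smoothed P(word | tag); any unseen word gets the unknown-slot mass."""
        counts = emit_counts.get(tag, Counter())
        return (counts[word] + k) / (sum(counts.values()) + k * n_words)

    return tags, p_trans, p_emit


corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("dog", "NOUN")],
]
tags, p_trans, p_emit = estimate(corpus)
print(tags)                   # ['DET', 'NOUN', 'VERB']
print(p_trans("<s>", "DET"))  # 0.5, i.e. (2 + 1) / (2 + 4)
print(p_emit("DET", "the"))   # (1 + 1) / (2 + 5), about 0.2857
```

The returned `p_trans` and `p_emit` functions have the shape a decoder such as Eager or Viterbi needs, so the same tables can drive all three tagging algorithms.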

Acknowledgements
