Parts of Speech Tagging with Dynamic Algorithms

Implements three part-of-speech tagging algorithms (Eager, Viterbi, and Individually Most Probable Tags) and compares their accuracy across English, Korean, and Swedish.

Purpose

This project was developed as an individual assignment for the “Language and Computation” coursework at the University of St Andrews. The three algorithms were trained on data from the Universal Dependencies Treebank. Python, along with the CoNLL-U package and NLTK, was used to process the data and train the algorithms.

Aims

Develop and implement three algorithms of varying complexity: Eager, Viterbi, and Individually Most Probable Tags.
Train the algorithms on corpora from three distinct languages: English, Swedish, and Korean.
Evaluate the part-of-speech tagging accuracy for each language using unseen test sets.
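
To illustrate how two of these algorithms differ, here is a minimal sketch of eager (greedy) and Viterbi decoding over a toy hidden Markov model. It is not the code in p1.py: the function names, the tags "A" and "B", the tokens "x" and "y", and all probabilities are invented for illustration and chosen so that the two decoders disagree. The third algorithm, Individually Most Probable Tags, is not shown.

```python
import math

START, END = "<s>", "</s>"  # hypothetical boundary markers


def to_logs(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {ctx: {k: math.log(p) for k, p in row.items()} for ctx, row in table.items()}


def eager_tag(words, tags, log_trans, log_emit):
    """Greedy decoding: commit to the best tag for each word given the previous choice."""
    prev, out = START, []
    for w in words:
        choice = max(tags, key=lambda t: log_trans[prev][t] + log_emit[t].get(w, -math.inf))
        out.append(choice)
        prev = choice
    return out


def viterbi_tag(words, tags, log_trans, log_emit):
    """Viterbi decoding: find the single most probable tag sequence for the sentence."""
    if not words:
        return []
    # best[i][t] is the log probability of the best sequence for words[:i+1] ending in t.
    best = [{t: log_trans[START][t] + log_emit[t].get(words[0], -math.inf) for t in tags}]
    back = [{}]
    for w in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            p = max(tags, key=lambda q: best[-1][q] + log_trans[q][t])
            scores[t] = best[-1][p] + log_trans[p][t] + log_emit[t].get(w, -math.inf)
            pointers[t] = p
        best.append(scores)
        back.append(pointers)
    # Account for the transition into the end marker, then follow the backpointers.
    last = max(tags, key=lambda t: best[-1][t] + log_trans[t][END])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


# Invented toy model: tag "A" looks best for "x" locally, but rarely continues.
TAGS = ["A", "B"]
LOG_TRANS = to_logs({
    START: {"A": 0.5, "B": 0.5},
    "A": {"A": 0.1, "B": 0.1, END: 0.8},
    "B": {"A": 0.1, "B": 0.6, END: 0.3},
})
LOG_EMIT = to_logs({
    "A": {"x": 0.6, "y": 0.4},
    "B": {"x": 0.4, "y": 0.6},
})

print(eager_tag(["x", "y"], TAGS, LOG_TRANS, LOG_EMIT))    # ['A', 'B']
print(viterbi_tag(["x", "y"], TAGS, LOG_TRANS, LOG_EMIT))  # ['B', 'B']
```

The eager tagger commits to "A" for the first token because it scores higher locally, while Viterbi delays the decision and finds that the sequence "B", "B" has the higher overall probability. Working in log space avoids numerical underflow on longer sentences.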

Usage

1. Install dependencies

pip install conllu
pip install nltk

2. Run script

python3 p1.py

Technologies Used

CoNLL-U: Utilized for parsing and organizing corpora from the UD Treebank into training and testing sets.
NLTK: Employed to compute emission and transition probabilities essential for training the models.
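
As a rough picture of what emission and transition probabilities are, the sketch below estimates both by relative frequency from a handful of tagged sentences. It uses only the standard library rather than NLTK (which provides comparable counting utilities such as ConditionalFreqDist), applies no smoothing, and is not taken from p1.py. The function name `estimate`, the boundary markers, and the three training sentences are assumptions made up for the example.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def estimate(tagged_sentences):
    """Return (transition, emission) relative-frequency estimates.

    transition[prev_tag][tag] approximates P(tag | prev_tag)
    emission[tag][word]       approximates P(word | tag)
    """
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
        trans_counts[prev][END] += 1  # count the transition out of the sentence

    def normalise(counts):
        # Divide each count by the total for its conditioning context.
        return {
            ctx: {k: c / sum(row.values()) for k, c in row.items()}
            for ctx, row in counts.items()
        }

    return normalise(trans_counts), normalise(emit_counts)


# Invented training data in (word, UPOS tag) form.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

trans, emit = estimate(TRAIN)
print(trans[START])   # two of the three sentences start with DET
print(emit["NOUN"])   # each noun was seen once
```

Because these estimates are unsmoothed, any word or tag pair absent from the training data receives zero probability, so a tagger evaluated on unseen test sets would need some form of smoothing on top of this.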

Acknowledgements

Universal Dependencies Treebank: Provided the multilingual data used for training and testing the models.

