akshadashelar/Marathi_POS-tagger


Unigram tagger technique for POS tagging

POS TAGGING WITH NLTK:

The primary goal of Part-of-Speech (POS) tagging is to identify the grammatical category of a given word (NOUN, PRONOUN, ADJECTIVE, VERB, ADVERB, etc.) based on its context. POS tagging looks for relationships within the sentence and assigns a corresponding tag to each word.

I trained the Unigram POS tagger provided by the NLTK library on the Marathi corpus.
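As a rough illustration of what training NLTK's unigram tagger looks like, here is a minimal sketch. The three tagged sentences and their tags (NNP, NN, VM, PRP) are made-up toy data, not the Marathi corpus used in this repository, and the variable names are assumptions for the example only.

```python
from nltk.tag import UnigramTagger

# Toy tagged sentences (hypothetical data, NOT the real Marathi corpus).
# Each sentence is a list of (word, tag) pairs.
train_sents = [
    [("राम", "NNP"), ("घरी", "NN"), ("जातो", "VM")],
    [("सीता", "NNP"), ("शाळेत", "NN"), ("जाते", "VM")],
    [("तो", "PRP"), ("पुस्तक", "NN"), ("वाचतो", "VM")],
]

# A unigram tagger remembers the most frequent tag seen for each word.
tagger = UnigramTagger(train_sents)

# Tag a new sentence built from words seen during training.
tagged = tagger.tag(["तो", "घरी", "जातो"])
print(tagged)

# A word never seen in training gets the tag None (no backoff tagger is set).
unknown = tagger.tag(["अज्ञात"])
print(unknown)
```

Because the tagger looks at each word in isolation, it cannot use context to disambiguate a word that appears with several tags; it always picks the tag it saw most often for that word.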

GOALS:

Split the Marathi corpus dataset into training and validation sets. The goal of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units.

You need to accomplish the following in this assignment:

  • Write the unigram tagger algorithm for assigning POS tags.
  • Compare the tagging accuracy after making these modifications with the unigram algorithm.
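The goals above can be sketched in plain Python, without NLTK: split a tagged corpus, learn the most frequent tag per word, and measure accuracy on the held-out part. This is only one possible reading of the assignment, not the code in this repository. The helper names (`split_corpus`, `train_unigram`, `accuracy`), the 80/20 split, and the five toy sentences with their tags are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data, NOT the real Marathi corpus).
corpus = [
    [("राम", "NNP"), ("घरी", "NN"), ("जातो", "VM")],
    [("सीता", "NNP"), ("शाळेत", "NN"), ("जाते", "VM")],
    [("तो", "PRP"), ("पुस्तक", "NN"), ("वाचतो", "VM")],
    [("ती", "PRP"), ("घरी", "NN"), ("जाते", "VM")],
    [("तो", "PRP"), ("शाळेत", "NN"), ("येतो", "VM")],
]

def split_corpus(sents, train_fraction=0.8):
    """Split sentences into a training part and a validation part."""
    cut = int(len(sents) * train_fraction)
    return sents[:cut], sents[cut:]

def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def accuracy(model, gold_sents, default=None):
    """Fraction of validation tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        for word, gold_tag in sent:
            total += 1
            if model.get(word, default) == gold_tag:
                correct += 1
    return correct / total if total else 0.0

train_sents, val_sents = split_corpus(corpus)
model = train_unigram(train_sents)

# The validation word "येतो" never occurs in training, so it is tagged None
# and counts as an error; the other two validation tokens are tagged correctly.
acc = accuracy(model, val_sents)
print(f"validation accuracy: {acc:.2f}")
```

Running `accuracy` again after any change to the tagger (for example, a different `default` tag for unknown words) gives a number that can be compared directly with this baseline.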

USE CASES OF POS TAGS:

POS tags are used in various tasks, such as:

  • Information retrieval, information extraction
  • Parsing
  • Text to Speech (TTS) applications
  • Linguistic research on corpora

REQUIREMENTS:

Use the package manager pip to install the required Python packages.

pip install nltk
pip install pandas