Part-of-Speech (POS) tagging identifies the grammatical category of each word in a sentence (NOUN, PRONOUN, ADJECTIVE, VERB, ADVERB, etc.) based on its context. A POS tagger looks for relationships within the sentence and assigns the corresponding tag to each word.
You can split the Marathi Corpus dataset into train and validation sets. The goal of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units.
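The train/validation split mentioned above can be sketched roughly as follows. This is a minimal illustration, not the assignment's required method: the function name `train_validation_split`, the 80/20 ratio, the fixed seed, and the toy sentences are all assumptions. It expects the corpus as a list of sentences, each a list of `(word, tag)` pairs.

```python
import random


def train_validation_split(tagged_sents, validation_fraction=0.2, seed=0):
    """Shuffle tagged sentences and split them into (train, validation).

    Hypothetical helper: the 0.2 fraction and the seed are illustrative
    defaults, not values taken from the assignment.
    """
    sents = list(tagged_sents)
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(sents)
    n_val = int(len(sents) * validation_fraction)
    return sents[n_val:], sents[:n_val]


# Placeholder data standing in for the Marathi corpus (assumption).
toy_corpus = [[("w%d" % i, "NOUN"), ("v%d" % i, "VERB")] for i in range(10)]
train_sents, val_sents = train_validation_split(toy_corpus)
print(len(train_sents), len(val_sents))
```

Splitting at the sentence level rather than the word level keeps each sentence's context intact in exactly one of the two sets.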
You need to accomplish the following in this assignment:
- Write the unigram tagger algorithm for assigning POS tags.
- Compare the tagging accuracy after making modifications with that of the baseline unigram algorithm.
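As a starting point for the tasks above, here is one possible sketch of a unigram tagger in plain Python. It tags each word with the tag it carried most often in training and measures token-level accuracy. The function names (`train_unigram_tagger`, `tag_sentence`, `accuracy`), the fallback to the most frequent overall tag for unseen words, and the toy English data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sents):
    """Return (model, default_tag) learned from (word, tag) sentences."""
    counts = defaultdict(Counter)  # word -> Counter of tags seen with it
    tag_totals = Counter()         # overall tag frequencies
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            tag_totals[tag] += 1
    # Each word maps to its single most frequent training tag.
    model = {w: tags.most_common(1)[0][0] for w, tags in counts.items()}
    # Assumption: unseen words fall back to the most frequent tag overall.
    default_tag = tag_totals.most_common(1)[0][0] if tag_totals else None
    return model, default_tag


def tag_sentence(words, model, default_tag):
    """Tag a list of words using the unigram model."""
    return [(w, model.get(w, default_tag)) for w in words]


def accuracy(tagged_sents, model, default_tag):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in tagged_sents:
        predicted = tag_sentence([w for w, _ in sent], model, default_tag)
        for (_, gold), (_, pred) in zip(sent, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0


# Toy data (assumption), standing in for the real train/validation sets.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("a", "DET"), ("run", "NOUN")],
    [("cats", "NOUN"), ("run", "VERB")],
]
validation = [[("the", "DET"), ("bird", "NOUN"), ("sleeps", "VERB")]]

model, default_tag = train_unigram_tagger(train)
print(tag_sentence(["the", "bird", "run"], model, default_tag))
print(accuracy(validation, model, default_tag))
```

Calling `accuracy` on the same validation set before and after a change to the tagger gives the comparison the last task asks for. NLTK also ships `nltk.tag.UnigramTagger`, which could serve as a reference point.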
POS tagging is used in a variety of tasks, such as:
- Information retrieval, information extraction
- Parsing
- Text to Speech (TTS) applications
- Linguistic research on corpora
Use the package manager pip to install the required Python packages.
pip install nltk
pip install pandas