Noun-phrase chunking is a shallow syntactic parse of a text. The text is first preprocessed into suitable units, such as words or part-of-speech tags. The chunker then decides whether each unit is inside or outside of a noun phrase.
This chunker is a hidden Markov model that uses the Viterbi algorithm to find the most likely sequence of states (inside or outside), given observations of part-of-speech tags. I built it in Scheme, while learning Scheme, without libraries, in order to understand as thoroughly as possible how it worked.
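For readers unfamiliar with the decoding step, here is a minimal sketch of Viterbi over two states. It is written in Python for brevity and is not the Scheme implementation in this repository. The function name `viterbi`, the `floor` value for unseen tags, and all probabilities in the toy tables are assumptions invented for illustration; they are not estimated from the CoNLL-2000 data.

```python
import math

STATES = ("I", "O")  # I = inside a noun phrase, O = outside

# Toy parameters, made up for illustration only.
START_P = {"I": 0.6, "O": 0.4}
TRANS_P = {"I": {"I": 0.7, "O": 0.3}, "O": {"I": 0.4, "O": 0.6}}
EMIT_P = {
    "I": {"DT": 0.4, "JJ": 0.2, "NN": 0.35, "VBD": 0.05},
    "O": {"DT": 0.05, "JJ": 0.05, "NN": 0.1, "VBD": 0.8},
}


def viterbi(tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely I/O state sequence for a list of POS tags."""
    if not tags:
        return []

    def log_emit(state, tag):
        # Unseen (state, tag) pairs get a small floor probability (an assumption).
        return math.log(emit_p[state].get(tag, floor))

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: math.log(start_p[s]) + log_emit(s, tags[0]) for s in STATES}]
    # back[t][s]: the predecessor state on that best path.
    back = [{}]

    for t in range(1, len(tags)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit(s, tags[t])
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for t in range(len(tags) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi(["DT", "NN", "VBD", "DT", "NN"], START_P, TRANS_P, EMIT_P))
```

Working in log space avoids numeric underflow on long sentences, since the raw path probabilities shrink with every step.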
The problem and data are from the CoNLL-2000 shared task. The model and states were given. I designed and implemented several modifications aimed at improving performance.
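The CoNLL-2000 files list one token per line as `word POS chunk-tag`, with a blank line between sentences. The sketch below (Python, not the Scheme code in this repository) shows one way such data could be reduced to the two states and turned into relative-frequency estimates. Mapping both `B-NP` and `I-NP` to inside, and estimating by plain counting without smoothing, are assumptions made for illustration; `SAMPLE` is a tiny hand-written fragment in the task's format, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

# A short illustrative fragment in "word POS chunk-tag" format.
SAMPLE = """\
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
. . O
"""


def to_state(chunk_tag):
    # Assumption: any token tagged as part of an NP is "I", everything else "O".
    return "I" if chunk_tag in ("B-NP", "I-NP") else "O"


def read_sentences(text):
    """Yield each sentence as a list of (pos_tag, state) pairs."""
    sentence = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        _word, pos, chunk = parts
        sentence.append((pos, to_state(chunk)))
    if sentence:
        yield sentence


def normalize(counter):
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def estimate(sentences):
    """Return (transition, emission) tables as relative frequencies."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in sentences:
        prev = None
        for pos, state in sentence:
            emit[state][pos] += 1
            if prev is not None:
                trans[prev][state] += 1
            prev = state
    return (
        {s: normalize(c) for s, c in trans.items()},
        {s: normalize(c) for s, c in emit.items()},
    )


if __name__ == "__main__":
    transitions, emissions = estimate(read_sentences(SAMPLE))
    print(transitions)
    print(emissions)
```

Note that a two-state encoding cannot mark the boundary between two adjacent noun phrases, which is one limitation of collapsing the chunk tags this way.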
See results.pdf for the full description and results.