- Text Exploration
- Text Cleaning
- Obtaining POS Tags, Named Entities, Lemmas, Syntactic Dependency Relations and Orthographic Features.
- Using the obtained properties as features.
- Using a Linear SVM model on the engineered features.
- Predict Categories of Unseen Data.
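
The feature-engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Token` container, the `word_shape` and `question_features` helpers, and the hand-written annotations in `example` are all assumptions made for the sketch. In practice the lemma, POS, dependency and entity values would come from an NLP library, and the resulting feature strings would then be vectorized and passed to the linear SVM (not shown here). The `groups` argument mirrors the idea of training on different subsets of the features.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical container for the annotations an NLP library would supply.
    text: str
    lemma: str
    pos: str
    dep: str
    ent: str = ""  # named-entity label; empty when the token is not in an entity


def word_shape(text: str) -> str:
    """Orthographic shape: uppercase -> X, lowercase -> x, digit -> d, others kept."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def question_features(tokens, groups=("ner", "lemma", "pos", "dep", "shape")):
    """Turn annotated tokens into a flat list of feature strings.

    `groups` selects which kinds of properties are emitted, so the same
    function can produce the full feature set or a reduced one.
    """
    feats = []
    for tok in tokens:
        if "lemma" in groups:
            feats.append(f"lemma={tok.lemma}")
        if "pos" in groups:
            feats.append(f"pos={tok.pos}")
        if "dep" in groups:
            feats.append(f"dep={tok.dep}")
        if "ner" in groups and tok.ent:
            feats.append(f"ner={tok.ent}")
        if "shape" in groups:
            feats.append(f"shape={word_shape(tok.text)}")
    return feats


# Hand-annotated example question (annotations are illustrative assumptions).
example = [
    Token("Who", "who", "PRON", "nsubj"),
    Token("founded", "found", "VERB", "ROOT"),
    Token("Google", "Google", "PROPN", "dobj", "ORG"),
    Token("?", "?", "PUNCT", "punct"),
]

all_feats = question_features(example)
reduced_feats = question_features(example, groups=("ner", "lemma", "pos"))
```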
Tagged 3000 unseen questions from here
| Variations in Features Used | Test Set Accuracy (%) |
|---|---|
| Named Entities, Lemmas, POS Tags, Syntactic Dependency, Orthography | 95.96 |
| Named Entities, Lemmas, POS Tags | 96.296 |
Classifying What-type Questions by Head Noun Tagging (http://www.aclweb.org/anthology/C08-1061)