This notebook builds an initial set of features from the dataset of a Kaggle essay scoring competition and assesses their efficacy with a random forest model, evaluated using the quadratic weighted Cohen's kappa score. A number of useful starting features are identified, including:
- Total words
- Average word length
- Paragraph count
- Comma to full stop ratio
- Conjunction count
- Conjunctive adverb count
- Academic word count
- Words per sentence
- No space after comma count
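
As a rough illustration of how the features listed above could be computed, here is a minimal sketch using only the Python standard library. The function name `extract_features`, the three word lists, and the tokenisation rules are assumptions made for this example; the notebook's actual word lists and definitions may differ.

```python
import re

# Hypothetical word lists, for illustration only; the notebook's real lists are not shown here.
CONJUNCTIONS = {"and", "but", "or", "so", "because", "although"}
CONJUNCTIVE_ADVERBS = {"however", "therefore", "moreover", "furthermore", "consequently"}
ACADEMIC_WORDS = {"analysis", "evidence", "significant", "research", "theory"}


def extract_features(essay: str) -> dict:
    """Compute simple surface features for one essay (assumed definitions)."""
    # Words are runs of letters and apostrophes; this is a simplifying assumption.
    words = re.findall(r"[A-Za-z']+", essay)
    lower = [w.lower() for w in words]
    # Paragraphs are assumed to be separated by newlines.
    paragraphs = [p for p in essay.split("\n") if p.strip()]
    # Sentences are assumed to end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    commas = essay.count(",")
    full_stops = essay.count(".")
    return {
        "total_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "paragraph_count": len(paragraphs),
        # max(..., 1) avoids division by zero for essays with no full stops.
        "comma_to_full_stop_ratio": commas / max(full_stops, 1),
        "conjunction_count": sum(w in CONJUNCTIONS for w in lower),
        "conjunctive_adverb_count": sum(w in CONJUNCTIVE_ADVERBS for w in lower),
        "academic_word_count": sum(w in ACADEMIC_WORDS for w in lower),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        # A comma immediately followed by a non-whitespace character.
        "no_space_after_comma_count": len(re.findall(r",(?=\S)", essay)),
    }


sample = "However,the evidence is clear. It rains, and we stay.\n\nThe end."
features = extract_features(sample)
```

Each essay would be mapped to one such dictionary, and the resulting table of features could then be passed to a random forest model.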
The Kaggle competition score for this notebook is 0.73.
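
For readers unfamiliar with the metric, the sketch below shows one way to compute the quadratic weighted kappa by hand. The function name and the `min_rating` / `max_rating` parameters are assumptions for this example, and it assumes integer ratings spanning at least two levels with some variation in the data; it is not necessarily how the notebook or the competition computes the score.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted Cohen's kappa for integer ratings (illustrative sketch)."""
    n = max_rating - min_rating + 1  # number of rating levels, assumed >= 2
    total = len(y_true)

    # Observed confusion matrix: rows are true ratings, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    # Marginal histograms of true and predicted ratings.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared distance between ratings.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if true and predicted ratings were independent.
            expected = hist_true[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    return 1.0 - numerator / denominator


perfect = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
reversed_score = quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.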
The notebook can be found here.