A notebook showing that a number of simple ML features can be extracted from text data before using more advanced techniques.

jcarterlab/NLP-initial-feature-engineering

NLP initial feature engineering

This notebook creates some initial features from the dataset of a Kaggle essay-scoring competition and assesses their efficacy with a random forest model, scored using the quadratic weighted Cohen's kappa. A number of useful starting features are identified, including:

  • Total words
  • Average word length
  • Number of paragraphs
  • Comma-to-full-stop ratio
  • Conjunction count
  • Conjunctive adverb count
  • Academic word count
  • Words per sentence
  • Count of commas not followed by a space
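
As a rough illustration of how features like those listed above might be computed, here is a minimal Python sketch. It is not the notebook's own code: the `extract_features` name, the regular expressions, the assumption that paragraphs are separated by a blank line, and the three word lists (`CONJUNCTIONS`, `CONJUNCTIVE_ADVERBS`, `ACADEMIC_WORDS`) are all hypothetical stand-ins chosen for this example.

```python
import re

# Hypothetical, deliberately small word lists for illustration only;
# the notebook's actual lists are not shown in this README.
CONJUNCTIONS = {"and", "but", "or", "so", "yet", "nor", "because", "although", "while"}
CONJUNCTIVE_ADVERBS = {"however", "therefore", "moreover", "furthermore",
                       "consequently", "nevertheless", "thus", "hence"}
ACADEMIC_WORDS = {"analyze", "concept", "data", "evidence", "factor",
                  "hypothesis", "research", "significant", "theory", "variable"}


def extract_features(text: str) -> dict:
    """Compute simple surface features for one essay (a sketch, not the notebook's code)."""
    words = re.findall(r"[A-Za-z']+", text)           # alphabetic tokens, apostrophes kept
    lower = [w.lower() for w in words]
    n_words = len(words)
    # Assumption: paragraphs are separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_commas = text.count(",")
    n_full_stops = text.count(".")
    return {
        "total_words": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "paragraph_count": len(paragraphs),
        "comma_full_stop_ratio": n_commas / n_full_stops if n_full_stops else 0.0,
        "conjunction_count": sum(w in CONJUNCTIONS for w in lower),
        "conjunctive_adverb_count": sum(w in CONJUNCTIVE_ADVERBS for w in lower),
        "academic_word_count": sum(w in ACADEMIC_WORDS for w in lower),
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        # A comma immediately followed by a non-whitespace character.
        "no_space_after_comma_count": len(re.findall(r",(?=\S)", text)),
    }
```

Applying `extract_features` to each essay and collecting the dictionaries as rows gives a numeric feature table that a model such as a random forest can use directly. Note that this simple regex also counts digit-grouping commas such as `1,000` as missing spaces.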

The Kaggle competition score for this notebook is 0.73.
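
The quadratic weighted kappa used for scoring compares true and predicted scores, penalising each disagreement by the square of the distance between them: κ = 1 − Σ w_ij·O_ij / Σ w_ij·E_ij, with w_ij = (i − j)² / (N − 1)². A minimal pure-Python sketch follows, assuming integer scores in a known range from `min_rating` to `max_rating`; the function name is hypothetical. scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` should give the same value when every score in the range occurs in the data.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted Cohen's kappa for integer ratings in [min_rating, max_rating]."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two possible ratings")
    # Observed matrix O[i][j]: number of items with true score i and predicted score j.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    total = len(y_true)
    # Marginal histograms of true and predicted scores.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = 0.0    # sum of w_ij * O_ij
    denominator = 0.0  # sum of w_ij * E_ij
    for i in range(n):
        for j in range(n):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two sets of scores were independent,
            # scaled to the same total as the observed matrix.
            expected = hist_true[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - numerator / denominator
```

A random forest classifier already predicts integer scores; if a regressor is used instead, its outputs would need rounding and clipping to the score range before being passed to this function.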

The notebook can be found here.