High-Performance Stemmer, Tokenizer, and Spell Checker for R
Low-level spell checker and morphological analyzer based on the famous hunspell library (https://hunspell.github.io). The package can check or analyze individual words as well as tokenize plain text, LaTeX, HTML, or XML documents. For a more user-friendly interface, use the 'spelling' package, which builds on this package with utilities to automate checking of files, documentation, and vignettes in all common formats.
This package includes a bundled version of libhunspell and no longer depends on external system libraries:
install.packages("hunspell")
About the R package:
- Blog post: Hunspell: Spell Checker and Text Parser for R
- Blog post: Stemming and Spell Checking in R
# Check individual words
words <- c("beer", "wiskey", "wine")
correct <- hunspell_check(words)
print(correct)
# Find suggestions for incorrect words
hunspell_suggest(words[!correct])
# Extract incorrect words from a piece of text
bad <- hunspell("spell checkers are not neccessairy for langauge ninja's")
print(bad[[1]])
hunspell_suggest(bad[[1]])
# Stemming
words <- c("love", "loving", "lovingly", "loved", "lover", "lovely", "love")
hunspell_stem(words)
hunspell_analyze(words)
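The description above mentions tokenizing plain text as well as LaTeX, HTML, or XML. The following is a minimal sketch of that, assuming the `format` argument of `hunspell_parse()` and `hunspell()` behaves as described in the package manual. The sample strings are made up for illustration.

```r
library(hunspell)

# Tokenize plain text: returns a list with one character vector per input string
tokens <- hunspell_parse("spell checkers are useful", format = "text")
print(tokens[[1]])

# Illustrative LaTeX snippet (not from the package documentation)
tex <- "\\textbf{Speling} errors in \\emph{latex} documents"

# Parse as LaTeX so that markup commands are not treated as words
print(hunspell_parse(tex, format = "latex")[[1]])

# Spell check the same snippet, again parsing it as LaTeX first
bad_tex <- hunspell(tex, format = "latex")
print(bad_tex[[1]])
```

Passing the format explicitly keeps markup out of the list of reported words, which is usually what you want when checking source documents rather than rendered text.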
The spelling package uses this package to spell check R package documentation:
# Spell check a package
library(spelling)
spell_check_package("~/mypackage")
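Besides whole packages, the spelling package can also check individual files. The following is a rough sketch, assuming `spell_check_files()` accepts a vector of file paths and a `lang` argument as in the spelling manual. The temporary file and its contents are hypothetical and exist only for this example.

```r
library(spelling)

# Write a small markdown file to a temporary location (illustrative content)
path <- tempfile(fileext = ".md")
writeLines(c("# Notes", "", "This sentense contains a speling mistake."), path)

# Check the file against the en_US dictionary; the result lists
# each flagged word together with where it was found
result <- spell_check_files(path, lang = "en_US")
print(result)

# Clean up the temporary file
unlink(path)
```

Working on a temporary file keeps the example self-contained; in practice you would pass the paths of your own documents.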