The goal of morphemizer is to tokenize text into morphemes (the shortest
word pieces that still bear meaning). For example, “unaffable” should
tokenize to "un##", "affable", since “aff” does not have a meaning in
modern English, while “inescapable” should tokenize to "in##", "escape",
"##able". "##" characters are used to indicate breaks in words. Prefixes
are followed by “##” (as in "in##"), suffixes are preceded by “##” (as in
"##able"), and breaks between roots are indicated by the special "##"
token.
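To make the "##" convention concrete, here is a minimal sketch in base R that labels each token by where the marker sits. The function `classify_pieces()` is a hypothetical helper written only for illustration. It is not part of morphemizer and does no tokenization itself; it only reads tokens that already follow the notation described above.

```r
# Hypothetical helper (not part of morphemizer): label each token according
# to the position of the "##" marker.
classify_pieces <- function(tokens) {
  ifelse(
    tokens == "##", "root break",        # the bare "##" token separates roots
    ifelse(
      endsWith(tokens, "##"), "prefix",  # e.g. "in##"
      ifelse(
        startsWith(tokens, "##"), "suffix",  # e.g. "##able"
        "root"                               # no marker at either end
      )
    )
  )
}

# Tokens taken from the examples above.
classify_pieces(c("in##", "escape", "##able"))
classify_pieces(c("un##", "affable"))
```

The bare "##" token is checked first because it would otherwise match both the prefix and the suffix tests.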
morphemizer has not been released on CRAN, so the usual CRAN installation is not available yet:
# Not yet on CRAN.
# install.packages("morphemizer")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("jonthegeek/morphemizer")
This is not an officially supported Macmillan Learning product.
Questions or comments should be directed to Jonathan Bratt (jonathan.bratt@macmillan.com) and Jon Harmon (jonthegeek@gmail.com).