PorterStemmer implements the porter stemming algorithm in version 1 and version 2. It is written in pure Ruby and does not depend on any extensions, therefore it is platform independent. However, there are much faster implementations using native extensions which are linked below.
Add this line to your application's Gemfile:
gem 'porter_stemmer'
And then execute:
$ bundle
Or install it yourself as:
$ gem install porter_stemmer
You can use PorterStemmer as follows:
PorterStemmer::Porter1.stem("generalization")
PorterStemmer::Porter2.stem("generalization")
Since the Porter 2 algorithm should be used by default, the following shortcut was added:
PorterStemmer.stem("generalization")
According to the testfile for the Poter 2 implementation from the official website the word kinkajou should be stemmed again to kinkajou, but the Poter2 implementation stems the word to kinkaj. Since this is the only known word that does not stem properly, an exceptional case was added to the code such that the tests pass. If you know why the word is stemmed like this, please let me know.
- Fork it
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Added some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request
- lingua/stemmer (ext)