Porter 2 Stemmer for PHP

A stemmer takes a given word and follows a set of rules to reduce it to a search-index-usable stem (as opposed to the actual word root). For example, aggravate, aggravated, and aggravates all reduce to "aggrav," so a search index can treat all three as the same term.

Martin Porter's English (Porter 2) Algorithm improves on the original Porter stemmer as described here.

Usage

After loading the porter2 class (e.g., via autoloading, require_once, or a framework-specific call such as Drupal's module_load_include()), stem a word (a string) as follows:

$word = 'aggravated';
$porter2 = new porter2($word);
echo $porter2->stem(); // will print 'aggrav'

Custom exclusions

The default algorithm may not stem certain words to your liking. For example, texas reduces to texa, but texan does not. By setting a custom array of exclusions on the stemmer object, with each word mapped to the stem you want, you can override the algorithm as needed:

$word = 'texan';
$porter2 = new porter2($word);
$porter2->custom_exclusions = array('texan' => 'texa');
echo $porter2->stem(); // will print 'texa'
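
To illustrate the idea behind an exclusions map, here is a minimal, self-contained sketch. It is not the porter2 class's internal code: stem_with_exclusions() and $toy_stemmer are hypothetical names invented for this example, and the toy stemmer only strips a trailing "s". The sketch assumes the map is consulted before any stemming rule runs, which is what the example above suggests.

```php
<?php
// Hypothetical helper, not part of the porter2 class: consult the
// exclusions map first and fall back to the stemmer only on a miss.
function stem_with_exclusions(string $word, array $exclusions, callable $stemmer): string
{
    $key = strtolower($word);
    if (array_key_exists($key, $exclusions)) {
        return $exclusions[$key];
    }
    return $stemmer($key);
}

// Stand-in stemmer for illustration only: strips one trailing "s".
// Real code would wrap the porter2 class here instead.
$toy_stemmer = function (string $word): string {
    return substr($word, -1) === 's' ? substr($word, 0, -1) : $word;
};

$exclusions = array('texan' => 'texa');

echo stem_with_exclusions('texas', $exclusions, $toy_stemmer), "\n"; // stemmed by the toy rule
echo stem_with_exclusions('texan', $exclusions, $toy_stemmer), "\n"; // taken from the exclusions map
```

Both calls yield the same stem, so the two words would land on the same entry in a search index.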

Stemmer Resources

Tests

A verification list of 29,000 words and their expected stems can be run from the included index.php file. To test individual words, use tests.php.