PTStemmer - A stemming toolkit for Portuguese in Python

Features

  • Python implementations of Orengo, Porter, and Savoy stemmers
  • Fast: can stem more than 1.5M words/second on a normal desktop
  • Least Recently Used (LRU) stem cache
  • Support for lists of words to ignore (useful for stopword and named entity removal)
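The last two features can be pictured with a short sketch. This is not PTStemmer's actual API: `CachedStemmer`, `toy_stem`, and all parameter names are hypothetical, and `toy_stem` is a placeholder that only strips a trailing "s" rather than implementing Orengo, Porter, or Savoy. It only illustrates how an LRU stem cache and an ignore list might sit in front of any stemming function.

```python
from collections import OrderedDict


def toy_stem(word):
    """Hypothetical placeholder stemmer: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


class CachedStemmer:
    """Hypothetical wrapper adding an LRU cache and an ignore list."""

    def __init__(self, stem_func, cache_size=1000, ignore=()):
        self._stem = stem_func
        self._cache_size = cache_size
        self._ignore = {w.lower() for w in ignore}
        self._cache = OrderedDict()  # insertion order tracks recency

    def stem(self, word):
        key = word.lower()
        # Ignored words (stopwords, named entities) are returned untouched.
        if key in self._ignore:
            return word
        # Cache hit: mark the entry as most recently used.
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: compute, store, and evict the least recently used entry.
        result = self._stem(key)
        self._cache[key] = result
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)
        return result

    def cached_words(self):
        """Return cached words, least recently used first."""
        return list(self._cache)


stemmer = CachedStemmer(toy_stem, cache_size=2, ignore={"lisboa"})
print(stemmer.stem("casas"))   # stemmed and cached
print(stemmer.stem("Lisboa"))  # on the ignore list, returned as-is
```

Keeping the cache in an `OrderedDict` makes both the recency update and the eviction constant-time operations, which matters when the same frequent words are stemmed repeatedly.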

About the original project

The project was originally developed by Pedro Oliveira.

This is a fork automatically exported from the original Google Code repository, which lived at code.google.com/p/ptstemmer.

The original codebase also contained Java and C# implementations of the stemmers, but I removed them since I had no interest in them. The original code is tagged as original-export and can be retrieved with a simple checkout:

$ git checkout original-export

Licensing

The original work, and therefore this fork, are licensed under the GNU Lesser General Public License, version 3.0 (LGPLv3).

A verbatim copy of the license can be found in the LICENSE file.