Boost Performance and Support Tokens #18

Open
wants to merge 52 commits into
base: development
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
ee5dcde
Update .travis.yml
har07 Jan 16, 2016
9206852
add files for package distribution, according to : http://python-pack…
har07 Jan 16, 2016
42b5c4d
config files for travis-ci, coveralls, pypi/setup
har07 Jan 16, 2016
11a4d88
update readme
har07 Jan 16, 2016
64bb82f
merge with github repo
har07 Jan 16, 2016
4d35327
remove support for Python 3.2 due to build error
har07 Jan 16, 2016
99eba63
fix typo
har07 Jan 16, 2016
ecf2c11
PyPI/pip badge
har07 Jan 17, 2016
d512c1a
Merge remote-tracking branch 'origin/master'
har07 Jan 17, 2016
274223a
Update README.md
har07 Jan 18, 2016
7e21247
commit successful pip setup configurations
har07 Jan 18, 2016
49805da
Merge branch 'master' of https://github.com/har07/sastrawi
har07 Jan 18, 2016
d358279
Update README.md
har07 Jan 19, 2016
df24307
Update travis-ci link.Add demo URL
har07 Mar 25, 2016
1868b94
Update README.rst
har07 Mar 25, 2016
0d7fcdb
Fix StopWordRemover when removing a list of words (skipped cursor…
gsarwohadi Jun 21, 2016
a8c4874
Merge pull request #1 from gsarwohadi/master
har07 Jul 23, 2016
010f228
Update to Sastrawi v1.2.0
prasastoadi Oct 23, 2016
fe2b42a
update coverage badge link
har07 Oct 26, 2016
2138a24
Merge pull request #2 from prasastoadi/master
har07 Oct 26, 2016
af480e5
change lower sequence (#5)
khrlimam May 5, 2017
5f30ede
fix NameError when installing (#6)
widnyana Oct 28, 2017
32125a5
release v1.1.0
har07 Apr 24, 2018
49372b3
Add LICENSE (#8)
prasastoadi Apr 24, 2018
0ab8ce2
update new pypi badge
har07 Apr 24, 2018
c784cd7
remove empty line from list kata-dasar
har07 Sep 23, 2018
b31d6f6
Change the dictionary from a list to a dict
sanspa Sep 23, 2018
65cd03a
release 1.2.0
har07 Sep 23, 2018
01afc81
update pip install instruction
har07 Sep 24, 2018
3625027
Add Stopwords Tala 2003, Add lru_cache
MufidJamaluddin Mar 14, 2019
9890fcf
Test Stopword Tala
MufidJamaluddin Mar 14, 2019
7a55cbf
Boost Performance
MufidJamaluddin Mar 15, 2019
5630ad6
add stem word
MufidJamaluddin Mar 15, 2019
1d9554f
add stem & stopword removal from tokens/word list
MufidJamaluddin Mar 15, 2019
81b06a4
add python 3.7
MufidJamaluddin Mar 15, 2019
748e608
Merge branch 'development' into master
Mar 15, 2019
150a839
Minor
MufidJamaluddin Mar 15, 2019
99bfac5
Fix Error
MufidJamaluddin Mar 15, 2019
abccaca
Merge branch 'master' of https://github.com/MufidJamaluddin/PySastrawi
MufidJamaluddin Mar 15, 2019
edf2c81
fix error python 2.7
MufidJamaluddin Mar 15, 2019
a47d9b2
LruCache python 2.7
MufidJamaluddin Mar 15, 2019
58d35a7
minor
MufidJamaluddin Mar 15, 2019
345edd1
Fix critical bugs
MufidJamaluddin Mar 15, 2019
1a5f7d6
Travis for Python 3.7
MufidJamaluddin Mar 15, 2019
ae3bc91
add test case
MufidJamaluddin Mar 15, 2019
9fc1b3e
Add Test Case
MufidJamaluddin Mar 15, 2019
15fe5d6
Test Case
MufidJamaluddin Mar 15, 2019
3e4151a
Define Abstract Method & Update Test Case
MufidJamaluddin Mar 15, 2019
6d9fd87
Minor
MufidJamaluddin Mar 15, 2019
8bfc448
LruCache
MufidJamaluddin Apr 19, 2019
3470898
minor
MufidJamaluddin Apr 19, 2019
169edcf
remove lrucache stemword
MufidJamaluddin Apr 19, 2019
2 changes: 1 addition & 1 deletion .coveragerc
@@ -2,4 +2,4 @@
omit =
    */python?.?/*
    */site-packages/nose/*
    *__init__*
    *__init__*
7 changes: 3 additions & 4 deletions .gitignore
@@ -1,8 +1,10 @@
<<<<<<< HEAD
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# pycharm generated
.idea/

# visual studio generated
bin/
obj/
@@ -62,8 +64,5 @@ docs/_build/

# PyBuilder
target/
||||||| merged common ancestors
=======
# Google App Engine generated folder
appengine-generated/
>>>>>>> adaaddecc50208c18b08806f63f80f3342bd5e30
9 changes: 8 additions & 1 deletion .travis.yml
@@ -5,11 +5,18 @@ python:
  - "3.4"
  - "3.5"
sudo: false
# Enable 3.7 without globally enabling sudo and dist: xenial for other build jobs
matrix:
  include:
    - python: 3.7
      dist: xenial
      sudo: true
install:
  - pip install python-coveralls
  - pip install coveralls
  - pip install cachetools
script: nosetests tests --verbose --with-coverage
after_success:
  - coveralls
notifications:
  email: false
  email: false
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
    "python.linting.pylintEnabled": true
}
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2016 Hanif Amal Robbani

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
# sastrawi
Indonesian stemmer. Python port of PHP Sastrawi project.

[![Coverage Status](https://coveralls.io/repos/har07/sastrawi/badge.svg?branch=development&service=github)](https://coveralls.io/github/har07/sastrawi?branch=development)
[![Coverage Status](https://coveralls.io/repos/har07/sastrawi/badge.svg?branch=development&service=github)](https://coveralls.io/github/har07/sastrawi?branch=development)
63 changes: 63 additions & 0 deletions README.rst
@@ -0,0 +1,63 @@
Sastrawi
========

| Sastrawi is a simple Python library that reduces inflected words in
  Indonesian (Bahasa Indonesia) to their base form (`stem`_).
| This is a Python port of the original `Sastrawi`_ project written in
  PHP.

|Build Status|
|Coverage Status|

Installation
------------

Sastrawi can be installed via `pip`_ by running the following command
in a terminal/command prompt: ``pip install PySastrawi``

Example Usage
-------------

Run the following code in a *Python interactive terminal*:

.. code:: python

    # import Sastrawi package
    from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

    # create stemmer
    factory = StemmerFactory()
    stemmer = factory.create_stemmer()

    # stem
    sentence = 'Perekonomian Indonesia sedang dalam pertumbuhan yang membanggakan'
    output = stemmer.stem(sentence)

    print(output)
    # ekonomi indonesia sedang dalam tumbuh yang bangga

    print(stemmer.stem('Mereka meniru-nirukannya'))
    # mereka tiru

Demo
---------

Live demo: https://pysastrawi-demo.appspot.com/

Repository: https://github.com/har07/pystastrawi-demo

More Info
---------

- `Sastrawi PHP Repository page`_

.. _stem: http://en.wikipedia.org/wiki/Stemming
.. _Sastrawi: https://github.com/sastrawi/sastrawi
.. _pip: https://docs.python.org/3.6/installing/index.html
.. _Sastrawi PHP Repository page: https://github.com/sastrawi/sastrawi

.. |Build Status| image:: https://travis-ci.org/har07/PySastrawi.svg?branch=master
   :target: https://travis-ci.org/har07/PySastrawi
.. |Coverage Status| image:: https://coveralls.io/repos/har07/sastrawi/badge.svg?branch=master&service=github
   :target: https://coveralls.io/github/har07/sastrawi?branch=master
11 changes: 11 additions & 0 deletions Sastrawi.sln
@@ -7,6 +7,17 @@ Project("{888888A0-9F3D-457C-B088-3A5042F75D52}") = "Sastrawi", "src\Sastrawi\Sa
EndProject
Project("{888888A0-9F3D-457C-B088-3A5042F75D52}") = "SastrawiTest", "tests\SastrawiTest.pyproj", "{69199BE5-44C5-45C3-8B82-62F14DA2B9F1}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{0302964A-E17E-468E-8365-21827A654692}"
ProjectSection(SolutionItems) = preProject
.coveragerc = .coveragerc
.gitignore = .gitignore
.travis.yml = .travis.yml
README.md = README.md
README.rst = README.rst
setup.cfg = setup.cfg
setup.py = setup.py
EndProjectSection
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
9 changes: 9 additions & 0 deletions release.sh
@@ -0,0 +1,9 @@
# generate distribution package
python -m pip install --user --upgrade setuptools wheel
python setup.py sdist bdist_wheel

# upload distribution package
python3 -m pip install --user --upgrade twine
# upload to TestPyPI first to verify the package
twine upload --repository-url https://test.pypi.org/legacy/ dist/*

# then upload to PyPI
twine upload dist/*
5 changes: 5 additions & 0 deletions setup.cfg
@@ -0,0 +1,5 @@
[bdist_wheel]
# This flag says that the code is written to work on both Python 2 and Python
# 3. If at all possible, it is good practice to do this. If you cannot, you
# will need to generate wheels for each Python version that you support.
universal=1
115 changes: 115 additions & 0 deletions setup.py
@@ -0,0 +1,115 @@
"""A setuptools based setup module.
See:
https://packaging.python.org/en/latest/distributing.html
https://github.com/pypa/sampleproject
"""

# To use a consistent encoding
from codecs import open
from os import path

# Always prefer setuptools over distutils
from setuptools import setup, find_packages

# Get the long description from the README file
here = path.abspath(path.dirname(__file__))
with open(path.join(here, 'README.rst'), encoding='utf-8') as f:
    long_description = f.read()

setup(
name='PySastrawi',

# Versions should comply with PEP440. For a discussion on single-sourcing
# the version across setup.py and the project code, see
# https://packaging.python.org/en/latest/single_source_version.html
version='1.2.0',

description='Library for stemming Indonesian (Bahasa) text',
long_description=long_description,

# The project's main homepage.
url='https://github.com/har07/PySastrawi',

# Author details
author='Hanif Amal Robbani',
author_email='dev.har07@gmail.com',

# Choose your license
license='MIT',

# See https://pypi.python.org/pypi?%3Aaction=list_classifiers
classifiers=[
# How mature is this project? Common values are
# 3 - Alpha
# 4 - Beta
# 5 - Production/Stable
'Development Status :: 4 - Beta',

# Indicate who your project is intended for
'Intended Audience :: Information Technology',
'Intended Audience :: Science/Research',
'Topic :: Text Processing :: Linguistic',

# Pick your license as you wish (should match "license" above)
'License :: OSI Approved :: MIT License',

# Specify the Python versions you support here. In particular, ensure
# that you indicate whether you support Python 2, Python 3 or both.
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
],

# What does your project relate to?
keywords='linguistic stemming indonesian bahasa',

# You can just specify the packages manually here if your project is
# simple. Or you can use find_packages().
packages=find_packages('src', exclude=['contrib', 'docs', 'tests']),
# packages=["Sastrawi"],
package_dir={'': 'src'},

# Alternatively, if you want to distribute just a my_module.py, uncomment
# this:
# py_modules=["my_module"],

# List run-time dependencies here. These will be installed by pip when
# your project is installed. For an analysis of "install_requires" vs pip's
# requirements files see:
# https://packaging.python.org/en/latest/requirements.html
# install_requires=['peppercorn'],

# List additional groups of dependencies here (e.g. development
# dependencies). You can install these using the following syntax,
# for example:
# $ pip install -e .[dev,test]
# extras_require={
# 'dev': ['check-manifest'],
# 'test': ['coverage'],
# },

# If there are data files included in your packages that need to be
# installed, specify them here. If using Python 2.6 or less, then these
# have to be included in MANIFEST.in as well.
package_data={
'': ['data/*.txt'],
},

# Although 'package_data' is the preferred approach, in some case you may
# need to place data files outside of your packages. See:
# http://docs.python.org/3.4/distutils/setupscript.html#installing-additional-files # noqa
# In this case, 'data_file' will be installed into '<sys.prefix>/my_data'
# data_files=[('my_data', ['data/data_file'])],

# To provide executable scripts, use entry points in preference to the
# "scripts" keyword. Entry points provide cross-platform support and allow
# pip to create the appropriate form of executable for the target platform.
# entry_points={
# 'console_scripts': [
# 'sample=sample:main',
# ],
# },
)
23 changes: 12 additions & 11 deletions src/Sastrawi/Dictionary/ArrayDictionary.py
@@ -1,10 +1,17 @@
class ArrayDictionary(object):
from Sastrawi.Dictionary.DictionaryInterface import DictionaryInterface

class ArrayDictionary(DictionaryInterface):
"""description of class"""

def __init__(self, words=None):
self.words = []
if words:
if words is None:
self.words = {}
elif type(words) is dict:
self.words = words
elif type(words) is list:
self.add_words(words)
else:
self.words = {}

def contains(self, word):
return word in self.words
@@ -14,16 +21,10 @@ def count(self):

def add_words(self, words):
"""Add multiple words to the dictionary"""
for word in words:
self.add(word)
self.words = dict(zip(words,words))

def add(self, word):
"""Add a word to the dictionary"""
if not word or word.strip() == '':
return
self.words.append(word)





self.words[word] = word
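Storing the words in a dict turns `contains` from a linear list scan into an average O(1) hash lookup. As written in this diff, though, `add_words` rebinds `self.words = dict(zip(words, words))`, which discards any previously added entries and skips the blank-word check that `add` performs. A minimal sketch of an alternative that keeps the dict storage but updates it in place; the class name `ArrayDictionarySketch` is hypothetical, and it accepts any iterable instead of dispatching on `type(words) is dict` / `list`.

```python
class ArrayDictionarySketch(object):
    """Word dictionary backed by a dict for average O(1) membership tests."""

    def __init__(self, words=None):
        self.words = {}
        if words:
            # Any iterable works: list, tuple, set, or a dict (its keys).
            self.add_words(words)

    def contains(self, word):
        return word in self.words

    def count(self):
        return len(self.words)

    def add_words(self, words):
        """Add multiple words, keeping existing entries and skipping blanks."""
        self.words.update((w, w) for w in words if w and w.strip())

    def add(self, word):
        """Add a single word; empty or whitespace-only strings are ignored."""
        if not word or word.strip() == '':
            return
        self.words[word] = word

d = ArrayDictionarySketch(["makan", "minum", "", "  "])
d.add_words(["tidur"])
print(d.count())            # 3
print(d.contains("makan"))  # True
```

Since each word maps to itself, a `set` would serve equally well; the dict is kept here only to match the diff's storage choice.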
14 changes: 9 additions & 5 deletions src/Sastrawi/Dictionary/DictionaryInterface.py
@@ -1,8 +1,12 @@
class DictionaryInterface(object):
"""description of class"""

def contains(self, word):
pass
# @update_by Mufid Jamaluddin
# @update_date 16/03/2019

from abc import ABCMeta, abstractmethod

class DictionaryInterface:
"""Interface definition of dictionary"""
__metaclass__ = ABCMeta

@abstractmethod
def contains(self, word):
pass
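A class-level `__metaclass__ = ABCMeta` attribute is honored only by Python 2; Python 3 ignores it, so `@abstractmethod` is not enforced there and `DictionaryInterface()` can still be instantiated. One common way to declare the metaclass so it works on both versions is to build the base class by calling `ABCMeta` directly. A sketch, where `_Base`, `SetDictionary` and `BrokenDictionary` are illustrative names:

```python
from abc import ABCMeta, abstractmethod

# Calling the metaclass directly works on Python 2 and 3 alike,
# unlike the Python 2-only `__metaclass__` class attribute.
_Base = ABCMeta('_Base', (object,), {})

class DictionaryInterface(_Base):
    """Interface definition of dictionary"""

    @abstractmethod
    def contains(self, word):
        pass

class SetDictionary(DictionaryInterface):
    """Concrete implementation: overrides the abstract method."""

    def __init__(self, words):
        self.words = set(words)

    def contains(self, word):
        return word in self.words

class BrokenDictionary(DictionaryInterface):
    pass  # forgets to implement contains()

print(SetDictionary(["buku"]).contains("buku"))  # True
try:
    BrokenDictionary()
except TypeError:
    print("abstract method enforced")
```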
@@ -1,10 +1,11 @@
import re


class DisambiguatorPrefixRule24(object):
"""Disambiguate Prefix Rule 24
Rule 24 : perCAerV -> per-CAerV where C != 'r'
"""

def disambiguate(self, word):
"""Disambiguate Prefix Rule 24
Rule 24 : perCAerV -> per-CAerV where C != 'r'
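The docstring states Rule 24 as `perCAerV -> per-CAerV where C != 'r'`, but the diff is collapsed before the method body. A minimal regex sketch of the rule as stated, assuming C is a consonant other than `r`, A is any lowercase letter, and V is a vowel; the class name is hypothetical and this is not necessarily the upstream implementation. The inputs below are synthetic strings chosen only to exercise the pattern.

```python
import re

class DisambiguatorPrefixRule24Sketch(object):
    """Rule 24 : perCAerV -> per-CAerV where C != 'r'"""

    # C: consonant other than 'r'; A: any letter; then "er"; V: vowel.
    PATTERN = re.compile(r'^per([bcdfghjklmnpqstvwxyz])([a-z])er([aiueo])(.*)$')

    def disambiguate(self, word):
        matches = self.PATTERN.match(word)
        if matches:
            # Drop the "per" prefix and keep the rest of the word intact.
            return (matches.group(1) + matches.group(2) + 'er'
                    + matches.group(3) + matches.group(4))
        return None

rule = DisambiguatorPrefixRule24Sketch()
print(rule.disambiguate('perbaeri'))  # baeri
print(rule.disambiguate('perraeri'))  # None (C == 'r')
```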
@@ -2,13 +2,19 @@

class DisambiguatorPrefixRule27(object):
"""Disambiguate Prefix Rule 27
Rule 27 : pen{c|d|j|z} -> pen-{c|d|j|z}
Rule 27 modified by Prasasto Adi : pen{c|d|j|s|t|z} -> pen-{c|d|j|s|t|z}
in order to stem penstabilan, pentranskripsi

Original CS Rule 27 was : pen{c|d|j|z} -> pen-{c|d|j|z}
"""

def disambiguate(self, word):
"""Disambiguate Prefix Rule 27
Rule 27 : pen{c|d|j|z} -> pen-{c|d|j|z}
Rule 27 modified by Prasasto Adi : pen{c|d|j|s|t|z} -> pen-{c|d|j|s|t|z}
in order to stem penstabilan, pentranskripsi

Original CS Rule 27 was : pen{c|d|j|z} -> pen-{c|d|j|z}
"""
matches = re.match(r'^pen([cdjz])(.*)$', word)
matches = re.match(r'^pen([cdjstz])(.*)$', word)
if matches:
return matches.group(1) + matches.group(2)
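The only behavioral change in Rule 27 is the wider character class. A quick comparison of the old and new patterns on the two words named in the docstring, plus one word both patterns handle (`OLD`, `NEW` and the module-level `disambiguate` helper exist only for this comparison):

```python
import re

OLD = r'^pen([cdjz])(.*)$'    # original CS Rule 27
NEW = r'^pen([cdjstz])(.*)$'  # modified by Prasasto Adi

def disambiguate(pattern, word):
    """Apply Rule 27 with the given pattern: drop 'pen', keep the rest."""
    matches = re.match(pattern, word)
    if matches:
        return matches.group(1) + matches.group(2)
    return None

for word in ('penstabilan', 'pentranskripsi', 'pencuri'):
    print(word, disambiguate(OLD, word), disambiguate(NEW, word))
# penstabilan None stabilan
# pentranskripsi None transkripsi
# pencuri curi curi
```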