Merge branch 'release/0.0.2'
giacbrd committed Oct 13, 2016
2 parents 0cfeec7 + 369e218 commit 14655d7
Showing 15 changed files with 12,684 additions and 84 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -98,4 +98,4 @@ Thumbs.db
# Others
.idea
temp/
.pypirc
.pypirc
6 changes: 0 additions & 6 deletions .pypirc

This file was deleted.

5 changes: 4 additions & 1 deletion .travis.yml
@@ -18,5 +18,8 @@ before_install:
- conda update --yes conda
install:
- conda install --yes python=$TRAVIS_PYTHON_VERSION numpy scipy
- pip install cython
- python setup.py install
script: python setup.py test
script:
- python setup.py test
#FIXME add a script like "python scripts/document_classification_20newsgroups.py" without plotting
13 changes: 13 additions & 0 deletions CHANGELOG.rst
@@ -0,0 +1,13 @@
Changelog
=========

`0.0.2 <https://github.com/giacbrd/ShallowLearn/releases/tag/0.0.2>`_ (2016-14-10)
----------------------------------------------------------------------------------

* Cython code for fastText in Gensim
* Script for benchmarks

`0.0.1 <https://github.com/giacbrd/ShallowLearn/releases/tag/0.0.1>`_ (2016-11-10)
----------------------------------------------------------------------------------

* First working model: GensimFastText
53 changes: 34 additions & 19 deletions README.rst
@@ -2,46 +2,61 @@ ShallowLearn
============
A collection of supervised learning models based on shallow neural network approaches (e.g., word2vec and fastText)
with some additional exclusive features.
They are written in Python and fully compatible with `Scikit-learn <http://scikit-learn.org>`_
Written in Python and fully compatible with `Scikit-learn <http://scikit-learn.org>`_.

.. image:: https://travis-ci.org/giacbrd/ShallowLearn.svg?branch=master
    :target: https://travis-ci.org/giacbrd/ShallowLearn
.. image:: https://badge.fury.io/py/shallowlearn.svg
    :target: https://badge.fury.io/py/shallowlearn

Installation
------------
``pip install shallowlearn``
Getting Started
---------------
Install the latest version:

.. code:: shell

    pip install shallowlearn

Import models from ``shallowlearn.models``; they implement the standard Scikit-learn methods for supervised learning,
e.g., ``fit(X, y)`` and ``predict(X)``.

Data is raw text: each sample is a list of tokens (the words of a document), and each target value in ``y`` is a
single label (or a list of labels for a multi-label training set) associated with the corresponding sample.
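
The data layout described above can be sketched as follows. This is only an illustration: ``TokenVoteClassifier`` is a hypothetical stand-in written for this example (it is not part of ShallowLearn), used so the snippet runs without the package installed. With ShallowLearn available, a model imported from ``shallowlearn.models`` (e.g., ``GensimFastText``) would presumably take its place; its constructor arguments are not shown here and are not assumed.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Hypothetical stand-in exposing fit(X, y) / predict(X); not a ShallowLearn model."""

    def fit(self, X, y):
        # Count how often each token appears under each label.
        self.counts_ = defaultdict(Counter)
        for tokens, label in zip(X, y):
            self.counts_[label].update(tokens)
        return self

    def predict(self, X):
        # For each sample, pick the label whose training tokens overlap the most.
        labels = sorted(self.counts_)
        return [
            max(labels, key=lambda label: sum(self.counts_[label][t] for t in tokens))
            for tokens in X
        ]


# Each sample is a list of tokens; each target is a single label.
X_train = [
    ["the", "match", "ended", "in", "a", "draw"],
    ["stocks", "fell", "sharply", "today"],
    ["the", "team", "won", "the", "cup"],
]
y_train = ["sport", "finance", "sport"]

clf = TokenVoteClassifier().fit(X_train, y_train)
predictions = clf.predict([["the", "team", "scored"], ["stocks", "rose"]])
print(predictions)
```

For a multi-label training set, each element of ``y_train`` would instead be a list of labels, as described above.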

Models
------
``shallowlearn.models.GensimFTClassifier``
``shallowlearn.models.GensimFastText``
A supervised learning model based on the fastText algorithm [1]_.
The code is mostly taken and rewritten from `Gensim <https://radimrehurek.com/gensim>`_
and takes advantage of its optimizations and support.
**TODO**: Cython code

``shallowlearn.models.FastTextClassifier``
``shallowlearn.models.FastText``
**TODO**: The supervised algorithm of fastText implemented in https://github.com/salestock/fastText.py

``shallowlearn.models.DeepIRClassifier``
``shallowlearn.models.DeepInverseRegression``
**TODO**: Based on https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec.score

Exclusive Features
------------------
**TODO**

Performances
------------
**TODO**: Comparison with other classifiers in effectiveness and computation cost

TODO
----

- Tests!
- Documents can be structured, made of different sections, learned independently
- Taking into account https://github.com/RaRe-Technologies/gensim/pull/847, implementing the hashing trick
- Given the previous point, implementing n-grams of words
Benchmarks
----------
The script ``scripts/document_classification_20newsgroups.py`` is based on this
`Scikit-learn example <http://scikit-learn.org/stable/auto_examples/text/document_classification_20newsgroups.html>`_,
which compares text classifiers on a reference dataset;
we added our models to the comparison.
**The current results, although still preliminary, are comparable with those of other
approaches, with the best speed**.

Results as of release `0.0.2 <https://github.com/giacbrd/ShallowLearn/releases/tag/0.0.2>`_,
with *chi2_select* option set to 80%.
The times include *tf-idf* vectorization for the “classic” classifiers;
the evaluation measure is *macro F1*.
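
For readers unfamiliar with the measure, macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its size. The following is a minimal pure-Python sketch of that definition; the function name ``macro_f1`` and the toy labels are illustrative assumptions, not code from the benchmark script.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three classes (illustrative labels only).
score = macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"])
print(round(score, 4))
```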

.. image:: https://cdn.rawgit.com/giacbrd/ShallowLearn/develop/benchmark.svg
    :alt: Text classifiers comparison
    :align: center

References
----------