Merge branch 'release/0.0.2'

giacbrd · Oct 13, 2016 · 14655d7
2 parents 0cfeec7 + 369e218
commit 14655d7
Showing 15 changed files with 12,684 additions and 84 deletions.
diff --git a/.gitignore b/.gitignore
@@ -98,4 +98,4 @@ Thumbs.db
 # Others
 .idea
 temp/
-.pypirc
+.pypirc
diff --git a/.pypirc b/.pypirc
diff --git a/.travis.yml b/.travis.yml
@@ -18,5 +18,8 @@ before_install:
   - conda update --yes conda
 install:
   - conda install --yes python=$TRAVIS_PYTHON_VERSION numpy scipy
+  - pip install cython
   - python setup.py install
-script: python setup.py test
+script:
+  - python setup.py test
+  #FIXME add a script like "python scripts/document_classification_20newsgroups.py" without plotting
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -0,0 +1,13 @@
+Changelog
+=========
+
+`0.0.2 <https://github.com/giacbrd/ShallowLearn/releases/tag/0.0.2>`_ (2016-10-14)
+----------------------------------------------------------------------------------
+
+* Cython code for fastText in Gensim
+* Script for benchmarks
+
+`0.0.1 <https://github.com/giacbrd/ShallowLearn/releases/tag/0.0.1>`_ (2016-10-11)
+----------------------------------------------------------------------------------
+
+* First working model: GensimFastText
diff --git a/README.rst b/README.rst
@@ -2,46 +2,61 @@ ShallowLearn
 ============
 A collection of supervised learning models based on shallow neural network approaches (e.g., word2vec and fastText)
 with some additional exclusive features.
-They are written in Python and fully compatible with `Scikit-learn <http://scikit-learn.org>`_
+Written in Python and fully compatible with `Scikit-learn <http://scikit-learn.org>`_.
 
 .. image:: https://travis-ci.org/giacbrd/ShallowLearn.svg?branch=master
     :target: https://travis-ci.org/giacbrd/ShallowLearn
 .. image:: https://badge.fury.io/py/shallowlearn.svg
     :target: https://badge.fury.io/py/shallowlearn
 
-Installation
-------------
-``pip install shallowlearn``
+Getting Started
+---------------
+Install the latest version:
+
+.. code:: shell
+
+    pip install shallowlearn
+
+Import models from ``shallowlearn.models``; they implement the standard Scikit-learn methods for supervised learning,
+e.g., ``fit(X, y)`` and ``predict(X)``.
+
+Input data is raw text: each sample is a list of tokens (the words of a document), and each target value in ``y`` is a
+single label (or a list of labels, for a multi-label training set) associated with the corresponding sample.
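As an editorial illustration of the input format described in the added README text, the sketch below builds tokenized samples and mixed single-label / multi-label targets, then passes them through ``fit(X, y)`` and ``predict(X)``. The class ``TokenVoteClassifier`` is a hypothetical stand-in written for this example; it is not part of ShallowLearn and does not reproduce fastText. It only mirrors the interface so the data shapes can be run without installing anything.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Hypothetical stand-in, NOT part of ShallowLearn: it only mirrors the
    fit(X, y) / predict(X) interface and the input format described above."""

    def fit(self, X, y):
        # Count how often each token co-occurs with each label.
        self.votes_ = defaultdict(Counter)
        self.default_ = Counter()
        for tokens, target in zip(X, y):
            # A target is a single label, or a list of labels (multi-label).
            labels = target if isinstance(target, (list, tuple)) else [target]
            self.default_.update(labels)
            for token in tokens:
                self.votes_[token].update(labels)
        return self

    def predict(self, X):
        predictions = []
        for tokens in X:
            tally = Counter()
            for token in tokens:
                tally.update(self.votes_.get(token, {}))
            # Fall back to the training label counts if no token was seen.
            best = tally or self.default_
            predictions.append(best.most_common(1)[0][0])
        return predictions


# Each sample is a list of tokens; each target is a label or a list of labels.
X_train = [
    ["the", "team", "won", "the", "match"],
    ["the", "senate", "passed", "the", "bill"],
    ["new", "stadium", "funding", "bill"],
]
y_train = ["sport", "politics", ["sport", "politics"]]

clf = TokenVoteClassifier().fit(X_train, y_train)
predicted = clf.predict([["the", "team", "lost", "the", "match"], ["senate", "vote"]])
print(predicted)
```

Assuming the models in ``shallowlearn.models`` accept the same shapes, as the README text states, ``X_train`` and ``y_train`` could be handed to them in the same way; their constructor arguments are not shown in this commit and are left out here.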
 
 Models
 ------
-``shallowlearn.models.GensimFTClassifier``
+``shallowlearn.models.GensimFastText``
     A supervised learning model based on the fastText algorithm [1]_.
     The code is mostly taken and rewritten from `Gensim <https://radimrehurek.com/gensim>`_;
     it takes advantage of its optimizations and support.
-    **TODO**: Cython code
 
-``shallowlearn.models.FastTextClassifier``
+``shallowlearn.models.FastText``
     **TODO**: The supervised algorithm of fastText implemented in https://github.com/salestock/fastText.py
 
-``shallowlearn.models.DeepIRClassifier``
+``shallowlearn.models.DeepInverseRegression``
     **TODO**: Based on https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec.score
 
 Exclusive Features
 ------------------
 **TODO**
 
-Performances
-------------
-**TODO**:  Comparison with other classifiers in effectiveness and computation cost
-
-TODO
-----
-
-- Tests!
-- Documents can be structured, made of different sections, learned independently
-- Taking into account https://github.com/RaRe-Technologies/gensim/pull/847, implementing the hashing trick
-- Given the previous point, implementing n-grams of words
+Benchmarks
+----------
+The script ``scripts/document_classification_20newsgroups.py`` refers to this
+`Scikit-learn example <http://scikit-learn.org/stable/auto_examples/text/document_classification_20newsgroups.html>`_
+in which text classifiers are compared on a reference dataset;
+we added our models to the comparison.
+**The current results, although still preliminary, are comparable with those of other
+approaches, while achieving the best speed**.
+
+Results as of release `0.0.2 <https://github.com/giacbrd/ShallowLearn/releases/tag/0.0.2>`_,
+with the *chi2_select* option set to 80%.
+The times take into account the *tf-idf* vectorization in the “classic” classifiers;
+the evaluation measure is *macro F1*.
+
+.. image:: https://cdn.rawgit.com/giacbrd/ShallowLearn/develop/benchmark.svg
+    :alt: Text classifiers comparison
+    :align: center
 
 References
 ----------