This is a list of projects related to mathematics, numeric computation, statistics, data science, or machine learning that we would like to see implemented and integrated with PolyMath or used together with it. Some of these projects will be proposed for Google Summer of Code 2019. If you have any ideas, feel free to add them to the list. If you are a student who wants to take on a project from this list and needs assistance, please contact us on the #polymath channel of the Pharo Discord or write to the PolyMath mailing list.
- Native vector-matrix algebra
- DataFrame
- NLP library
- Data visualizations
- Keras-style API on top of TensorFlow
- PolyMath examples
- Support for Vega visualization engine
- Improve ODE-solver
- Rule-based integration
- Computational algebra
Build a numeric library for matrix and vector algebra (similar to PolyMath but smaller) with an external backend such as BLAS, LAPACK, or Intel MKL. These are native libraries written in Fortran or C that provide routines for fast vector and matrix operations. Widely used numeric libraries and languages such as NumPy, R, and MATLAB call these routines. PolyMath, on the other hand, has everything implemented in Smalltalk. This has many advantages, but it is slow enough that in practice PolyMath's vector-matrix operations cannot be used for most real applications (for example, in machine learning). There is an existing library for Squeak that uses LAPACK.
PolyMath DataFrame is a Smalltalk library similar to pandas in Python or data frames in R. It implements data structures for processing and analysing tabular data, which is an essential part of data science and machine learning workflows. DataFrame needs a lot of improvement, both in fixing the existing features and in adding new ones.
We need a natural language processing (NLP) library written entirely in Pharo with functionality similar to NLTK or spaCy: part-of-speech (PoS) tagging, named entity recognition (NER), lemmatization, stemming, word sense disambiguation, tf-idf, n-grams, various metrics, etc. There is an existing NLP library that implements some of these features.
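To make two of the features listed above concrete, here is a minimal sketch of n-gram extraction and tf-idf weighting. It is written in Python only for illustration; the function names `ngrams` and `tf_idf` are hypothetical and do not come from any existing Pharo library, and the tf-idf variant shown (relative term frequency times `log(N / df)`) is just one common choice among several.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of contiguous n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Compute tf-idf weights for a list of tokenized documents.

    tf is the term count divided by the document length,
    idf is log(number of documents / number of documents containing the term).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Two toy documents, tokenized by whitespace (an assumption for brevity).
docs = [
    "pharo is a pure object language".split(),
    "polymath is a numeric library for pharo".split(),
]
bigrams = ngrams(docs[0], 2)
weights = tf_idf(docs)
```

A term that occurs in every document (such as "pharo" here) gets weight zero under this variant, while a term unique to one document gets a positive weight.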
We need a simple and powerful data visualization (charting) library similar to ggplot and based on The Grammar of Graphics. Inspiration can also be taken from Python's matplotlib and seaborn, but they do not follow the Grammar, which we strongly prefer. The library must be implemented entirely in Pharo, and the elements of a visualization must be objects that allow interaction and can be inspected. This can be done by using Telescope's Geometry library as a back-end. Existing libraries for data visualization in Pharo include Roassal's charting functionality and a bridge to Python's matplotlib; however, neither of them fully satisfies the conditions listed above.
PolyMathOrg provides bindings to use TensorFlow inside Pharo. We need to design a high-level API similar to Keras that is easy to work with and allows fast experimentation. There is an existing bridge that calls Keras functions from Python. We need a similar API with Pharo TensorFlow as the backend.
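The following sketch illustrates what "Keras-style" means in terms of API shape: a model is assembled by stacking layer objects, and data flows through them in order. It is written in plain Python with no TensorFlow calls; the classes `Dense` and `Sequential` below are hypothetical stand-ins that mirror Keras naming, weights are fixed by hand, and training is left out entirely. It is not the API of the Pharo TensorFlow bindings.

```python
class Dense:
    """A fully connected layer with hand-set weights (illustration only)."""

    def __init__(self, weights, biases, activation=None):
        self.weights = weights        # one row of input weights per output unit
        self.biases = biases          # one bias per output unit
        self.activation = activation  # optional function applied element-wise

    def forward(self, inputs):
        outputs = [
            sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        if self.activation is not None:
            outputs = [self.activation(v) for v in outputs]
        return outputs


class Sequential:
    """A linear stack of layers applied one after another."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)
        return self

    def predict(self, inputs):
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs


def relu(v):
    return max(0.0, v)


# Build a two-layer model and run one input vector through it.
model = Sequential()
model.add(Dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], activation=relu))
model.add(Dense([[1.0, 1.0]], [0.1]))
prediction = model.predict([2.0, 1.0])
```

The point of the design is that the user only composes layer objects; in the proposed project, each layer would delegate its computation to the TensorFlow backend instead of the Python loops used here.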
Add more PolyMath examples and improve integration with Roassal and/or GToolkit.
Add support for the Vega visualization engine.
PolyMath has a simple ODE solver. We would like to have more elaborate ways of solving ODEs, like those available in Julia.
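For reference, the sketch below shows a classical fixed-step fourth-order Runge-Kutta (RK4) integrator, the kind of baseline that more elaborate solvers (adaptive step size, methods for stiff problems) build on. It is a generic Python illustration, not PolyMath's code; the names `rk4_step` and `solve` are hypothetical.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h using classical RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def solve(f, t0, y0, t_end, n_steps):
    """Integrate from t0 to t_end with n_steps equal steps; return y(t_end)."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Test problem: y' = -y with y(0) = 1, whose exact solution is exp(-t).
approx = solve(lambda t, y: -y, 0.0, 1.0, 1.0, 10)
error = abs(approx - math.exp(-1.0))
```

A fixed step size works for smooth problems like this one, but the user has to pick it by hand. Choosing the step automatically from an error estimate is one of the features a more elaborate solver would add.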
Implement rule-based integration, as in Rubi.
Add support for GAP (Groups, Algorithms, Programming), a system for computational discrete algebra. There is an existing implementation in Cuis Smalltalk.