Hollin Wilkins edited this page Jan 13, 2017 · 55 revisions

MLeap For Spark

MLeap deploys Spark ML (and some MLlib) transformers and pipelines to production without a Spark Context.

MLeap For Scikit-Learn

MLeap extends scikit-learn so that you can serialize and deploy scikit-learn transformers, pipelines, and feature unions without any dependency on scikit-learn or its stack (NumPy, SciPy, C++ libraries). It also serializes those transformers and pipelines in Spark form, so you can load and deploy your scikit-learn pipelines on Spark infrastructure with a few lines of code.
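To make the idea of dependency-free deployment concrete, here is a minimal sketch in plain Python. It is not the MLeap API or the MLeap bundle format: the function names, the `toy-bundle` JSON layout, and the two toy stages (a standard scaler followed by a binarizer) are all hypothetical and chosen only for illustration. The point is that once the fitted parameters are written out, scoring needs nothing from the library that trained the pipeline.

```python
import json
import math


def fit_standard_scaler(rows):
    """Compute per-column mean and population standard deviation (toy stand-in for training)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    return {"op": "standard_scaler", "mean": means, "std": stds}


def make_binarizer(threshold):
    """Describe a binarizer stage; it has no fitted state beyond its threshold."""
    return {"op": "binarizer", "threshold": threshold}


def serialize_pipeline(stages):
    """Write the fitted stages to a JSON string (hypothetical format, not MLeap's)."""
    return json.dumps({"format": "toy-bundle", "stages": stages})


def load_and_transform(bundle_json, row):
    """Score one row using only the serialized parameters and the standard library."""
    bundle = json.loads(bundle_json)
    values = list(row)
    for stage in bundle["stages"]:
        if stage["op"] == "standard_scaler":
            values = [
                (v - m) / s if s else 0.0
                for v, m, s in zip(values, stage["mean"], stage["std"])
            ]
        elif stage["op"] == "binarizer":
            values = [1.0 if v > stage["threshold"] else 0.0 for v in values]
        else:
            raise ValueError("unknown stage: " + stage["op"])
    return values


# "Training" side: fit the stages and serialize them.
training_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
bundle = serialize_pipeline([fit_standard_scaler(training_rows), make_binarizer(0.0)])

# "Serving" side: only the JSON string is needed to score a new row.
result = load_and_transform(bundle, [3.0, 10.0])
print(result)
```

In this sketch the serving side never touches the training code's objects, only the serialized description, which is the property the paragraph above attributes to MLeap's own serialization.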

Tutorials

Demos

Supported Transformers

Features

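| Transformer | Spark | MLeap | Scikit-Learn | TensorFlow |
| ------------- |:-----:|:-----:|:-----:|:-----:|
| Binarizer | x | x | x | |
| BucketedRandomProjectionLSH | x | x | | |
| Bucketizer | x | x | | |
| ChiSqSelector | x | x | | |
| CountVectorizer | x | x | | |
| DCT | x | x | | |
| ElementwiseProduct | x | x | x | |
| HashingTermFrequency | x | x | x | |
| IDF | x | x | | |
| Imputer | x | x | x | |
| Interaction | x | x | x | |
| MaxAbsScaler | x | x | | |
| MinHashLSH | x | x | | |
| MinMaxScaler | x | x | x | |
| Ngram | x | x | | |
| Normalizer | x | x | | |
| OneHotEncoder | x | x | | |
| PCA | x | x | x | |
| QuantileDiscretizer | x | x | | |
| PolynomialExpansion | x | x | x | |
| ReverseStringIndexer | x | x | x | |
| StandardScaler | x | x | x | |
| StopWordsRemover | x | x | | |
| StringIndexer | x | x | x | |
| Tokenizer | x | x | x | |
| VectorAssembler | x | x | x | |
| VectorIndexer | x | x | | |
| VectorSlicer | x | x | | |
| WordToVector | x | x | | |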

Classification

| Transformer | Spark | MLeap | Scikit-Learn | TensorFlow |
| ------------- |:-----:|:-----:|:-----:|:-----:|
| DecisionTreeClassifier | x | x | x | |
| GradientBoostedTreeClassifier | x | x | | |
| LogisticRegression | x | x | x | |
| LogisticRegressionCv | x | x | x | |
| NaiveBayesClassifier | x | x | | |
| OneVsRest | x | x | | |
| RandomForestClassifier | x | x | x | |
| SupportVectorMachines | x | x | x | |
| MultiLayerPerceptron | x | x | | |

Regression

| Transformer | Spark | MLeap | Scikit-Learn | TensorFlow |
| ------------- |:-----:|:-----:|:-----:|:-----:|
| AFTSurvivalRegression | x | x | | |
| DecisionTreeRegression | x | x | x | |
| GeneralizedLinearRegression | x | x | | |
| GradientBoostedTreeRegression | x | x | | |
| IsotonicRegression | x | x | | |
| LinearRegression | x | x | x | |
| RandomForestRegression | x | x | x | |

Clustering

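| Transformer | Spark | MLeap | Scikit-Learn | TensorFlow |
| ------------- |:-----:|:-----:|:-----:|:-----:|
| BisectingKMeans | x | x | | |
| GaussianMixtureModel | x | x | | |
| KMeans | x | x | | |
| LDA | x | | | |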

Extensions

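| Transformer | Spark | MLeap | Scikit-Learn | TensorFlow | Description |
| ------------- |:-----:|:-----:|:-----:|:-----:| ----- |
| MathUnary | x | x | x | | Simple set of unary mathematical operations |
| MathBinary | x | x | x | | Simple set of binary mathematical operations |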

Recommendation

| Transformer | Spark | MLeap | Scikit-Learn | TensorFlow |
| ------------- |:-----:|:-----:|:-----:|:-----:|
| ALS | x | | | |