Intel® oneAPI Data Analytics Library (oneDAL) is a powerful machine learning library that helps speed up big data analysis. oneDAL solvers are also used in Intel Distribution for Python for scikit-learn optimization.
Intel® oneAPI Data Analytics Library is an extension of Intel® Data Analytics Acceleration Library (Intel® DAAL).
- Python API
- oneDAL Apache Spark MLlib samples
- Installation
- Documentation
- Support
- Technical Preview Features
- oneDAL and Intel® DAAL
oneDAL takes advantage of the capabilities of Intel® hardware, which can give you a significant performance boost for classic machine learning algorithms.
We provide highly optimized algorithmic building blocks for all stages of data analytics: preprocessing, transformation, analysis, modeling, validation, and decision making.
oneDAL also provides Data Parallel C++ (DPC++) API extensions to the traditional C++ interfaces.
Data volumes are growing exponentially, and so is the need for high-performance, scalable frameworks that can analyze this data and extract value from it. Besides superior performance on a single node, oneDAL also provides a distributed computation mode that shows excellent results for both strong and weak scaling:
[Figure: oneDAL K-Means fit, strong scaling results (left) and weak scaling results (right)]
Technical details: FPType: float32; HW: Intel® Xeon® Processor E5-2698 v3 @ 2.3 GHz, 2 sockets, 16 cores per socket; SW: Intel® DAAL (2019.3), MPI4Py (3.0.0), Intel® Distribution for Python (IDP) 3.6.8. Details are available in the article https://arxiv.org/abs/1909.11822
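Strong scaling keeps the total problem size fixed as nodes are added, so the ideal runtime on n nodes is T1/n. Weak scaling grows the problem together with the node count, so the ideal runtime stays at T1. The usual efficiency metrics for reading such plots can be sketched as follows; the function names are illustrative, and the timings are hypothetical placeholders, not the measured results from the article.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Fixed total problem size: the ideal runtime on n nodes is t1 / n."""
    return t1 / (n * tn)


def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Per-node problem size fixed: the ideal runtime on n nodes stays t1."""
    return t1 / tn


# Hypothetical timings in seconds, for illustration only (not from the article).
t1 = 64.0
print(strong_scaling_efficiency(t1, tn=4.5, n=16))  # about 0.89
print(weak_scaling_efficiency(t1, tn=80.0))  # 0.8
```

An efficiency of 1.0 means perfect scaling in either sense; values below 1.0 reflect communication and synchronization overhead.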
Refer to our examples and documentation for more information about our API.
oneDAL has a Python API that is provided as a standalone Python library called daal4py.
The example below shows how daal4py can be used to calculate K-Means clusters:
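```python
import numpy as np
import pandas as pd
import daal4py as d4p

# Load the input data as 32-bit floats to match fptype="float"
data = pd.read_csv("local_kmeans_data.csv", dtype=np.float32)

# Pick 10 initial centroids at random from the input rows
init_alg = d4p.kmeans_init(nClusters=10, fptype="float", method="randomDense")
centroids = init_alg.compute(data).centroids

# Run up to 50 K-Means iterations; assignFlag=False skips per-row cluster assignments
alg = d4p.kmeans(nClusters=10, maxIterations=50, fptype="float",
                 accuracyThreshold=0, assignFlag=False)
result = alg.compute(data, centroids)
```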
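To make the parameters above more concrete, here is a plain NumPy sketch of Lloyd's K-Means iteration: random input rows as initial centroids (in the spirit of `method="randomDense"`), at most `max_iterations` rounds, and early stopping once the objective improves by no more than `accuracy_threshold`. This is an illustrative reference only, not oneDAL's implementation; the function names and the exact stopping rule are assumptions.

```python
import numpy as np


def _assign(data, centroids):
    """Label each row with its nearest centroid; return labels and the objective."""
    # Squared Euclidean distance from every point to every centroid.
    dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    return labels, dist[np.arange(len(data)), labels].sum()


def kmeans_lloyd(data, n_clusters, max_iterations=50, accuracy_threshold=0.0, seed=0):
    """Reference Lloyd's K-Means; returns (centroids, labels, objective)."""
    rng = np.random.default_rng(seed)
    # Random initialization: n_clusters distinct input rows become the starting centroids.
    idx = rng.choice(len(data), size=n_clusters, replace=False)
    centroids = data[idx].astype(np.float64)
    prev_objective = np.inf
    for _ in range(max_iterations):
        labels, objective = _assign(data, centroids)
        # Stop once the objective no longer improves by more than the threshold.
        if prev_objective - objective <= accuracy_threshold:
            break
        prev_objective = objective
        # Move each centroid to the mean of its points; empty clusters stay in place.
        for k in range(n_clusters):
            members = data[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    labels, objective = _assign(data, centroids)
    return centroids, labels, objective


# Usage on synthetic data (a stand-in for local_kmeans_data.csv).
rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 5)).astype(np.float32)
centroids, labels, objective = kmeans_lloyd(data, n_clusters=10, max_iterations=50)
```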
Data scientists often need different tools to analyze regular and big data. daal4py offers several processing models, so switching to distributed multi-node mode takes only a few changes:
```python
# Launch with MPI, for example: mpirun -n 4 python kmeans_spmd.py
import numpy as np
import pandas as pd
import daal4py as d4p

d4p.daalinit()  # <-- initialize SPMD mode

# Each process reads its local portion of the data
data = pd.read_csv("local_kmeans_data.csv", dtype=np.float32)

init_alg = d4p.kmeans_init(nClusters=10, fptype="float", method="randomDense",
                           distributed=True)  # <-- switch to distributed mode
centroids = init_alg.compute(data).centroids

alg = d4p.kmeans(nClusters=10, maxIterations=50, fptype="float",
                 accuracyThreshold=0, assignFlag=False,
                 distributed=True)  # <-- switch to distributed mode
result = alg.compute(data, centroids)

d4p.daalfini()  # <-- finalize SPMD mode
```
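One common reason K-Means distributes well is that a single Lloyd step decomposes into per-node partial results: each process reduces its local rows to per-cluster coordinate sums and point counts, and only these small arrays need to be combined. The sketch below simulates that pattern in one process with NumPy. It illustrates the general technique under that assumption and does not reproduce daal4py's internal communication; `local_partial` and `merge_and_update` are hypothetical names.

```python
import numpy as np


def local_partial(block, centroids):
    """Per-node step: per-cluster coordinate sums and point counts for local rows."""
    dist = ((block[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    k, d = centroids.shape
    sums = np.zeros((k, d))
    np.add.at(sums, labels, block)  # accumulate each row into its cluster's sum
    counts = np.bincount(labels, minlength=k)
    return sums, counts


def merge_and_update(partials, centroids):
    """Master step: add up all partial results and compute the new centroids."""
    total_sums = sum(s for s, _ in partials)
    total_counts = sum(c for _, c in partials)
    new_centroids = centroids.copy()
    nonempty = total_counts > 0  # empty clusters keep their previous centroid
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty][:, None]
    return new_centroids


rng = np.random.default_rng(0)
data = rng.normal(size=(1200, 3))
centroids = data[:4].copy()  # simple deterministic initialization for the demo
blocks = np.array_split(data, 4)  # one block per simulated process
distributed = merge_and_update([local_partial(b, centroids) for b in blocks], centroids)
single_node = merge_and_update([local_partial(data, centroids)], centroids)
print(np.allclose(distributed, single_node))  # True
```

Because the combined sums and counts are the same regardless of how rows are partitioned, the distributed step matches the single-node step up to floating-point rounding.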
For more details, see the daal4py documentation.
You can speed up scikit-learn using Intel(R) Extension for Scikit-learn*.
Intel(R) Extension for Scikit-learn* accelerates scikit-learn through drop-in patching. The acceleration comes from the Intel(R) oneAPI Data Analytics Library, so data scientists and machine learning practitioners can keep using the familiar scikit-learn API while getting faster results.
Intel(R) Extension for Scikit-learn* can replace some scikit-learn methods with oneDAL solvers, which gives you a performance gain without any code changes. To patch stock scikit-learn this way, run your application through the sklearnex module:

```shell
python -m sklearnex my_application.py
```
Patches can also be enabled programmatically:
```python
from time import time

from sklearn.datasets import load_digits
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

# Baseline: stock scikit-learn
svm_sklearn = SVC(kernel="rbf", gamma="scale", C=0.5)
start = time()
svm_sklearn = svm_sklearn.fit(X, y)
end = time()
print(end - start)  # e.g. 0.141261... (timings vary by machine)
print(svm_sklearn.score(X, y))  # output: 0.9905397885364496

# Apply the patch, then re-import SVC so the patched estimator is used
from sklearnex import patch_sklearn
patch_sklearn()  # <-- apply patch
from sklearn.svm import SVC

svm_sklearnex = SVC(kernel="rbf", gamma="scale", C=0.5)
start = time()
svm_sklearnex = svm_sklearnex.fit(X, y)
end = time()
print(end - start)  # e.g. 0.032536... (timings vary by machine)
print(svm_sklearnex.score(X, y))  # output: 0.9905397885364496
```
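Single-run timings like the ones above can be noisy. A small helper that reports the best of several fresh fits makes such before-and-after comparisons more repeatable. It works with any object exposing a scikit-learn-style `fit(X, y)` method; `best_fit_time` is a hypothetical name, not part of scikit-learn or Intel(R) Extension for Scikit-learn*.

```python
from time import perf_counter


def best_fit_time(make_estimator, X, y, repeats=5):
    """Return the fastest wall-clock fit time in seconds over several fresh fits."""
    best = float("inf")
    for _ in range(repeats):
        estimator = make_estimator()  # fresh, unfitted estimator for every run
        start = perf_counter()
        estimator.fit(X, y)
        best = min(best, perf_counter() - start)
    return best
```

For example, `best_fit_time(lambda: SVC(kernel="rbf", gamma="scale", C=0.5), X, y)` could be called once before and once after `patch_sklearn()`. Because the lambda looks up `SVC` when it is called, it uses whichever class that name refers to after the re-import.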
For more details, see the Intel(R) Extension for Scikit-learn* documentation.
oneDAL provides Scala and Java interfaces that match the Apache Spark MLlib API and use oneDAL solvers under the hood. In our measurements, this implementation delivers a 3x to 18x performance increase compared to stock Apache Spark MLlib.
Technical details: FPType: double; HW: 7 x m5.2xlarge AWS instances; SW: Intel DAAL 2020 Gold, Apache Spark 2.4.4, emr-5.27.0; Spark configuration: 12 executors, 8 cores per executor, 19 GB memory per executor, 8 CPUs per task
See the samples for more details.
You can install oneDAL:
- from the oneDAL home page as part of the Intel® oneAPI Base Toolkit.
- from GitHub*.
See Installation from Sources for details.
Besides the C++ and Python APIs, oneDAL also provides APIs for DPC++ and Java:
- System Requirements
- Get Started Guide
- Developer Guide and Reference
- daal4py documentation
- Intel(R) Extension for Scikit-learn* documentation
- Specification
- Release Notes
- Known Issues
Refer to the GitHub Wiki to browse the full list of oneDAL and daal4py resources.
Ask questions and engage in discussions with oneDAL developers, contributors, and other users through the following channels:
You may reach out to project maintainers privately at onedal.maintainers@intel.com.
To report a vulnerability, refer to the Intel vulnerability reporting policy.
Report issues and make feature requests using GitHub Issues.
We welcome community contributions; see our contributing guidelines to learn more.
Use GitHub Wiki to provide feedback about oneDAL.
Samples are examples of how oneDAL can be used in different applications:
Technical preview features are introduced to gain early feedback from developers. A technical preview feature is subject to change in future releases, so using technical preview features in a production code base is strongly discouraged.
In C++ APIs, technical preview features are located in the daal::preview and oneapi::dal::preview namespaces. In Java APIs, technical preview features are located in packages with the com.intel.daal.preview name prefix.
The preview features are:
- Graph Analytics:
  - Undirected graph without edge and vertex weights (undirected_adjacency_vector_graph), where vertex indices can only be of type int32
  - Directed graph with and without edge weights (directed_adjacency_vector_graph), where vertex indices can only be of type int32 and edge weights can be of type int32 or double
  - Jaccard Similarity Coefficients for all pairs of vertices, a batch algorithm that processes the graph by blocks
  - Local and Global Triangle Counting
  - Single Source Shortest Paths (SSSP)
  - Subgraph isomorphism algorithm for induced and non-induced subgraphs in undirected graphs (integer vertex attributes are supported; edge attributes are not)
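For intuition about two of the graph algorithms listed above, here is a small pure-Python reference sketch on an undirected graph stored as adjacency sets. It computes the Jaccard similarity coefficient |N(u) ∩ N(v)| / |N(u) ∪ N(v)| for every vertex pair, plus local (per-vertex) and global triangle counts. This is not the oneDAL preview API; the data layout and the function names are illustrative assumptions.

```python
from itertools import combinations


def build_undirected(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_all_pairs(adj):
    """Jaccard coefficient for every unordered vertex pair (u < v)."""
    result = {}
    for u, v in combinations(sorted(adj), 2):
        union = adj[u] | adj[v]
        if union:  # guard for isolated vertices if adj is built another way
            result[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return result


def triangle_counts(adj):
    """Local triangle count per vertex and the global triangle count."""
    local = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            # Each triangle u < v < w is visited exactly once.
            for w in adj[u] & adj[v]:
                if w > v:
                    local[u] += 1
                    local[v] += 1
                    local[w] += 1
    return local, sum(local.values()) // 3


adj = build_undirected([(0, 1), (0, 2), (1, 2), (2, 3)])
jaccard = jaccard_all_pairs(adj)
local, total = triangle_counts(adj)
print(total)  # 1
```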
This repository contains branches corresponding to both the oneAPI and classical versions of the library. We encourage you to use oneDAL, located in the master branch.
Product | Latest release | Branch |
---|---|---|
oneDAL | 2021.3 | master, rls/2021.3-rls |
Intel® DAAL | 2020 Update 3 | rls/daal-2020-u3-rls |
oneDAL is distributed under the Apache License 2.0. See LICENSE for more information.
oneMKL FPK microlibs are distributed under Intel Simplified Software License. Refer to third-party-programs-mkl.txt for details.