Prince uses pandas to manipulate dataframes, so it expects a dataframe as input. In the following example, a Principal Component Analysis (PCA) is applied to the iris dataset. Under the hood, Prince uses a Singular Value Decomposition (SVD) to decompose the dataframe into two eigenvector matrices and one eigenvalue array. The eigenvectors can then be used to project the initial dataset onto lower dimensions.
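import matplotlib.pyplot as plt
import pandas as pd
import prince

# Load the iris dataset into a dataframe
df = pd.read_csv('data/iris.csv')
# Run a PCA on the dataframe, keeping 4 components
pca = prince.PCA(df, n_components=4)
# Plot the cumulative inertia of the components
fig1, ax1 = pca.plot_cumulative_inertia()
# Plot the row projections, colored by the 'class' column, with filled ellipses
fig2, ax2 = pca.plot_rows(color_by='class', ellipse_fill=True)
plt.show()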
The row plot (fig2) displays the rows of the initial dataset projected onto the first two right eigenvectors (the obtained projections are called principal coordinates). The ellipses are 90% confidence intervals.
The cumulative inertia plot (fig1) displays the cumulative contribution of each eigenvector, based on the corresponding eigenvalues. In this case, the first two eigenvectors alone account for more than 95% of the total.
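As a rough illustration of the decomposition and projection described above, the following sketch walks through the same steps with NumPy alone. It is not Prince's implementation: the function name pca_via_svd, the synthetic data, and the choice to standardize each column before the SVD are assumptions made for this example.

```python
import numpy as np


def pca_via_svd(X, n_components=2):
    """Hypothetical helper: PCA of the rows of X through an SVD."""
    X = np.asarray(X, dtype=float)
    # Center and scale each column (assumption: no column is constant)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # U and Vt hold the left and right singular vectors, s the singular values
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Squared singular values are the eigenvalues of Z.T @ Z
    eigenvalues = s ** 2
    # Cumulative share of the total carried by the leading eigenvectors
    cumulative = np.cumsum(eigenvalues / eigenvalues.sum())
    # Principal coordinates: rows projected onto the leading right eigenvectors
    coords = Z @ Vt[:n_components].T
    return coords, cumulative


# Synthetic stand-in for a 150 x 4 numeric table such as iris
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
coords, cumulative = pca_via_svd(X, n_components=2)
```

Here coords has one row per observation and one column per retained component, and cumulative[k - 1] is the share of the total explained by the first k eigenvectors, which is the quantity a cumulative inertia plot would show.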
Prince is only compatible with Python 3. Anaconda isn't required, but using it is recommended, as it is generally a good idea for doing data science in Python.
Via PyPI
>>> pip install prince
Via GitHub for the latest development version
>>> pip install git+https://github.com/MaxHalford/Prince
Prince has the following dependencies:
- pandas for manipulating dataframes
- matplotlib as a default plotting backend
- fbpca, Facebook's randomized SVD implementation
Please check out the documentation for a list of available methods and properties.
You can find examples in the examples/ folder. You have to navigate to the folder to run them.
>>> cd examples/
>>> python pca-iris.py
The MIT License (MIT). Please see the license file for more information.