
jnikhilreddy/Visualize-PCA-Auto-encoder-And-K-Means-Clustering


Visualize-PCA-Auto-encoder-And-K-Means-Clustering

Code for visualizing and understanding the relationship between PCA, autoencoders, and K-means clustering.

Principal Component Analysis (PCA) is a widely used technique for unsupervised dimensionality reduction, and K-means is one of the most popular techniques for unsupervised clustering. C. Ding et al. proved that K-means clustering can be approximated as a super-sparse PCA. The authors also proved that the relaxed (continuous) solution of K-means clustering, specified by the cluster indicators, is given by PCA. Although PCA is not a clustering method, it is often used to reveal clusters, and in practice the two methods are commonly used together: for high-dimensional data, PCA first reduces the dimension, and K-means is then applied to the reduced data to lower the computation cost.

In a nutshell, we aim to establish the relationship between PCA and K-means clustering, along with the needed proofs. We then visualize this relationship with graphs on the IMDB movie dataset. Finally, we extend the study to the relationship between PCA and autoencoders, i.e., under what constraints PCA is equivalent to an autoencoder, using the Iris dataset.
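As a rough illustration of the PCA and K-means relationship for two clusters, the sketch below compares the labels from a plain K-means run with the labels obtained from the sign of the first principal component score. It is not the notebook code from this repository: the data is synthetic (two Gaussian blobs), only NumPy is assumed, and the helper names `kmeans_two_clusters`, `pc1_sign_labels`, and `agreement` are hypothetical names chosen for this example.

```python
import numpy as np


def kmeans_two_clusters(X, n_iter=50):
    """Lloyd's algorithm for K=2 (hypothetical helper, farthest-point init)."""
    first = X[0]
    # Second seed: the point farthest from the first seed.
    second = X[np.argmax(np.linalg.norm(X - first, axis=1))]
    centroids = np.stack([first, second])
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.stack([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(2)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def pc1_sign_labels(X):
    """Label each point by the sign of its first principal component score."""
    Xc = X - X.mean(axis=0)                      # centre the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]                          # projection onto PC1
    return (scores > 0).astype(int)


def agreement(a, b):
    """Fraction of matching labels, ignoring which cluster is called 0 or 1."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)


# Synthetic data (an assumption for this sketch): two blobs in 5 dimensions.
rng = np.random.default_rng(0)
offset = np.array([4.0, 4.0, 0.0, 0.0, 0.0])
X = np.vstack([
    rng.normal(size=(100, 5)) + offset,
    rng.normal(size=(100, 5)) - offset,
])

kmeans_labels = kmeans_two_clusters(X)
pca_labels = pc1_sign_labels(X)
score = agreement(kmeans_labels, pca_labels)
print(f"agreement between K-means and sign of PC1: {score:.2f}")
```

On well-separated data like this, the two labelings should largely coincide, which is the K=2 intuition behind the relaxed cluster-indicator result. On real data such as the IMDB movie dataset the agreement may be lower.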

The code is available in the corresponding Jupyter notebook files.