Here we discuss supervised and unsupervised machine learning models.
- Here I used two different approaches to find the parameters of the linear regression model: one uses gradient descent to minimize the least-squares loss; the other uses the Moore-Penrose pseudoinverse. Unlike the ordinary least-squares normal equations, the pseudoinverse method yields the parameters of the linear regression model regardless of the rank of the design matrix.
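A minimal sketch of the two approaches, assuming NumPy (fit_gd and fit_pinv are illustrative names, not the repo's actual functions):

```python
import numpy as np

def fit_gd(X, y, lr=0.01, steps=1000):
    """Gradient descent on the mean squared error (1/n)||Xw - y||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the MSE loss
        w -= lr * grad
    return w

def fit_pinv(X, y):
    """Moore-Penrose pseudoinverse: returns parameters even when X is
    rank-deficient, where the plain normal equations would fail."""
    return np.linalg.pinv(X) @ y
```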
- Also worked on polynomial regression by expanding the feature space to the desired order, and applied the same idea to the XOR data. The XOR points are not linearly separable in two-dimensional space, so the parameters of the XOR model were found by expanding the feature space into higher dimensions.
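Feature expansion reduces polynomial regression to linear regression on a larger design matrix; a poly_normal()-style fit might look like the sketch below (poly_features and poly_fit are hypothetical helpers, not the repo's code):

```python
import numpy as np

def poly_features(x, order=2):
    """Expand a 1-D input into columns [1, x, x^2, ..., x^order]."""
    return np.vander(x, N=order + 1, increasing=True)

def poly_fit(x, y, order=2):
    """Fit the expanded model with the pseudoinverse (normal-equation style)."""
    Phi = poly_features(x, order)
    return np.linalg.pinv(Phi) @ y
```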
Results: The plot generated by poly_normal() is more accurate because it also captures the quadratic relation between x and y; the true y does have a quadratic dependence on the observations. Since the poly_normal() feature space contains both linear and quadratic terms, the model learns the true relation more closely, so poly_normal() approximates the data better than linear_normal().
The contour plot of poly_normal() shows every point properly classified: (-1,1) and (1,-1) fall into one class (blue) while (-1,-1) and (1,1) fall into the other, with a good margin between the classes. The points become linearly separable in three-dimensional vector space. Only poly_normal() classifies these points correctly, because its more flexible feature space includes quadratic features, which helps separate data that is not linearly separable in two-dimensional vector space, as the sketch below illustrates.
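To illustrate the separability claim: appending the product feature x1*x2 lifts the four XOR points into three dimensions, where a single linear threshold already separates the classes (a hypothetical sketch, not the repo's exact code):

```python
import numpy as np

# XOR points and labels: (-1,1), (1,-1) -> +1; (-1,-1), (1,1) -> -1
X = np.array([[-1, 1], [1, -1], [-1, -1], [1, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

# Lift to 3-D by appending the quadratic cross term x1 * x2.
Z = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])

# In the lifted space, sign(-x1*x2) already classifies every point correctly.
print(np.sign(-Z[:, 2]) == y)  # [ True  True  True  True ]
```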
- Modeled an SVM (soft-margin when 0 < C < ∞, hard-margin when C = ∞) by training the Lagrangian dual rather than minimizing the hinge loss. The reasons are as follows:
  - A) Computational efficiency: the dual formulation leads to a convex quadratic programming problem, which can be solved with efficient optimization algorithms.
  - B) Kernel trick: kernelization can be applied directly in the dual formulation.
  - C) Support vector identification: solving the dual yields the values a_i, and each a_i measures the importance of the i-th observation in forming the decision boundary.
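A minimal sketch of that dual training loop, assuming an RBF kernel and projected gradient ascent (kernel choice, C value, and function names are illustrative; the bias term and the equality constraint Σ a_i y_i = 0 are dropped for brevity; the defaults match the learning rate and step count of the experiment below):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_dual_svm(X, y, C=10.0, lr=0.1, steps=10000):
    """Maximize the dual  W(a) = sum(a) - 0.5 * a^T (K * y y^T) a
    by projected gradient ascent, clipping a into [0, C]
    (C = inf would recover the hard-margin SVM)."""
    Q = rbf_kernel(X, X) * np.outer(y, y)
    a = np.zeros(len(y))
    for _ in range(steps):
        a += lr * (1.0 - Q @ a)   # gradient of the dual objective
        a = np.clip(a, 0.0, C)    # keep the box constraint 0 <= a_i <= C
    return a

# XOR data as in the experiment below
X = np.array([[-1, 1], [1, -1], [-1, -1], [1, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
alpha = train_dual_svm(X, y)      # nonzero a_i mark the support vectors
pred = np.sign(rbf_kernel(X, X) @ (alpha * y))
```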
Experiment: Using a learning rate of 0.1 and 10,000 gradient-descent steps, trained the following kernel SVMs on the XOR data (present in the code). Results:
What problem does it solve? Finding natural patterns and groups of similar data points when ground truths are not known. Where can I potentially use this? Customer segmentation, anomaly detection, recommendation systems, and general clustering tasks.
- K-means is most commonly used for exploratory data analysis when we encounter a new data file. It is also an unsupervised learning model, which makes it applicable even more often. It is an iterative algorithm that partitions the data points into k distinct, non-overlapping groups by minimizing the within-cluster variation.
- Convergence guaranteed: Yes, it is guaranteed to converge. K-means works by minimizing the within-cluster variation: on every iteration it recomputes the cluster centres so that the within-cluster variation decreases. After a certain number of steps it therefore reaches cluster centres that no longer change on a new iteration, irrespective of whether they correspond to a global minimum or a local minimum.
- Global optimum always: No, we cannot always find the global optimum with K-means, since we are using alternating optimization: we optimize the assignments r_nk keeping the centres µ_k constant, and vice versa. This minimizes the within-cluster variation, but how the final clustering looks depends on the initial cluster centres. Once clusters form, centres can only move within their own clusters, so the result is very sensitive to initialization; we need to initialize with several different sets of cluster centres and pick the run with the lowest cost.
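A compact sketch of that alternating scheme, assuming NumPy (k_means is an illustrative name, not the repo's function):

```python
import numpy as np

def k_means(X, k, iters=100, seed=0):
    """Alternate between assigning points to the nearest centre and
    recomputing each centre as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # Assignment step (r_nk): attach each point to its closest centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Update step (mu_k): move each centre to the mean of its cluster.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):   # converged: centres stopped moving
            break
        centres = new
    return centres, labels
```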
- Upon generating a random dataset and using my model to predict the output, here are the results:
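As a hypothetical usage example (not the repo's actual experiment or data), the k_means sketch above can be exercised on synthetic blobs:

```python
import numpy as np

rng = np.random.default_rng(1)
# Three Gaussian blobs around distinct centres
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0.0, 0.0], [5.0, 5.0], [0.0, 5.0])])
centres, labels = k_means(X, k=3)   # k_means defined in the sketch above
print(centres)                      # should land near the three true centres
```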