Dataset: the data consists of two feature columns.
- Load the data. Construct a scatter plot of the experimental data and visually estimate the number of clusters, k, from it.
- Develop the k-means clustering algorithm and implement it in MATLAB.
- Perform cluster analysis of the original data using the k-means method (see the method parameters in Table 5.2). Determine the optimal number of clusters, k.
- Calculate the centroids of the obtained clusters. Visualize the clusters with a scatter plot colored by cluster.
- The dataset was loaded from a .txt file and converted into a pandas DataFrame (pd.DataFrame) for convenience.
- A scatter plot was constructed for the original dataset.
- The elbow method was applied to determine the optimal number of clusters; for this dataset it indicated k = 4.
- The k-means method was implemented from scratch using the Euclidean distance metric, with the intra-cluster sum of distances as the clustering quality metric.
- A plot was generated at each step of the k-means algorithm.
- A joint plot was created to visualize the final distribution of data across clusters.
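A minimal NumPy sketch of the clustering steps listed above: k-means with Euclidean distance, the intra-cluster sum of distances as the quality metric, and an elbow sweep over k. The assignment asks for MATLAB, but the report describes a pandas-based workflow, so the sketch is in Python. Several details are assumptions not stated in the source: k-means++ seeding with `n_init` restarts (the original initialization is unknown), the unsquared sum of distances as the metric (the original may square them), and choosing the elbow at the largest second difference of the metric curve. All function names are hypothetical.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def intra_cluster_distance(X, labels, centroids):
    """Quality metric (assumed unsquared): sum of distances from each point to its centroid."""
    return float(np.linalg.norm(X - centroids[labels], axis=1).sum())


def init_kmeanspp(X, k, rng):
    """k-means++ seeding; an assumption, since the original init scheme is not stated."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # uniform if all points coincide
        centroids.append(X[rng.choice(len(X), p=p)])
    return np.array(centroids, dtype=float)


def kmeans(X, k, max_iter=100, n_init=5, seed=0):
    """Run k-means n_init times and keep the run with the lowest metric.

    Returns (labels, centroids, history); history lists the centroids at every
    step of the best run, which is enough to draw one plot per step.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        centroids = init_kmeanspp(X, k, rng)
        history = [centroids]
        for _ in range(max_iter):
            labels = assign(X, centroids)
            # Move each centroid to the mean of its points; an empty cluster keeps its centroid.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            history.append(new)
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        labels = assign(X, centroids)
        score = intra_cluster_distance(X, labels, centroids)
        if best is None or score < best[0]:
            best = (score, labels, centroids, history)
    return best[1], best[2], best[3]


def elbow_k(X, k_max=8):
    """Heuristic elbow: k with the largest second difference of the metric curve."""
    costs = np.array([intra_cluster_distance(X, *kmeans(X, k)[:2])
                      for k in range(1, k_max + 1)])
    d2 = costs[:-2] - 2 * costs[1:-1] + costs[2:]  # d2[j] belongs to k = j + 2
    return int(np.argmax(d2)) + 2, costs
```

Typical use is `k, costs = elbow_k(X)` followed by `labels, centroids, history = kmeans(X, k)`, where `X` is the two-column data as a NumPy array. `history[i]` holds the centroids after step i, and `assign(X, history[i])` recovers that step's labels for plotting. Note that the mean update minimizes squared distances, so the unsquared metric here only serves to compare restarts and values of k.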
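The loading and plotting steps can be sketched with pandas and Matplotlib, assuming the .txt file holds two whitespace-separated numeric columns without a header row. The column names `x1`/`x2` and the helper names are hypothetical. `plot_clusters` draws either the raw scatter plot or a scatter plot colored by nearest centroid with the centroids marked; `plot_steps` repeats this for a list of centroid arrays, one panel per k-means step.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove for interactive windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def load_points(path):
    """Read a whitespace-separated, two-column .txt file into a DataFrame.

    The column names x1/x2 are hypothetical placeholders.
    """
    return pd.read_csv(path, sep=r"\s+", header=None, names=["x1", "x2"])


def plot_clusters(ax, X, centroids=None, title=""):
    """Scatter X; if centroids are given, color points by their nearest centroid."""
    if centroids is None:
        ax.scatter(X[:, 0], X[:, 1], s=10, color="gray")
    else:
        labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        ax.scatter(X[:, 0], X[:, 1], s=10, c=labels, cmap="tab10")
        ax.scatter(centroids[:, 0], centroids[:, 1], marker="x", s=100, color="black")
    ax.set_title(title)


def plot_steps(X, history):
    """One panel per k-means step; history is a list of (k, 2) centroid arrays."""
    fig, axes = plt.subplots(1, len(history), figsize=(4 * len(history), 4), squeeze=False)
    for i, (ax, centroids) in enumerate(zip(axes[0], history)):
        plot_clusters(ax, X, centroids, title=f"step {i}")
    return fig
```

For example, `X = load_points("data.txt").to_numpy()` (hypothetical file name) followed by `fig, ax = plt.subplots(); plot_clusters(ax, X, title="original data")` gives the initial scatter plot, and `plot_steps(X, history).savefig("steps.png")` saves the per-step panels for any list of centroid arrays.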