Prototype Based Clustering Analysis on seeds dataset

This is a solution notebook for an assignment question from a graduate Data Mining course. Code blocks are accompanied by analysis where relevant.
Dataset link: https://archive.ics.uci.edu/ml/datasets/seeds
Broadly, the following steps have been performed in this solution notebook:

- Applied minimal preprocessing to the dataset.
- Explained the limitations of KMeans.
- Suggested two existing algorithms, KMedoids and CLARANS, that mitigate some of these limitations.
- Visualized the given class labels using TSNE.
- Ran KMedoids and CLARANS on the seeds dataset and reported the best results obtained on various cluster validity indices.
  - Compared these results with those of KMeans.
- Reported and visualized the hyperparameter tuning of KMedoids and CLARANS needed to reach the best results on the seeds dataset.
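
To illustrate why a medoid-based method can help where KMeans struggles, here is a minimal pure-Python sketch of alternating k-medoids. It is not the notebook's implementation: the function names, the `init` parameter, and the toy points are assumptions made for this example, and the update rule is the simple alternating variant rather than full PAM or CLARANS.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: euclidean(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Alternating k-medoids; returns (medoid_indices, labels).

    `init` (hypothetical parameter) fixes the starting medoid indices;
    when omitted, k distinct indices are sampled at random.
    """
    if init is None:
        medoids = random.Random(seed).sample(range(len(points)), k)
    else:
        medoids = list(init)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of its cluster.
            new_medoids.append(min(
                members,
                key=lambda i: sum(euclidean(points[i], points[j]) for j in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Toy data (made up): two compact groups plus one outlier at (25, 25).
points = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
    (8.0, 8.0), (7.0, 8.0), (9.0, 8.0), (8.0, 7.0), (8.0, 9.0),
    (25.0, 25.0),
]
medoids, labels = k_medoids(points, k=2, init=[1, 5])
print([points[m] for m in medoids])
print(labels)
```

On this toy data the outlier joins the second cluster, yet that cluster's medoid remains the actual data point (8.0, 8.0), whereas the arithmetic mean of the same six points would be pulled to roughly (10.8, 10.8). Because the result depends on the starting medoids, random initialization may need several restarts.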

The assumptions above and the flow of work follow the questions asked in the assignment.
