Hierarchical agglomerative clustering on female fragrance accords
An unsupervised machine learning project that applies hierarchical agglomerative clustering to 39.7K female fragrances.
This project is part of my fragrance exploration series:
- K-means++ clustering on fragrance accords: https://github.com/katarzynajanicka/fragrance-clustering
- Agglomerative hierarchical clustering on 39.7K female fragrances: https://github.com/katarzynajanicka/agglomerative-fragrance-clustering
- Accords-based recommendation system for female fragrances: https://github.com/katarzynajanicka/fragrance-finder
The project is written in Python, version 3.8.2.
Python libraries:
- scipy - version 1.5.2
- scikit-learn - version 0.23.2
- pandas - version 1.1.1
- numpy - version 1.19.2
- matplotlib - version 3.3.1
- seaborn - version 0.11.0
Input data: result.csv, the end result of the https://github.com/katarzynajanicka/fragrance-clustering project.
Output data:
- hierarchical-clustering.ipynb (Jupyter notebook)
- hierarchical_result.csv (end result)
Project structure
Data structure
There are 39.7K rows. Each observation is a unique female fragrance.
Fields:
- brand - name of the brand
- title - name of the fragrance
- date - release date (in YYYY format)
- rating_score - fragrance rating
- votes - number of votes cast for a scent
- accords - top five notes
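To illustrate how the fields above might be prepared for clustering, here is a minimal sketch that turns the accords column into a binary matrix with one column per accord. The rows are hypothetical placeholders rather than values from result.csv, and storing accords as a comma-separated string is an assumption; the actual file may use a different format.

```python
import pandas as pd

# Hypothetical rows for illustration only; the real data lives in result.csv.
df = pd.DataFrame({
    "brand": ["Brand A", "Brand B", "Brand C"],
    "title": ["Scent One", "Scent Two", "Scent Three"],
    "date": [2015, 2018, 2020],
    "rating_score": [4.1, 3.8, 4.4],
    "votes": [120, 45, 300],
    # Assumed storage format: accords joined by commas in a single string.
    "accords": ["floral,woody,citrus", "woody,vanilla", "floral,fresh,citrus"],
})

# One binary column per accord: 1 if the fragrance has it, 0 otherwise.
accord_matrix = df["accords"].str.get_dummies(sep=",")

print(accord_matrix)
```

Each row of accord_matrix then describes a fragrance only by which accords it contains, which is a convenient input for distance-based clustering.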
Dendrograms
Hierarchical clustering
Cluster description by top accords
Fragrance tree
Most popular fragrances
Most popular fragrances by cluster
Most popular fragrances by brand
Final thoughts
The agglomerative hierarchical clustering technique turned out to be a better approach than K-means++ clustering (see: https://github.com/katarzynajanicka/fragrance-clustering). This is because different fragrances often share the same notes: it is not unusual for a fragrance to have accords from two or three fragrance families (Floral, Fresh, Woody, Oriental).
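The sketch below shows, on a toy example, how agglomerative clustering can group fragrances whose accords overlap across families. The binary matrix is hypothetical, and the Jaccard distance with average linkage is an assumption chosen for illustration; it is not necessarily the configuration used in the notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical binary accord matrix: rows are fragrances, columns are accords
# (floral, rose, citrus, woody, amber, vanilla). True means the accord is present.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],  # mostly floral, with one woody accord
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1],  # mostly woody, with one floral accord
], dtype=bool)

# Pairwise Jaccard distances: the share of accords two fragrances do not have in common.
distances = pdist(X, metric="jaccard")

# Build the merge tree bottom-up, joining the two closest groups at each step.
Z = linkage(distances, method="average")

# Cut the tree so that at most two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

print(labels)
```

In this toy data the two fragrances that borrow an accord from the other family still end up with the group they overlap with most. The same linkage matrix Z can be passed to scipy.cluster.hierarchy.dendrogram to draw the tree.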
The project is finished.