Unsupervised_Country_Data

Dataset provided by HELP International. The objective is to categorize countries by their overall development using socio-economic and health factors.

Requirements:

pandas: Data analysis and manipulation tool.

matplotlib: Visualization library.

seaborn: Data visualization library built on matplotlib that enhances the style of matplotlib plots.

NumPy: Numerical computing library.

scikit-learn: Machine Learning library.

Bokeh: Library for interactive data visualization.

Plotly Express: High-level Python visualization library.

First part - EDA and Unsupervised Analysis:

After a brief exploratory data analysis, several unsupervised algorithms, such as K-means, Affinity Propagation, and Gaussian Mixture Models, are used to group countries into three categories.
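As a rough illustration of the clustering step described above, the sketch below runs the three scikit-learn estimators on a small synthetic array. The data, the variable names, and the parameter values are assumptions made for this example and are not taken from the repository's notebooks. Note that Affinity Propagation chooses the number of clusters itself, so obtaining exactly three groups with it would likely require tuning its `preference` parameter.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the country indicators (NOT the real dataset):
# 60 rows, 4 numeric features, drawn around three well-separated centers.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 4)) for c in centers])

# Standardize features so no single indicator dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# K-means and the Gaussian mixture are asked for three groups explicitly.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X_scaled)

# Affinity Propagation infers the number of clusters from the data.
ap_labels = AffinityPropagation(random_state=0).fit_predict(X_scaled)

print("K-means cluster sizes:", np.bincount(kmeans_labels))
print("GMM cluster sizes:", np.bincount(gmm_labels))
print("Affinity Propagation found", len(set(ap_labels.tolist())), "clusters")
```

With the real data, `X` would be replaced by the numeric columns of the country table loaded with pandas.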

Second part - Dimensionality reduction with t-SNE and map visualizations:

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that is particularly well suited to visualizing high-dimensional datasets. The data is reduced to two dimensions using t-SNE and plotted with Bokeh.

[Figure: tSNE2]
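A minimal sketch of the t-SNE step, assuming scikit-learn's `TSNE` and a plain Bokeh scatter plot. The input is random synthetic data, and the perplexity, title, and marker size are arbitrary choices for this example rather than the settings used in the repository.

```python
import numpy as np
from bokeh.plotting import figure
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the country indicators (NOT the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))

# Standardize, then project to two dimensions with t-SNE.
# Perplexity must be smaller than the number of samples.
X_scaled = StandardScaler().fit_transform(X)
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
embedding = tsne.fit_transform(X_scaled)

# Build a Bokeh scatter plot of the two t-SNE components.
p = figure(title="t-SNE embedding (synthetic data)")
p.scatter(embedding[:, 0], embedding[:, 1], size=8)

# To open the interactive plot in a browser, uncomment:
# from bokeh.plotting import show
# show(p)
```

Because t-SNE is stochastic, fixing `random_state` keeps the layout reproducible between runs on the same library version.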

Interactive map visualizations are used to show the result of the previous analysis.

[Figure: world_plotly]

[Figure: asia_gdpp_mort]