This repository contains an academic workshop for a data mining course. It includes a practical assignment that explores both supervised and unsupervised learning in depth.
The objectives covered in the supervised learning part are:
- Visualizing the dataset
- Using the Naive Bayes model and learning its principles
- Implementing a method that splits the dataset into training and test datasets (a manual implementation of the sklearn `train_test_split` function)
- Training the model using different training dataset sizes
- Calculating errors and scores in each case
- Cross validation
- Using the Random Forest model
You can find the notebook here : https://github.com/BenrhayemRacem/GL4_TP_DATA_MINING/tree/supervised_learning
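
As an illustration of the manual split objective listed above, here is a minimal sketch written with the standard library only. It is not taken from the notebook: the function name `manual_train_test_split`, its parameters, and the toy data are assumptions made for this example, and it only mimics the basic shuffle-and-cut behaviour of sklearn's `train_test_split`.

```python
import random


def manual_train_test_split(X, y, test_size=0.25, seed=None):
    """Shuffle the sample indices, then cut them into a test and a train part."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")

    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility

    n_test = max(1, round(len(X) * test_size))  # keep at least one test sample
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical toy data: 10 samples with 2 features each and a binary label.
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = manual_train_test_split(X, y, test_size=0.3, seed=42)
print(len(X_train), len(X_test))
```

Calling the function with several `test_size` values is one simple way to train a model on different training set sizes and compare the resulting scores.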
The objectives covered in the unsupervised learning part are:
- Visualizing the dataset
- Using the k-means model and learning its principles
- Calculating the silhouette score
- Drawing the dendrogram with the hierarchical agglomerative clustering (HAC) algorithm
- Using Principal Component Analysis (PCA)
- Using Agglomerative Clustering (AGNES) and drawing its dendrogram
- Comparing HAC and Agglomerative Clustering results with k-means using crosstab
- Implementing a manual DIANA (DIvisive ANAlysis) approach based on k-means
You can find the notebook here : https://github.com/BenrhayemRacem/GL4_TP_DATA_MINING/tree/unsupervised_learning
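
The last objective above, a divisive approach built on k-means, can be sketched roughly as follows. This is an assumption about the general idea, not the notebook's code: it repeatedly bisects the cluster with the largest within-cluster sum of squares using a plain 2-means loop, which differs from classical DIANA (where splits are driven by pairwise dissimilarities). All function names, the deterministic initialisation, and the toy points are hypothetical.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def _sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    center = _mean(points)
    return sum(_sq_dist(p, center) for p in points)


def two_means(points, n_iter=20):
    """Bisect points with a plain 2-means (Lloyd) loop.

    Deterministic start (an assumption of this sketch): the point farthest
    from the centroid, then the point farthest from that one.
    """
    center = _mean(points)
    c0 = max(points, key=lambda p: _sq_dist(p, center))
    c1 = max(points, key=lambda p: _sq_dist(p, c0))
    centers = [list(c0), list(c1)]
    left, right = [], []
    for _ in range(n_iter):
        left, right = [], []
        for p in points:
            if _sq_dist(p, centers[0]) <= _sq_dist(p, centers[1]):
                left.append(p)
            else:
                right.append(p)
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        new_centers = [_mean(left), _mean(right)]
        if new_centers == centers:
            break  # assignments have stabilised
        centers = new_centers
    return left, right


def divisive_clustering(points, n_clusters):
    """Top-down clustering: start with one cluster and bisect until n_clusters."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        # Split the cluster with the largest within-cluster spread.
        idx = max(range(len(clusters)), key=lambda i: _sse(clusters[i]))
        left, right = two_means(clusters[idx])
        if not left or not right:
            break  # nothing left to split
        clusters[idx:idx + 1] = [left, right]
    return clusters


# Hypothetical toy data: three well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
clusters = divisive_clustering(points, 3)
for cluster in clusters:
    print(cluster)
```

Recording the order in which clusters are split gives the top-down hierarchy, which can then be compared with the bottom-up dendrogram produced by HAC.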