An interactive learning web app for reviewing the data pipeline and understanding the models used to classify a K-pop song's popularity from features such as its audio properties and artist name.
Access the webpage here!
Outlines my motivations for undertaking this topic, the data pipeline process, and the performance results of all five classification models used.
Outlines the interactive Plotly visualisations I used in my EDA notebook.
Experiment with the parameters of the selected SVM model and explore the performance metric visualisations (confusion matrix, ROC curve with AUC, precision-recall curve).
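To make these metrics concrete, here is a minimal standard-library sketch of how the confusion matrix counts, precision, recall, and one point on the ROC curve follow from a classifier's scores at a chosen threshold. The labels and scores are made-up illustrative values, not results from this project, and the function names are hypothetical. In practice a library such as scikit-learn would normally compute these.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means 'popular'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def predict_at(scores, threshold):
    """Turn continuous classifier scores into 0/1 predictions."""
    return [1 if s >= threshold else 0 for s in scores]


def precision_recall(y_true, scores, threshold):
    """One point on the precision-recall curve."""
    tp, fp, fn, _ = confusion_counts(y_true, predict_at(scores, threshold))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_point(y_true, scores, threshold):
    """One point on the ROC curve: (false positive rate, true positive rate)."""
    tp, fp, fn, tn = confusion_counts(y_true, predict_at(scores, threshold))
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


# Illustrative data only: 1 = popular, 0 = not popular.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]

print(confusion_counts(y_true, predict_at(scores, 0.5)))
print(precision_recall(y_true, scores, 0.5))
print(roc_point(y_true, scores, 0.5))
```

Sweeping the threshold from 0 to 1 and collecting these points traces out the full ROC and precision-recall curves.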
Experiment with the input features and see whether the given song is predicted to be popular or not.
- Logistic Regression
- Random Forest
- SVM ✅
- XGBoost
- KNN
- Collect a larger dataset of K-pop songs over time to include more data from artists labelled as 'Others'.
- Consider data cleaning techniques, such as removing potential outliers (preferable only if the dataset is large enough) or feature scaling, to optimise the performance of the machine learning algorithms.
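
As a rough illustration of the two cleaning ideas above, the following standard-library sketch flags outliers with the common 1.5 × IQR rule and standardises a feature to zero mean and unit variance. The tempo values and function names are hypothetical, and the IQR rule is just one possible outlier criterion, not necessarily the one this project would adopt.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def standardise(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Illustrative tempo values (BPM), not taken from the project's dataset.
tempo = [90, 100, 110, 120, 130, 300]

outliers = iqr_outliers(tempo)
cleaned = [v for v in tempo if v not in outliers]
scaled = standardise(cleaned)

print(outliers)
print(scaled)
```

Scaling matters most for distance- and margin-based models such as SVM and KNN, where features on larger numeric ranges would otherwise dominate.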
Building Interactive plots on Streamlit: Misra Turp's Streamlit Videos Playlist