This project uses machine learning to classify songs as either Hip-Hop or Rock based on audio features such as danceability, energy, and tempo. The goal is to accurately predict a song's genre, demonstrating how AI can be applied to music analysis. This work has potential applications in the music industry, including automating playlist creation, enhancing recommendation systems, and assisting in music production. It also serves as a valuable tool for music researchers studying genre characteristics.
The analysis is based on two datasets:
- `data/echonest-metrics.json`: contains the per-track audio features (Echo Nest metrics)
- `data/fma-rock-vs-hiphop.csv`: contains track metadata with genre labels
- Used a dataset compiled by The Echo Nest, containing audio features of songs classified as either 'Hip-Hop' or 'Rock'.
- Features included acousticness, danceability, energy, instrumentalness, liveness, speechiness, tempo, and valence.
- Merged track metadata with audio features using track IDs.
- Handled class imbalance by undersampling the majority class ('Rock') to match the number of 'Hip-Hop' samples.
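The merge and undersampling steps above might look roughly like this (a minimal pandas sketch; column names such as `track_id` and `genre_top` and the random seed are assumptions, not taken from the repository):

```python
import pandas as pd

# Load genre labels and Echo Nest audio features (paths from the data/ directory above)
tracks = pd.read_csv('data/fma-rock-vs-hiphop.csv')
echonest_metrics = pd.read_json('data/echonest-metrics.json', precise_float=True)

# Merge track metadata with audio features on the shared track identifier
echo_tracks = echonest_metrics.merge(tracks[['track_id', 'genre_top']], on='track_id')

# Undersample the majority class ('Rock') down to the number of 'Hip-Hop' tracks
hop_only = echo_tracks[echo_tracks['genre_top'] == 'Hip-Hop']
rock_only = echo_tracks[echo_tracks['genre_top'] == 'Rock'].sample(n=len(hop_only), random_state=10)
balanced = pd.concat([rock_only, hop_only])
```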
- Visualized the distribution of each audio feature using histograms.
- Created box plots to compare feature distributions between genres.
- Computed correlation matrices using both Pearson and Spearman methods to assess feature relationships.
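A compact sketch of these exploratory steps, continuing from the `balanced` DataFrame assumed above:

```python
import matplotlib.pyplot as plt

feature_cols = ['acousticness', 'danceability', 'energy', 'instrumentalness',
                'liveness', 'speechiness', 'tempo', 'valence']

# Per-feature distributions and per-genre comparisons
balanced[feature_cols].hist(bins=30, figsize=(12, 8))
balanced.boxplot(column=feature_cols, by='genre_top', figsize=(12, 8))
plt.show()

# Pearson (linear) and Spearman (rank-based) feature correlations
pearson_corr = balanced[feature_cols].corr(method='pearson')
spearman_corr = balanced[feature_cols].corr(method='spearman')
```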
- Applied StandardScaler to standardize all features (mean = 0, std = 1).
- Performed Principal Component Analysis (PCA) for dimensionality reduction.
- Analyzed scree plot and cumulative explained variance to determine the optimal number of components.
- Selected 6 principal components, explaining approximately 85% of the variance.
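The scaling and PCA steps might be implemented along these lines (a sketch reusing the assumed `balanced` DataFrame and `feature_cols` list from above):

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = balanced[feature_cols]
y = balanced['genre_top']

# Standardize features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Inspect the scree plot / cumulative explained variance with a full PCA fit
pca_full = PCA().fit(X_scaled)
cumulative_variance = pca_full.explained_variance_ratio_.cumsum()

# Keep 6 components, which the analysis found to explain roughly 85% of the variance
pca = PCA(n_components=6)
X_pca = pca.fit_transform(X_scaled)
```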
- Implemented four classification models: Logistic Regression, Support Vector Machine (SVM), Decision Tree, and Random Forest.
- Used 5-fold cross-validation to assess initial model performance.
- Performed GridSearchCV for hyperparameter tuning of Decision Tree and Random Forest models.
- Evaluated the best models on the test set using accuracy, precision, recall, and F1-score.
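A condensed sketch of this modeling workflow (the train/test split ratio and the hyperparameter grid shown here are illustrative assumptions):

```python
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.2, random_state=10, stratify=y)

models = {
    'Logistic Regression': LogisticRegression(random_state=10),
    'SVM': SVC(random_state=10),
    'Decision Tree': DecisionTreeClassifier(random_state=10),
    'Random Forest': RandomForestClassifier(random_state=10),
}

# 5-fold cross-validation as a first pass over all four models
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f'{name}: {scores.mean():.3f} (+/- {scores.std():.3f})')

# Hyperparameter tuning for a tree-based model (example grid for Random Forest)
param_grid = {'n_estimators': [100, 200], 'max_depth': [5, 10, None]}
grid = GridSearchCV(RandomForestClassifier(random_state=10), param_grid, cv=5)
grid.fit(X_train, y_train)

# Accuracy, precision, recall, and F1 on the held-out test set
print(classification_report(y_test, grid.best_estimator_.predict(X_test)))
```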
To set up the project, follow these steps:
- Clone the repository:
  ```bash
  git clone https://github.com/Shanmukhi1920/Song_Genre_Classification
  ```
- Navigate to the project directory:
  ```bash
  cd Song_Genre_Classification
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run Jupyter Notebook to view the project:
  ```bash
  jupyter notebook
  ```
In Jupyter, open the `Song_Genre_Classification.ipynb` notebook in the `notebooks/` directory to view the full analysis.
The `src/app.py` file in the repository launches a web application built with Streamlit that allows users to input song features and receive a genre classification in real time.

Ensure you have Streamlit installed. If not, install it using pip:
```bash
pip install streamlit
```

Launch the app by running the following command in the terminal:
```bash
streamlit run src/app.py
```
The best-performing model, the fitted scaler, and the PCA transformation used for preprocessing are saved as `.pkl` files:
- `models/song_classifier.pkl`: the trained classification model
- `models/scaler.pkl`: the scaler used to standardize features
- `models/pca.pkl`: the PCA transformation applied to reduce dimensionality
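These artifacts would typically be chained at inference time along the following lines (a minimal sketch assuming the files were saved in a joblib-compatible format; the feature values and their ordering below are illustrative assumptions):

```python
import joblib
import numpy as np

# Load the persisted preprocessing steps and the trained classifier
scaler = joblib.load('models/scaler.pkl')
pca = joblib.load('models/pca.pkl')
model = joblib.load('models/song_classifier.pkl')

# Example input: the eight audio features in the order assumed during training
# (acousticness, danceability, energy, instrumentalness, liveness, speechiness, tempo, valence)
features = np.array([[0.2, 0.8, 0.6, 0.0, 0.1, 0.3, 120.0, 0.7]])

# Apply the same scaling and PCA used in training, then predict the genre
prediction = model.predict(pca.transform(scaler.transform(features)))
print(prediction)  # e.g. ['Hip-Hop'] or ['Rock']
```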
- Performance Consistency: Logistic Regression and SVM demonstrated consistent performance across cross-validation and test sets.
- Overfitting in Tree-based Models: Decision Tree and Random Forest showed a significant gap between train and test accuracies, indicating potential overfitting.
- Hyperparameter Tuning: After tuning, the Decision Tree and Random Forest models reached test-set accuracies of around 82% and 85%, respectively.
- Insight on Model Tuning: The Random Forest classifier's test accuracy decreased slightly from 85.49% to 84.84% after tuning, highlighting that hyperparameter tuning does not always enhance performance. This serves as an important lesson in model optimization.