Students Enrolment Status - Machine Learning Project

Introduction

This machine learning project uses multi-class classification to predict the status of university students: dropout, enrolled, or graduate. The dataset comes from a Kaggle contest, in which I ranked 22nd out of 209 participants, placing in the top 11%.

About Me

My name is Sudeep Sinha, and I completed this machine learning project under the supervision of IIT Madras. After completing the project and achieving the 22nd rank in the contest, I was required to present my work in a viva to the IIT Madras Program of Study and Evaluation (POD). During the viva, I answered questions about my approach, methodology, and results, defended my work, and demonstrated my understanding of the project. Presenting in front of a panel of experts was both challenging and rewarding, and I learned a lot from the feedback and critiques I received. Overall, this project has been a great learning experience, and I'm grateful for the opportunity to have worked on it under the guidance of IIT Madras.

Project Details

Project Name

Students Enrolment Status

Problem Statement

Many educational institutions consider the monitoring and support of university students very important. In this competition, the problem is formulated as a three-class classification task (dropout, enrolled, and graduate), and the classes are coded as 0, 1, and 2 in the dataset.

Project Steps

Step 1: Look at the Big Picture

I defined the problem statement as a supervised learning problem that involves multi-class classification. The objective is to classify students into one of three categories: dropout, enrolled, or graduate, based on their status at the end of the normal duration of the course. The dataset assigns numerical codes 0, 1, and 2 to represent each class.
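
A minimal sketch of how the class codes can be turned back into readable names and checked for imbalance. It assumes the codes follow the order in which the classes are listed (0 = dropout, 1 = enrolled, 2 = graduate); the helper names and the example labels are hypothetical and not taken from the project.

```python
from collections import Counter

# Assumption: codes follow the order in which the classes are listed above.
STATUS_BY_CODE = {0: "dropout", 1: "enrolled", 2: "graduate"}


def decode_labels(codes):
    """Translate integer class codes into readable status names."""
    return [STATUS_BY_CODE[code] for code in codes]


def class_distribution(codes):
    """Return the share of each class, useful for spotting class imbalance."""
    counts = Counter(decode_labels(codes))
    total = sum(counts.values())
    return {name: counts[name] / total for name in STATUS_BY_CODE.values()}


# Made-up labels, for illustration only.
example_codes = [2, 0, 2, 1, 2, 0]
print(decode_labels(example_codes))
print(class_distribution(example_codes))
```

Checking the class shares early helps decide whether a stratified split or class weighting is worth considering later.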

Step 2: Get the Data

I explored the data by inspecting sample rows from the training and test sets and reviewing summary statistics.
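
A first look at the data could be sketched with pandas as follows. The small DataFrame is a hypothetical stand-in for the competition's training file, and its column names (`age_at_enrolment`, `units_approved`, `label`) are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the training data; the real columns differ.
train = pd.DataFrame({
    "age_at_enrolment": [18, 21, 19, 34, 20, 25],
    "units_approved": [6, 0, 5, 2, 6, 3],
    "label": [2, 0, 2, 0, 2, 1],
})


def summarize(df, label_col):
    """Collect the basic facts usually checked first: size, gaps, class counts."""
    return {
        "shape": df.shape,
        "missing": int(df.isna().sum().sum()),
        "class_counts": df[label_col].value_counts().sort_index().to_dict(),
    }


print(train.head())      # sample rows
print(train.describe())  # per-column summary statistics
summary = summarize(train, "label")
print(summary)
```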

Step 3: Data Visualization

I visualized the data to better understand the dataset, using scatter plots and a correlation matrix, which I also rendered as a heatmap.
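
One possible way to build a correlation matrix and draw it as a heatmap is sketched below with pandas and matplotlib. The data is synthetic and the column names are invented; the plotting library actually used in the project is not stated, so matplotlib is an assumption here.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data with invented column names, for illustration only.
rng = np.random.default_rng(0)
n = 200
units_approved = rng.integers(0, 7, size=n)
df = pd.DataFrame({
    "units_approved": units_approved,
    "grade_average": units_approved * 2.0 + rng.normal(0, 1, size=n),
    "age_at_enrolment": rng.integers(17, 40, size=n),
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()

# Draw the matrix as a heatmap on a fixed [-1, 1] colour scale.
fig, ax = plt.subplots(figsize=(4, 4))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax)
fig.tight_layout()
# fig.savefig("correlation_heatmap.png")  # uncomment to write the image to disk
print(corr.round(2))
```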

Step 4: Prepare the Data for Machine Learning Algorithm

In this step, I separated features and labels in the training set and the stratified test set, cleaned the data, and preprocessed it using techniques such as MinMaxScaler, StandardScaler, MaxAbsScaler, and LabelBinarizer.
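
The preparation step could look roughly like the following scikit-learn sketch. The data is synthetic, and the 80/20 split ratio and random seeds are assumptions; the important detail is that each scaler is fitted on the training portion only and then reused on the test portion.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, MinMaxScaler, StandardScaler

# Synthetic features and three-class labels, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(loc=10.0, scale=3.0, size=(300, 4))
y = rng.choice([0, 1, 2], size=300, p=[0.3, 0.2, 0.5])

# Stratified split: class proportions are preserved in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# MinMaxScaler maps each training feature to the [0, 1] range.
minmax = MinMaxScaler().fit(X_train)
X_train_mm = minmax.transform(X_train)

# LabelBinarizer turns the class codes into one-hot indicator columns.
y_train_onehot = LabelBinarizer().fit_transform(y_train)

print(X_train_std.mean(axis=0).round(6))
print(y_train_onehot[:3])
```

Fitting on the training portion alone keeps statistics from the test set out of the preprocessing, which would otherwise make the evaluation look better than it is.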

Step 5: Selection and Training of Machine Learning Models

I trained and tested several machine learning models: a baseline model, Logistic Regression, Perceptron, DecisionTreeClassifier, KNN, SVM, Random Forest with hyperparameter tuning, Bagging, Boosting, and XGB. The final model selected was the Random Forest Classifier.
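
A comparison of this kind can be sketched with cross-validation and a small grid search, as below. The dataset is synthetic, only three of the listed models are included to keep it short, and the accuracy metric, the 3-fold setup, and the parameter grid are all assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic three-class problem, for illustration only.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# A baseline plus two candidate models, scored the same way.
candidates = {
    "baseline": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Hyperparameter tuning for the random forest over a deliberately tiny grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [4, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print(scores)
print(search.best_params_, round(search.best_score_, 3))
```

Scoring every candidate with the same folds and metric keeps the comparison fair, and the baseline shows how much of the score comes from the class distribution alone.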

Step 6: Sample Submission

Finally, I created a sample submission from the model's predictions to demonstrate how the model performs in real-life scenarios.
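
A submission file could be assembled along these lines. The column names `id` and `label`, the helper `build_submission`, and the example values are hypothetical; the actual format required by the contest is not described here.

```python
import pandas as pd


def build_submission(ids, predictions, id_col="id", label_col="label"):
    """Pair each test-row identifier with its predicted class code."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    return pd.DataFrame({
        id_col: list(ids),
        label_col: [int(p) for p in predictions],
    })


# Made-up identifiers and predictions, for illustration only.
submission = build_submission([101, 102, 103], [2, 0, 1])

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` instead would write the same content to disk for upload.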

Conclusion

The Students Enrolment Status project ranked 22nd out of 209 participants in the Kaggle contest. I implemented several machine learning models to predict the status of university students. The work covered data visualization, data cleaning, data preprocessing, and model selection and training, making it a comprehensive machine learning project. The final model selected was the Random Forest Classifier.