This project uses machine learning to predict students' academic performance from socio-demographic and academic factors. Using features such as parental education, test preparation, and reading and writing scores, the model predicts each student's mathematics score, which can help educators identify students who might need additional support.
This project utilizes several machine learning algorithms, including:
- Random Forest
- XGBoost
- CatBoost
- Linear Regression
- Support Vector Regression
- Decision Trees
The models are optimized with hyperparameter tuning and evaluated using the R² score and Mean Squared Error (MSE).
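Both metrics follow standard formulas: MSE is the mean of the squared residuals, and R² = 1 - SS_res / SS_tot. The project most likely relies on a library implementation (for example scikit-learn's metrics), so the dependency-free sketch below is only an illustration of what those numbers mean; the helper names and sample values are made up.

```python
from typing import Sequence


def _check_lengths(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both metrics need paired, non-empty sequences.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared residuals (y_true - y_pred)."""
    _check_lengths(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check_lengths(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all y_true values are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative math scores and predictions (made-up numbers).
    actual = [70, 80, 90]
    predicted = [72, 78, 90]
    # Squared residuals: 4 + 4 + 0 = 8, so MSE = 8 / 3.
    print(mean_squared_error(actual, predicted))
    # SS_tot = 100 + 0 + 100 = 200, so R² = 1 - 8 / 200 = 0.96.
    print(r2_score(actual, predicted))
```

Lower MSE and higher R² are both better. R² equals 1 for perfect predictions and can become negative when a model does worse than always predicting the mean score.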
To set up the project on your local machine:
- Clone the repository:

      git clone https://github.com/yourusername/student-performance-prediction.git
      cd student-performance-prediction

- Install the dependencies:

      pip install -r requirements.txt

- Download the student performance dataset and place it in the root directory.

- Run the data ingestion script:

      python src/components/data_ingestion.py

- Transform the data:

      python src/components/data_transformation.py

- Train the model:

      python src/components/model_trainer.py

- Deploy using Docker:

      docker build -t student-performance-prediction .
      docker run -p 5000:5000 student-performance-prediction
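The internals of the ingestion script are not described here. As a rough illustration of what an ingestion step commonly does, the sketch below reads a raw CSV and writes a reproducible train/test split. The function names, the 80/20 ratio, the seed, and the `train.csv`/`test.csv` output names are all hypothetical and are not taken from `src/components/data_ingestion.py`.

```python
import csv
import random
from pathlib import Path


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and return (train, test)."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def ingest(raw_csv: Path, out_dir: Path, test_size=0.2, seed=42):
    """Read the raw CSV and write hypothetical train.csv / test.csv files."""
    with raw_csv.open(newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames
    if fieldnames is None:
        raise ValueError(f"{raw_csv} has no header row")
    train, test = split_rows(rows, test_size, seed)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, subset in (("train.csv", train), ("test.csv", test)):
        with (out_dir / name).open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
    return out_dir / "train.csv", out_dir / "test.csv"
```

For example, `ingest(Path("StudentsPerformance.csv"), Path("artifacts"))` would write both splits into an `artifacts` directory (both paths are placeholders). Fixing the seed keeps the split identical across runs, so later steps always see the same training data.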
The dataset includes the following information for each student:
- Gender
- Race/Ethnicity
- Parental Level of Education
- Lunch Type
- Test Preparation Course
- Reading and Writing Scores
- Math Score (Target Variable)
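Separating the inputs from the target can be sketched as follows. The column names are an assumption based on the widely used Kaggle "Students Performance in Exams" CSV (lowercase names such as `race/ethnicity` and `math score`); adjust them if your copy of the dataset differs. The helper name is hypothetical.

```python
# Column names assumed from the Kaggle "Students Performance in Exams" CSV.
CATEGORICAL_COLUMNS = [
    "gender",
    "race/ethnicity",
    "parental level of education",
    "lunch",
    "test preparation course",
]
NUMERICAL_COLUMNS = ["reading score", "writing score"]
TARGET_COLUMN = "math score"


def split_features_target(record):
    """Split one raw CSV record (a dict of strings) into (features, target)."""
    expected = CATEGORICAL_COLUMNS + NUMERICAL_COLUMNS + [TARGET_COLUMN]
    missing = [col for col in expected if col not in record]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # Categorical values stay as strings; they are encoded later in the pipeline.
    features = {col: record[col] for col in CATEGORICAL_COLUMNS}
    features.update({col: float(record[col]) for col in NUMERICAL_COLUMNS})
    return features, float(record[TARGET_COLUMN])


if __name__ == "__main__":
    # An illustrative record in the assumed format.
    sample = {
        "gender": "female",
        "race/ethnicity": "group B",
        "parental level of education": "bachelor's degree",
        "lunch": "standard",
        "test preparation course": "none",
        "math score": "72",
        "reading score": "72",
        "writing score": "74",
    }
    features, target = split_features_target(sample)
    print(target)  # 72.0
```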
The model pipeline includes:
- Data preprocessing with encoding of categorical features
- Model training with cross-validation
- Hyperparameter optimization using GridSearchCV
- Performance evaluation using the R² score and Mean Squared Error
- Model deployment using a Flask API and Docker
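The preprocessing and tuning steps above are most likely handled by scikit-learn, since `GridSearchCV` is named explicitly. To make the mechanics concrete, the dependency-free sketch below shows one-hot encoding, k-fold splitting, and an exhaustive grid search that keeps the parameter set with the best mean validation score. All function names, and the caller-supplied `score_fn`, are hypothetical illustrations rather than the project's code.

```python
from itertools import product


def fit_one_hot(rows, columns):
    """Learn the sorted category vocabulary for each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def transform_one_hot(row, vocab):
    """Encode one row as a flat 0/1 list; unseen categories become all zeros."""
    encoded = []
    for col, categories in vocab.items():
        encoded.extend(1.0 if row[col] == cat else 0.0 for cat in categories)
    return encoded


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def parameter_grid(grid):
    """Expand {name: [values]} into every combination of values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(score_fn, grid, n_samples, k=5):
    """Return (best_params, best_mean_score), maximizing the mean validation score.

    score_fn(params, train_idx, val_idx) is a hypothetical caller-supplied
    function that fits a model on train_idx and returns, e.g., its R² on val_idx.
    """
    best_params, best_score = None, float("-inf")
    for params in parameter_grid(grid):
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Like scikit-learn's `KFold` with its default `shuffle=False`, these folds are contiguous and the first `n_samples % k` folds are one sample larger. Ignoring unseen categories in `transform_one_hot` mirrors the effect of `OneHotEncoder(handle_unknown="ignore")`.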