This project is an end-to-end machine learning solution that predicts student performance from features such as gender, race/ethnicity, parental level of education, lunch, and test preparation course. It covers the entire machine learning lifecycle:
Data Ingestion: Collecting and preprocessing raw student data from multiple sources.
Data Transformation: Cleaning and normalizing the data, and applying feature engineering techniques to prepare it for model training.
Model Training: Implementing machine learning algorithms such as linear regression, decision trees, or more advanced models to predict student performance.
Pipeline Deployment: Automating data ingestion, transformation, model training, and evaluation processes for a seamless and scalable workflow.
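Evaluation: Assessing the model’s performance with metrics suited to the task, such as mean squared error when predicting a numeric score, or accuracy, precision, and recall when predicting a category such as pass/fail.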
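
As a rough illustration of the metrics named above, the sketch below computes them in plain Python. The function names, the sample scores, and the pass mark of 60 are assumptions made for the example and are not taken from the project. Mean squared error applies when the model predicts a numeric score; accuracy, precision, and recall apply when the prediction is a category.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted scores (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary label (classification)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when nothing is predicted/labelled positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up scores, for illustration only.
true_scores = [72, 55, 90, 40]
pred_scores = [70, 60, 85, 50]

PASS_MARK = 60  # assumed threshold for turning a score into a pass/fail label
true_pass = [int(s >= PASS_MARK) for s in true_scores]
pred_pass = [int(s >= PASS_MARK) for s in pred_scores]

regression_error = mean_squared_error(true_scores, pred_scores)
pass_fail_metrics = classification_metrics(true_pass, pred_pass)
```

Reporting the two families separately avoids quoting accuracy for a model that outputs a continuous score.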
This project aims to create a robust, automated pipeline that can predict student outcomes.
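
The stages described above can be sketched as a chain of small functions. This is a minimal, standard-library-only sketch and not the project's actual code: the sample rows, the column names (`gender`, `lunch`, `test_preparation_course`, `math_score`), and every function name are hypothetical, and a hand-written gradient-descent linear regression stands in for whatever model the project trains.

```python
import csv
import io

# Hypothetical sample data; values and column names are invented for illustration.
RAW_CSV = """gender,lunch,test_preparation_course,math_score
female,standard,completed,78
male,standard,none,65
female,free/reduced,none,52
male,free/reduced,completed,61
female,standard,none,70
male,standard,completed,74
"""

CATEGORICAL = ["gender", "lunch", "test_preparation_course"]
TARGET = "math_score"


def ingest(text):
    """Data ingestion: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def fit_encoder(rows):
    """Learn the one-hot vocabulary as a sorted list of (column, value) pairs."""
    return sorted({(col, row[col]) for row in rows for col in CATEGORICAL})


def transform(rows, vocab):
    """Data transformation: one-hot encode categoricals, prepend a bias term."""
    X = [[1.0] + [1.0 if row[col] == val else 0.0 for col, val in vocab]
         for row in rows]
    y = [float(row[TARGET]) for row in rows]
    return X, y


def predict(weights, X):
    """Linear prediction: dot product of the weights with each feature row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


def train(X, y, lr=0.05, epochs=2000):
    """Model training: linear regression fitted by batch gradient descent."""
    weights = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        errors = [p - t for p, t in zip(predict(weights, X), y)]
        for j in range(len(weights)):
            grad = 2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
            weights[j] -= lr * grad
    return weights


def mse(y_true, y_pred):
    """Evaluation: mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def run_pipeline(text):
    """Run ingestion, transformation, training, and evaluation in order."""
    rows = ingest(text)
    vocab = fit_encoder(rows)
    X, y = transform(rows, vocab)
    weights = train(X, y)
    # Evaluated on the training rows only to keep the sketch short;
    # a real pipeline would score a held-out test split instead.
    return weights, mse(y, predict(weights, X))


weights, train_mse = run_pipeline(RAW_CSV)
```

Keeping each stage as its own function means a stage can be swapped (for example, a different model in `train`) without touching the others, which is the main point of automating the workflow as a pipeline.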