This project develops a machine learning model to predict California housing prices. The model predicts the median housing price of a district, which helps determine whether investing in that area is worthwhile.
├── data
│ └── housing.csv <- Data from Kaggle.
├── images <- Images for visualization
├── models <- Trained models
├── notebooks
│ ├── preparation_notebooks <- Notebooks required for model production: data preparation, pipeline creation, parameter tuning, etc.
│ ├── testing_notebooks <- Notebooks for quick tests; scratch and test notebooks
│ └── main.ipynb <- Main notebook
├── requirements.txt <- The requirements file, generated with `pip freeze > requirements.txt`
└── Readme.md <- Project explanation, notes, etc.
- Analyze and preprocess the dataset.
- Train different regression models and compare their performance.
- Select the 3 to 5 best-performing models and tune their hyperparameters.
- Select the 2 or 3 best-performing models, ensemble them, and compare their performance.
- Choose the best model among these.
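
The comparison step above can be sketched as follows. This is a minimal illustration, not the code in `main.ipynb`: it assumes scikit-learn is installed, uses synthetic data from `make_regression` as a stand-in for `data/housing.csv`, and compares three arbitrarily chosen regressors. The names `candidates` and `cv_rmse` are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption); the project itself uses data/housing.csv.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# Hypothetical shortlist; the project compares a larger set of regressors.
candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}


def cv_rmse(model, X, y, folds=5):
    """Return the mean cross-validated RMSE of a model (lower is better)."""
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    # scikit-learn reports the negated RMSE so that higher is better; flip the sign.
    return float(-scores.mean())


results = {name: cv_rmse(model, X, y) for name, model in candidates.items()}

# Rank models from lowest to highest RMSE.
ranking = sorted(results, key=results.get)
for name in ranking:
    print(f"{name}: RMSE = {results[name]:.2f}")
```

The top entries of `ranking` would then be the candidates for hyperparameter tuning and ensembling.
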
- Supervised Learning: The model is trained with labeled examples.
- Regression Task: The model predicts a continuous value, `median-house-price`.
- Data Preprocessing: The dataset is prepared by handling missing data, processing outliers, and applying feature engineering, transformation, and extraction steps.
- Model Selection: 14 different regression models are trained and their performances are compared.
- Hyperparameter Tuning: The hyperparameters of the 5 best-performing models are tuned.
- Final Model Selection: After ensembling `GradientBoostingRegressor` and `LGBMRegressor`, the `LGBMRegressor` model was chosen.
- RMSE values for the `LGBMRegressor` model on the training, testing, and validation sets are reported in `main.ipynb`.
- The model overfits slightly, but the test and validation scores are acceptable.
- Data augmentation and further hyperparameter tuning are recommended for model improvement.
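
To illustrate how RMSE on different splits can indicate overfitting, here is a small dependency-free sketch. The helper names `rmse` and `overfit_ratio` and the toy numbers are hypothetical and are not taken from the project's results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def overfit_ratio(train_rmse, holdout_rmse):
    """Ratio of hold-out RMSE to training RMSE.

    A value close to 1 suggests little overfitting; a value well above 1
    means the model fits the training set noticeably better than unseen data.
    """
    return holdout_rmse / train_rmse


# Toy values for illustration only (not the project's actual predictions).
train_error = rmse([3.0, 5.0], [2.5, 5.5])
test_error = rmse([3.0, 5.0], [2.0, 7.0])
print(f"train RMSE = {train_error:.3f}, test RMSE = {test_error:.3f}")
print(f"overfit ratio = {overfit_ratio(train_error, test_error):.2f}")
```

The same comparison can be applied to the validation split. What counts as an acceptable ratio is a judgment call that depends on the problem.
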
- Navigate to the project directory.
- Install the necessary dependencies by running `pip install -r requirements.txt`.
- Open the `notebooks` directory, then open the `main.ipynb` notebook and run it. This runs the whole project and can take some time.