After importing the necessary modules, I explored the data with histograms and heatmaps to gain initial insights and to identify the most dominant features in the dataset. I then preprocessed the data, using one-hot encoding to handle categorical features such as "ocean_proximity" and applying logarithmic transformations to four skewed features to reduce their skewness.
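The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the four column names in `skewed_cols` are assumptions (the text does not name the skewed features), and `np.log1p` is one possible choice of logarithmic transformation.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample. Only "ocean_proximity" is named in the text;
# the other column names are assumptions for illustration.
df = pd.DataFrame({
    "total_rooms": [880, 7099, 1467, 1274],
    "total_bedrooms": [129, 1106, 190, 235],
    "population": [322, 2401, 496, 558],
    "households": [126, 1138, 177, 219],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "<1H OCEAN"],
})

# Hypothetical list of the four skewed features.
skewed_cols = ["total_rooms", "total_bedrooms", "population", "households"]

# log1p computes log(1 + x), which compresses the long right tail
# and stays defined when a value is zero.
transformed = df.copy()
for col in skewed_cols:
    transformed[col] = np.log1p(transformed[col])

# One-hot encode the categorical column: one 0/1 column per category.
encoded = pd.get_dummies(transformed, columns=["ocean_proximity"], dtype=int)
```

After this step, `encoded` no longer has an "ocean_proximity" column; it has one indicator column per category instead, such as "ocean_proximity_INLAND".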
During feature engineering, I created two new features: "bedrooms per room" and "rooms per household." Both proved influential in the modeling process and significantly improved the models' predictive capability.
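As a rough sketch, the two ratio features could be derived like this. The source column names (`total_rooms`, `total_bedrooms`, `households`), the new column names, and the helper `add_ratio_features` are all assumptions made for illustration, and the sample values are made up.

```python
import pandas as pd

# Made-up sample rows; the column names are assumed.
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, 1106.0, 190.0],
    "households": [126.0, 1138.0, 177.0],
})


def add_ratio_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with the two ratio features appended."""
    out = frame.copy()
    # Share of rooms that are bedrooms.
    out["bedrooms_per_room"] = out["total_bedrooms"] / out["total_rooms"]
    # Average number of rooms per household.
    out["rooms_per_household"] = out["total_rooms"] / out["households"]
    return out


features = add_ratio_features(df)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare model results with and without the engineered columns.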
In the modeling phase, I applied both linear regression and random forest regression. The linear regressor achieved a score of 66%, while the random forest regressor raised it to 82%. This underscores the importance of selecting the right model for optimal predictive performance.
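A comparison of this kind might look like the sketch below, assuming scikit-learn. The data here is synthetic, so the scores it produces are unrelated to the 66% and 82% reported above. The text does not say which metric was used; this sketch uses each regressor's `score` method, which returns the R² coefficient of determination.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a nonlinear target (an assumption for illustration;
# this is not the housing dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 3))
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model on the training split and record its R² on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.2f}")
```

Scoring on a held-out split rather than the training data gives a fairer comparison, since a random forest can fit its training set very closely.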