The aim of this project is to examine the Iowa Housing dataset and to build a machine learning model that predicts house prices.
Information of this type can be extraordinarily useful to home-seekers and realtors alike, perhaps most of all to the latter. With this information successfully processed, a realtor can focus on improving the elements of a house that contribute most to its price, and so make the best use of their ROI. In time, data like these can make a realtor's offerings more consistently appealing to a wider market, netting them higher, longer-term profits as well.
For these reasons and more, we shall dig deeply into these data and see what information we can glean using classical Machine Learning Regression techniques and data processing approaches.
To process these data, I employed a variety of tools. The most notable of these were Pandas, NumPy, the Statsmodels API, SciPy, and Seaborn, as well as PCA and the ensemble, model_selection, and pipeline modules of scikit-learn.
Of special note is the Missingno package, which proved invaluable in the effective visualization of the missing data in the dataset.
The project is divided evenly between two notebooks: the data processing in part 1 and the machine learning regressions in part 2.
The first part details extensive exploratory data analysis (EDA) and fully explains our strategy: to lightly engineer the dataset, prepare it for further processing via Principal Component Analysis (PCA), and then compare the results at different stages of data preparation through the separate lenses of multiple regression techniques, spanning linear, support vector, and tree-based regressions.
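As a rough illustration of that strategy, the sketch below compares the three regression families with and without a PCA step. It is not the code from the notebooks: the data are synthetic stand-ins generated with `make_regression`, and the helper name `compare_models`, the specific estimators, and the choice of 10 components are all assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the prepared housing features (NOT the Iowa data).
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=8, noise=10.0, random_state=0
)

# One representative per family; the estimators and settings are assumptions.
models = {
    "linear": LinearRegression(),
    "svr": SVR(kernel="linear", C=10.0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}


def compare_models(X, y, n_components=None, cv=5):
    """Hypothetical helper: mean cross-validated R^2 per model.

    When n_components is given, a PCA step is placed between scaling
    and the regressor; otherwise the scaled features are used directly.
    """
    scores = {}
    for name, model in models.items():
        steps = [StandardScaler()]
        if n_components is not None:
            steps.append(PCA(n_components=n_components))
        steps.append(model)
        pipe = make_pipeline(*steps)
        # cross_val_score clones the pipeline for each fold.
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="r2").mean()
    return scores


scores_raw = compare_models(X, y)                   # no PCA
scores_pca = compare_models(X, y, n_components=10)  # PCA with 10 components

for name in models:
    print(f"{name:>7}: raw R^2 = {scores_raw[name]:.3f}, "
          f"PCA R^2 = {scores_pca[name]:.3f}")
```

Keeping the scaler, PCA, and regressor inside a single pipeline means each cross-validation fold fits the preprocessing on its training split only, which avoids leaking information from the held-out data into the comparison.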
For further discussion of the project, its process, and the full analysis, please consult the blog, which, at the time of this writing, is still forthcoming. Should you have any other questions, please feel free to reach out to me at theodore.m.cheek@pm.me.