Obtain an optimal model based on statistical data to estimate the best selling price for a customer's house.
In this project, we applied basic machine learning techniques to data collected on housing prices in the Boston, Massachusetts area, with the goal of predicting the selling price of a new home. First, we explored the data to identify the relevant features and compute descriptive statistics for the dataset. Second, we split the data into training and testing subsets and chose a suitable performance metric for the problem. We then examined performance graphs for a learning algorithm across varying parameters and training set sizes. This allowed us to choose the optimal model, the one that generalizes best to unseen data. Finally, we tested the optimal model on a new sample and compared the predicted price to our values.
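The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data, the decision tree regressor, the `max_depth` search range, and the R^2 metric are all assumptions chosen for the example, since this README does not name the model or the metric used in boston_housing.ipynb.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): three numeric features and a price
# target. The real project would load housing.csv instead.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 10.0, size=(400, 3))
prices = (
    50_000 * features[:, 0]
    - 20_000 * features[:, 1]
    + 10_000 * features[:, 2]
    + rng.normal(0.0, 15_000, size=400)
)

# Descriptive statistics of the target.
print("min:", prices.min(), "mean:", prices.mean(), "max:", prices.max())

# Hold out 20% of the data for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    features, prices, test_size=0.2, random_state=42
)

# Try several tree depths with 5-fold cross-validation on the training set
# and keep the depth with the best mean R^2 (assumed metric).
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the selected model on data it has never seen.
best_model = search.best_estimator_
test_score = r2_score(y_test, best_model.predict(X_test))
print("best max_depth:", search.best_params_["max_depth"])
print("R^2 on held-out data:", round(test_score, 3))

# Predict the price of a new, hypothetical home.
new_home = np.array([[5.0, 2.0, 7.0]])
print("predicted price:", best_model.predict(new_home)[0])
```

Selecting the depth by cross-validation on the training subset only, and scoring once on the held-out subset, keeps the final score an honest estimate of how the model generalizes.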
The project has 4 files:
project_description.md: Explains the project in detail.
boston_housing.ipynb: The main notebook, which contains our work for the project.
housing.csv: The dataset.
visuals.py: This Python script includes helper functions to create the required visualizations.
You can view the results by opening boston_housing.ipynb in your browser.
To execute the code you need Python 3.x and the following Python libraries:
NumPy
pandas
scikit-learn
matplotlib
You will also need software installed to run a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 3.x installer and not the Python 2.7 installer.
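To confirm that the libraries listed above are available before opening the notebook, a quick check along these lines may help. It is only a sketch: the helper name `missing_packages` is made up for this example, and it only tests whether each package can be found, not which version is installed.

```python
import importlib.util

# Import names for the required libraries; note that scikit-learn is
# imported under the module name "sklearn".
REQUIRED = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the display names of packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```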