aya9aladdin/used-cars-price-prediction

Used Cars Price Prediction

Motivation

Car prices in Egypt have been chaotic over the past couple of years, especially in the used car market. In this project, I developed a pipeline that scrapes data for cars listed for sale (nearly 30K cars) on Hatla2ee.com, one of the biggest Egyptian used car marketplaces, stores it in Amazon S3 as a data lake, and loads it into an Amazon Redshift data warehouse, with Airflow running the pipeline daily. Next, I analyzed the data and prepared it for a neural network model that predicts a car's price from its main features (brand, model, class, km, transmission, and fuel type), and then I created a RESTful API with Flask to deploy the model. Lastly, I developed a web application with Plotly Dash that provides an interactive user interface for predicting car prices.

Data Source:

I used the BeautifulSoup library to scrape all the used car data on hatla2ee.com. First, I scraped the car data using the fuel type as a search filter. Fuel type was the only field missing from the car list page, so filtering by it avoided loading every single car page. Then, I scraped the data again using the car body type as a search filter. I ran the two filters separately because applying both fuel and body filters at the same time returned only 10% of the data: not all cars have body type information, so those cars do not appear in results filtered by body. I scraped the cars that do have body type information and used them to build a data set of the available body types for each model, which can be used to fill in the body type of cars of the same model that have no body data.
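The body type backfill described above can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the record fields (`brand`, `model`, `body`), the function names, and the choice of the most common body type per model are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_body_lookup(cars):
    """Map each (brand, model) pair to its most common known body type.

    Assumption: picking the most frequent value is one simple way to
    resolve models that were listed under more than one body type.
    """
    counts = defaultdict(Counter)
    for car in cars:
        if car.get("body"):
            counts[(car["brand"], car["model"])][car["body"]] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


def fill_missing_body(cars, lookup):
    """Return copies of the records with missing body types filled in
    from the lookup. Records whose model is unknown keep body=None."""
    filled = []
    for car in cars:
        car = dict(car)  # copy so the input records are not mutated
        if not car.get("body"):
            car["body"] = lookup.get((car["brand"], car["model"]))
        filled.append(car)
    return filled


# Hypothetical sample records, for illustration only.
cars = [
    {"brand": "Hyundai", "model": "Elantra", "body": "Sedan"},
    {"brand": "Hyundai", "model": "Elantra", "body": None},
    {"brand": "Kia", "model": "Sportage", "body": "SUV"},
    {"brand": "Fiat", "model": "Tipo", "body": None},
]

lookup = build_body_lookup(cars)
filled_cars = fill_missing_body(cars, lookup)
```

In this sketch the second Elantra inherits the body type of the first, while the Tipo stays empty because no listing of that model carries body data.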

Technologies

The technologies I used in developing the pipeline are:

Scraping: BeautifulSoup
Cloud: AWS
Data Lake: S3
Data Warehouse: Amazon Redshift
Workflow Orchestration: Airflow
API development: Flask
Web app development: Dash Plotly

Data Flow

Airflow manages two main pipelines: full_load and incremental_load. The full_load pipeline runs once to initialize the database by scraping the full set of listings from the car website. The incremental_load pipeline runs daily to scrape new car data and to update existing cars whose price has been changed by the seller, using the fingerprint column to track price changes.

image

Database Model

image

  • The cars_data table contains the main scraped car information. The fingerprint column combines the car_id and price columns and serves as a signature for detecting whether the seller has changed the car's price and the record needs to be updated.
  • car_body_data contains information about the body of the car for each car brand/model.
  • car_class_data contains the classes available for each model.
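The fingerprint check can be sketched as follows. This is a hedged illustration only: the README says the fingerprint combines car_id and price, but the exact format (here a simple `car_id_price` string), the function names, and the in-memory dictionary standing in for the warehouse table are assumptions.

```python
def make_fingerprint(car_id, price):
    """Combine car_id and price into one signature string.

    Assumption: a plain concatenation; the real column may be built
    differently (for example, as a hash).
    """
    return f"{car_id}_{price}"


def classify_listings(scraped, stored_fingerprints):
    """Split freshly scraped listings into new cars and price changes.

    stored_fingerprints maps car_id -> fingerprint already in the
    warehouse. Listings whose fingerprint is unchanged are skipped.
    """
    new_cars, changed_cars = [], []
    for car in scraped:
        fingerprint = make_fingerprint(car["car_id"], car["price"])
        stored = stored_fingerprints.get(car["car_id"])
        if stored is None:
            new_cars.append(car)       # never seen before: insert
        elif stored != fingerprint:
            changed_cars.append(car)   # same car, different price: update
    return new_cars, changed_cars


# Hypothetical data, for illustration only.
stored = {"101": "101_350000", "102": "102_420000"}
scraped = [
    {"car_id": "101", "price": 350000},  # unchanged
    {"car_id": "102", "price": 399000},  # price changed
    {"car_id": "103", "price": 510000},  # new listing
]

new_cars, changed_cars = classify_listings(scraped, stored)
```

Comparing one precomputed column keeps the daily run cheap: only rows whose signature differs need an update in the warehouse.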

Data Analytics

All the analytics are described in detail in the notebook directory. The main charts are shown here.

Brand popularity in the market

image

Model popularity in the market

image

Most popular colors

colors

Average age of the car per brand

image

Average Km per brand

image

Average Km of the car per Model (top 50)

model km

Fuel, transmission, and body distributions

colle

Cars distribution per governorate

gov

Brand distribution per governorate

govs

Prediction model

I built a neural network with three dense layers and trained it on the data. The accuracy of the model was around 92%. More details about building the model and preparing the data are in the notebook directory. After training the model, I saved it as a pickle file along with the car data frame and the trained scaler, to be used later by the prediction API and the web application.
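Saving the model, scaler, and car data together means the API and the web application load exactly the same artifacts. The sketch below shows one way to do that with the standard pickle module. It is an assumption-laden illustration: the README does not say whether one file or several were used, the file name and dictionary keys are hypothetical, and plain Python values stand in for the trained network, the scaler, and the data frame.

```python
import os
import pickle
import tempfile


def save_artifacts(artifacts, path):
    """Write all prediction artifacts to a single pickle file."""
    with open(path, "wb") as f:
        pickle.dump(artifacts, f)


def load_artifacts(path):
    """Read the artifacts back. Only unpickle files you trust,
    because loading a pickle can execute arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder objects standing in for the real trained model,
# fitted scaler, and car data frame (all hypothetical values).
artifacts = {
    "model": {"name": "price_model_placeholder"},
    "scaler": {"mean": [85000.0], "scale": [40000.0]},
    "car_data": [{"brand": "Hyundai", "model": "Elantra"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "artifacts.pkl")  # hypothetical file name
    save_artifacts(artifacts, path)
    restored = load_artifacts(path)
```

Both the API and the web application would then call `load_artifacts` once at startup and reuse the restored scaler to transform user input the same way the training data was transformed.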

Web application

This is the interface of the web application. The user selects the car properties and then clicks Predict to see the results. The application also shows statistics of car prices per popular brand and model. Here is a video demo showing the web app and how to use it: https://youtu.be/xCKlSArHJvQ

image image
