Skip to content

Latest commit

 

History

History
162 lines (93 loc) · 5.93 KB

README.md

File metadata and controls

162 lines (93 loc) · 5.93 KB

Machine Learning Projects

1. Car Price Prediction:

CAR Price prediction

Problem Statement:

A Chinese automobile company Geely Auto aspires to enter the US market by setting up its manufacturing unit there and producing cars locally to give competition to their US and European counterparts.

They have contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, they want to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Chinese market. The company wants to know:

Which variables are significant in predicting the price of a car? How well do those variables describe the price of a car? Based on various market surveys, the consulting firm has gathered a large dataset of different types of cars across the American market.

Business Goal:

we are required to model the price of cars with the available independent variables. It will be used by the management to understand how exactly the prices vary with respect to the independent variables. They can accordingly manipulate the design of the cars, the business strategy, etc. to meet certain price levels. The model will act as a better medium for management to understand the pricing dynamics of a new market.

The following steps will be followed to reach our goal:

  1. Importing libraries.

  2. Reading the concerned dataset

  3. Data Understanding

  4. Data handling

  5. Data visualization

  6. Data Preparation

  7. Splitting the Data and feature scaling

  8. Building a linear regression model

  9. Residual analysis of the train data

  10. Making Predictions Using the Final Model

  11. Model Evaluation

  12. Conclusion

prdo

2. Wine Machine Learning:

wine

For this project, I used Kaggle’s Red Wine Quality dataset to build various classification models to predict whether a particular red wine is “good quality” or not. Each wine in this dataset is given a “quality” score between 0 and 10. For the purpose of this project, I converted the output to a binary output where each wine is either “good quality” (a score of 7 or higher) or not (a score below 7). The quality of a wine is determined by 11 input variables:

Fixed acidity Volatile acidity Citric acid Residual sugar Chlorides Free sulfur dioxide Total sulfur dioxide Density pH Sulfates Alcohol

Objectives

The objectives of this project are as follows:

To experiment with different classification methods to see which yields the highest accuracy To determine which features are the most indicative of a good quality wine

Steps included in this project:

Importing Lib Loading Data Understanding Data Missing Values Exploring Variables(Data Analysis) Feature Selection The proportion of Good vs Bad Wines Preparing Data for Modelling Applying different models Choosing right model Hurray you just completed the task!

3.Big Mart Sales Prediction

bIG maART SALES PREDICITOM

The aim is to build a predictive model and find out the sales of each product at a particular store. Create a model by which Big Mart can analyze and predict outlet production sales.

A perfect project to learn Data Analytics and apply Machine Learning algorithms (Linear Regression, Random Forest Regressor, XG Boost) to predict outlet production sales.

Dataset Description

BigMart has collected sales data from the year 2013, for 1559 products across 10 stores in different cities. Where the dataset consists of 12 attributes like Item Fat, Item Type, Item MRP, Outlet Type, Item Visibility, Item Weight, Outlet Identifier, Outlet Size, Outlet Establishment Year, Outlet Location Type, Item Identifier, and Item Outlet Sales. Out of these attributes response variable is the Item Outlet Sales attribute and the remaining attributes are used as the predictor variables.

The data set is also based on hypotheses of store level and product level. Where store level involves attributes like:- city, population density, store capacity, location, etc, and the product level hypotheses involve attributes like:- brand, advertisement, promotional offer, etc.

Dataset Details

The data has 8523 rows of 12 variables.

Variable - Details

Item_Identifier- Unique Product ID Item_Weight- Weight of the product Item_Fat_Content - Whether the product is low fat or not Item_Visibility - The % of the total display area of all products in a store allocated to the particular product Item_Type - The category to which the product belongs Item_MRP - Maximum Retail Price (list price) of the product Outlet_Identifier - Unique store ID Outlet_Establishment_Year- The year in which the store was established Outlet_Size - The size of the store in terms of ground area covered Outlet_Location_Type- The type of city in which the store is located Outlet_Type- Whether the outlet is just a grocery store or some sort of supermarket Item_Outlet_Sales - Sales of the product in the particular store. This is the outcome variable to be predicted.

Install Python libraries -

Pip install pandas, pip install numpy, pip install matplotlib, pip install lib, pip install seaborn, pip install Sklearn, pip install joblib, pip install flask

Project Flow

We will handle this problem in a structured way. Loading Packages and Data Data Structure and Content Exploratory Data Analysis Missing Value Treatment Feature Engineering Encoding Categorical Variables Label Encoding Preprocessing Data Modeling Deployment

Video : https://www.youtube.com/watch?v=i57QmGR4M1Q

Feel free to drop a star if you like it.