Feature Engineering with Python
Updated Nov 2, 2024 - Jupyter Notebook
This file provides hands-on practice with data preprocessing methods and techniques using several libraries.
A lot needs to be done to a dataset before it is fed to a machine learning model; these steps fall under data preprocessing. In this repository I have tried to explain them with some examples.
The Bike Sharing Company wants to understand how the independent variables in their past data relate to bike demand, build a machine learning model of that demand, and plan a business strategy accordingly.
X Education Organization wants to identify whether a customer who registered on their website with an enquiry is a potential customer, using past data to build a machine learning model.
Different types of regression
To predict which customer is most likely to convert
This Python code shows how regression handles categorical variables using dummies. It fits a multiple regression model, shows the regression table, and performs residual analysis.
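A minimal sketch of the workflow this description names (dummy coding, a multiple regression fit, a regression table, and a basic residual check). It assumes NumPy only, not whatever library the repository itself uses; the toy data, the variable names, and the `make_dummies` helper are all made up for illustration.

```python
import numpy as np

# Toy data, made up for illustration only.
temp = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 27.0, 30.0])
season = np.array(["spring", "spring", "summer", "summer", "summer",
                   "fall", "fall", "spring", "fall"])
demand = np.array([82.0, 90.0, 131.0, 139.0, 148.0, 120.0, 131.0, 133.0, 142.0])


def make_dummies(values, drop_first=True):
    """Hypothetical helper: one 0/1 column per category level.

    With drop_first=True the first level (in sorted order) becomes the
    baseline and gets no column of its own.
    """
    levels = sorted(set(values.tolist()))
    kept = levels[1:] if drop_first else levels
    columns = np.column_stack([(values == level).astype(float) for level in kept])
    return kept, columns


dummy_names, dummies = make_dummies(season)

# Design matrix: intercept, numeric predictor, then the dummy columns.
names = ["intercept", "temp"] + [f"season_{name}" for name in dummy_names]
X = np.column_stack([np.ones(len(temp)), temp, dummies])

# Ordinary least squares fit.
beta, _, rank, _ = np.linalg.lstsq(X, demand, rcond=None)

# Residuals and goodness of fit.
fitted = X @ beta
residuals = demand - fitted
n, p = X.shape
rss = float(residuals @ residuals)
tss = float(((demand - demand.mean()) ** 2).sum())
r_squared = 1.0 - rss / tss

# Classical OLS standard errors and t statistics.
sigma2 = rss / (n - p)
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stat = beta / std_err

# A small regression table.
print(f"{'term':<16}{'coef':>10}{'std err':>10}{'t':>10}")
for name, b, se, t in zip(names, beta, std_err, t_stat):
    print(f"{name:<16}{b:>10.3f}{se:>10.3f}{t:>10.3f}")
print(f"R-squared: {r_squared:.3f}")

# Basic residual checks: with an intercept, the residuals should average to zero.
print(f"mean residual: {residuals.mean():.2e}")
print(f"largest |residual|: {np.abs(residuals).max():.3f}")
```

Dropping one level per categorical variable keeps the dummy columns from summing to the intercept column, which would otherwise make the design matrix rank deficient. Each dummy coefficient then reads as the difference from the baseline level.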
Introduction to Machine Learning course - Spring 2021 - Supervised and Unsupervised Learning, KNN Classification Models, Naive-Bayes Classifier, Regression Analysis, K-Means and DBSCAN Clustering Analysis, Association Rules and PCA, Confusion Matrix, Normalization, Dummy Variables.
Predictive model that identifies the important factors (or features) affecting the demand for shared bikes
Sample programs with basic machine learning concepts
Scientific programming through the scikit-learn (sklearn) library
Working Examples of all algorithms with datasets
Business Goal: To model the demand for shared bikes with the available independent variables. Management will use the model to understand exactly how demand varies with different features. They can then adjust the business strategy to meet demand levels and customers' expectations.
King County Real Estate Model
Build a model with machine learning to predict housing prices in Ames, Iowa. Top 11% in the Kaggle Housing Prices Competition.
The goal is to predict the miles per gallon of cars using different attributes
Modeled the credit risk associated with consumer loans. Performed exploratory data analysis (EDA), preprocessing of continuous and discrete variables using various techniques depending on the feature. Checked for missing values and cleaned the data. Built the probability of default model using Logistic Regression. Visualized all the results. Com…