Predictive Analytics with Python

These are my notes from working through the book Learning Predictive Analytics with Python by Ashish Kumar, published in February 2016.

General

###Chapter 1: Getting Started with Predictive Modelling

Installed the Anaconda distribution.
Python 3.5 is installed.
The book uses Python 2, so some of the code is modified along the way for Python 3.

###Chapter 2: Data Cleaning

Reading the data: variations and examples
Data frames and delimiters.

####Case 1: Reading a dataset using the read_csv method

####Case 2: Reading a dataset using the open method of Python

File: readDatasetByOpenMethod.py

####Case 3: Reading data from a URL

Modified the code so that it works and prints the dataset line by line, one dictionary per row.
File: readURLLib2Iris.py
File: readURLMedals.py
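
A minimal Python 3 sketch of this case, assuming the book's Python 2 `urllib2` call is replaced by `urllib.request`. The helper names `rows_as_dicts` and `read_url` and the sample text are hypothetical and are not taken from the files above. The parsing is exercised on an in-memory string so the example runs without network access.

```python
import csv
import io
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen


def rows_as_dicts(lines):
    """Yield one dictionary per data row, keyed by the header row."""
    yield from csv.DictReader(lines)


def read_url(url):
    """Download a CSV file and return its rows as a list of dictionaries."""
    with urlopen(url) as response:
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return list(rows_as_dicts(text))


# Hypothetical sample standing in for the downloaded file.
sample = io.StringIO("sepal_length,species\n5.1,setosa\n7.0,versicolor\n")
rows = list(rows_as_dicts(sample))
for row in rows:
    print(dict(row))
```

Calling `read_url` with the dataset's address would return the same kind of list for the real file.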

####Case 4: Miscellaneous cases

File: readXLS.py
Created the file above to read from both .xls and .xlsx

####Basics: Summary, dimensions, and structure

File: basicDataCheck.py
Created the file above to read from both .xls and .xlsx

####Handling missing values

File: basicDataCheck.py
RE: Treating missing data like NaN or None
Deletion or imputation
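
A short sketch of the two options with pandas, on a hypothetical toy data frame (the column names and values are made up for illustration and are not from basicDataCheck.py). Mean imputation is only one possible choice of fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both columns.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, 41.0],
                   "fare": [7.25, 71.28, np.nan, 8.05]})

# Count missing values (NaN or None) per column.
missing_counts = df.isnull().sum()

# Deletion: drop every row that has at least one missing value.
dropped = df.dropna(axis=0, how="any")

# Imputation: replace missing values with the column mean.
imputed = df.fillna(df.mean())

print(missing_counts)
print(dropped)
print(imputed)
```

Deletion is simple but shrinks the dataset; imputation keeps every row at the cost of introducing estimated values.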

####Creating dummy variables

File: basicDataCheck.py
Split 'sex' into the new variables 'sex_female' and 'sex_male'.
Remove the column 'sex'.
Add both dummy columns created above.
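
The three steps above can be sketched with `pandas.get_dummies`. The toy data frame is hypothetical; only the column name 'sex' and the resulting dummy names follow the notes.

```python
import pandas as pd

# Hypothetical toy data; the column name 'sex' follows the notes above.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"],
                   "sex": ["female", "male", "male"]})

# Step 1: split 'sex' into 'sex_female' and 'sex_male'.
dummies = pd.get_dummies(df["sex"], prefix="sex")

# Step 2: remove the original 'sex' column.
df = df.drop(columns=["sex"])

# Step 3: add both dummy columns to the data frame.
df = df.join(dummies)

print(df)
```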

####Visualizing a dataset by basic plotting

File: plotData.py
Figure file: ScatterPlots.jpeg
Plot types: scatter plots, histograms, and boxplots

###Chapter 3: Data Wrangling

####Subsetting a dataset

####Generating random numbers and their usage

####Grouping the data – aggregation, filtering, and transformation

####Random sampling – splitting a dataset in training and testing datasets

File: splitDataTrainTest.py
Method 1: using the Customer Churn Model
Method 2: using sklearn
Method 3: using the shuffle function
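
A hedged sketch of three ways to split a data frame roughly 80/20. The toy data frame is hypothetical. Method 1 is assumed here to be a random boolean mask, which may differ from what splitDataTrainTest.py does. Method 3 uses a NumPy permutation as the shuffle step.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real dataset.
data = pd.DataFrame({"x": np.arange(100), "y": np.arange(100) % 2})
rng = np.random.RandomState(0)

# Method 1 (assumed): random boolean mask, roughly 80% of rows go to training.
mask = rng.rand(len(data)) < 0.8
train1, test1 = data[mask], data[~mask]

# Method 2: scikit-learn's train_test_split with an exact 80/20 split.
train2, test2 = train_test_split(data, test_size=0.2, random_state=0)

# Method 3: shuffle the row order, then cut at the 80% mark.
shuffled = data.iloc[rng.permutation(len(data))]
cut = int(0.8 * len(shuffled))
train3, test3 = shuffled.iloc[:cut], shuffled.iloc[cut:]

print(len(train1), len(test1))
print(len(train2), len(test2))
print(len(train3), len(test3))
```

The mask approach gives only an approximate ratio, while the other two give exact sizes.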

####Concatenating and appending data

File: concatenateAndAppend.py
File: appendManyFiles.py

####Merging/joining datasets

###Chapter 4: Statistical Concepts for Predictive Modelling

####Random sampling and central limit theorem

####Hypothesis testing

Null versus alternate hypothesis
Z-statistic and t-statistic
Confidence intervals, significance levels, and p-values
Different kinds of hypothesis test
A step-by-step guide to do a hypothesis test
An example of a hypothesis test
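
As an illustration of the steps listed above, here is a minimal two-sided z-test using only the standard library. All numbers are hypothetical and are not the book's example. The test assumes a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers: the null hypothesis says the population mean is 50.
mu0 = 50.0          # mean under the null hypothesis
sigma = 5.0         # known population standard deviation
n = 25              # sample size
sample_mean = 52.0  # observed sample mean
alpha = 0.05        # significance level

# Z-statistic: distance of the sample mean from mu0 in standard errors.
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Reject the null hypothesis when the p-value is below the significance level.
reject_null = p_value < alpha
print(z, round(p_value, 4), reject_null)
```

When the population standard deviation is unknown and the sample is small, the t-statistic with the sample standard deviation would be used instead.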

####Chi-square testing

####Correlation

File: linearRegression.py
File: linearRegressionFunction.py
Picture: TVSalesCorrelationPlot.png
Picture: RadioSalesCorrelationPlot.png
Picture: NewspaperSalesCorrelationPlot.png

###Chapter 5: Linear Regression with Python

####Understanding the maths behind linear regression

Linear regression using simulated data
File: linearRegression.py
Picture: CurrentVsPredicted1.png
Picture: CurrentVsPredictedVsMean1.png
Picture: CurrentVsPredictedVsModel1.png

####Making sense of result parameters

File: linearRegression.py
p-values
F-statistics
Residual Standard Error (RSE)
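
A small NumPy sketch of how two of these quantities are computed for a one-predictor model, using the standard definitions RSE = sqrt(RSS / (n - p - 1)) and F = ((TSS - RSS) / p) / (RSS / (n - p - 1)). The data points are hypothetical and the code is not taken from linearRegression.py.

```python
import numpy as np

# Hypothetical data: one predictor (p = 1), n = 6 observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n, p = len(x), 1

# Least-squares line y = intercept + slope * x.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)   # total sum of squares

rse = np.sqrt(rss / (n - p - 1))                    # residual standard error
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))    # F-statistic
r_squared = 1 - rss / tss

print(rse, f_stat, r_squared)
```

A small RSE relative to the scale of y and a large F-statistic both indicate that the line explains most of the variation.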

####Implementing linear regression with Python

File: linearRegressionSMF.py
Linear regression using the statsmodels library
Multiple linear regression
Multi-collinearity: sub-optimal performance of the model
Variance Inflation Factor (VIF)
It quantifies how much the variance of a predictor's coefficient estimate is inflated because that predictor is highly correlated with one or more other predictors.
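
A hedged sketch of the usual VIF definition, 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The function `vif` and the simulated columns are hypothetical and written with plain NumPy; they are not the code in linearRegressionSMF.py.

```python
import numpy as np


def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)


# Simulated predictors: c is nearly a copy of a, b is independent of both.
rng = np.random.RandomState(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
X = np.column_stack([a, b, c])

vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)
```

Here the correlated columns a and c get large VIF values while b stays close to 1, which is the pattern that signals multi-collinearity.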

####Model validation

Training and testing data split
File: linearRegressionSMF.py
Linear regression with scikit-learn
File: linearRegressionSKL.py
Feature selection with scikit-learn
Recursive Feature Elimination (RFE)
File: linearRegressionRFE.py
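
A minimal sketch of Recursive Feature Elimination with scikit-learn's `RFE` class wrapped around a linear regression. The simulated data is hypothetical and chosen so that only two of four columns matter. It is not the dataset used in linearRegressionRFE.py.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
# Hypothetical target: only columns 0 and 2 carry signal.
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.01 * rng.normal(size=100)

# Repeatedly fit the model and drop the weakest feature until two remain.
selector = RFE(LinearRegression(), n_features_to_select=2)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the kept features
print(selector.ranking_)   # rank 1 marks a selected feature
```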

####Handling other issues in linear regression

Handling categorical variables
File: linearRegressionECom.py
Transforming a variable to fit non-linear relations
File: nonlinearRegression.py
Picture: MPGVSHorsepower.png
Picture: MPGVSHorsepowerVsLine.png
Picture: MPGVSHorsepowerModels.png
Handling outliers
Other considerations and assumptions for linear regression

###Chapter 6: Logistic Regression with Python

####Linear regression versus logistic regression

####Understanding the math behind logistic regression

####Implementing logistic regression with Python

####Model validation and evaluation

File: logisticRegressionImplementation.py
Cross validation

####Model validation

File: logisticRegressionImplementation.py
The ROC curve {see terms}

###Chapter 7: Clustering with Python

####Introduction to clustering – what, why, and how?

What is clustering?
How is clustering used?
Why do we do clustering?

####Mathematics behind clustering

####Implementing clustering using Python

File: clusterWine.py
Importing and exploring the dataset
Normalizing the values in the dataset
Hierarchical clustering using scikit-learn
K-Means clustering using scikit-learn
Interpreting the cluster

####Fine-tuning the clustering

The elbow method
Silhouette Coefficient
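
A hedged sketch of both tuning approaches with scikit-learn on simulated data. The three well-separated blobs are hypothetical and are not the wine dataset from clusterWine.py. The elbow method looks at how the within-cluster sum of squares (`inertia_`) falls as k grows, and the Silhouette Coefficient scores how well separated the clusters are for each k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Simulated data: three tight, well-separated blobs of 50 points each.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

inertias, silhouettes = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                          # for the elbow plot
    silhouettes[k] = silhouette_score(X, model.labels_)   # higher is better

# Pick the k with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print(inertias)
print(silhouettes)
print(best_k)
```

Plotting `inertias` against k would show the bend (the "elbow") at the same value that the silhouette score selects for this data.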

###Chapter 8: Trees and Random Forests with Python

####Introducing decision trees

A decision tree

####Understanding the mathematics behind decision trees

####Implementing a decision tree with scikit-learn

####Understanding and implementing regression trees

File: regressionTree.py
Regression tree algorithm
Implementing a regression tree using Python

####Understanding and implementing random forests

File: randomForest.py
The random forest algorithm
Implementing a random forest using Python
Why do random forests work?
Important parameters for random forests

###Chapter 9: Best Practices for Predictive Modelling

####Best practices for coding

####Best practices for data handling

####Best practices for algorithms

####Best practices for statistics

####Best practices for business contexts
