beauvilerobed / data-mining-101-with-python Public


Repo includes notes, projects, and tutorials for academic and self-learning purposes.


data
1_GettingStarted.ipynb
2_ClassifyingScikitEstimators.ipynb
3_PredictingSportsWinnersTrees.ipynb
4_RecommendingMovies.ipynb
5_ExtractFeatsTransformer.ipynb
readme.md
requirements.txt

Notes on Data Mining

1. Getting Started with Data Mining

Introducing data mining
A simple affinity analysis example
What is affinity analysis?
Product recommendations
Implementing a simple ranking of rules
Support
Confidence
Ranking to find the best rules
A simple classification example
What is classification?
Loading and preparing the dataset
Implementing the OneR algorithm
The algorithm
Testing the algorithm
The rule
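
The support and confidence topics listed above can be illustrated with a minimal sketch. The toy baskets and the helper name `rule_stats` are hypothetical and not taken from the notebooks; support is assumed here to be the raw count of baskets where the rule holds, and confidence is that count divided by the number of baskets containing the premise.

```python
# Hypothetical toy transactions; each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def rule_stats(transactions, premise, conclusion):
    """Return (support, confidence) for the rule 'premise -> conclusion'.

    Assumption: support is a raw count, not a fraction of all baskets.
    """
    premise_count = sum(1 for t in transactions if premise in t)
    both_count = sum(
        1 for t in transactions if premise in t and conclusion in t
    )
    confidence = both_count / premise_count if premise_count else 0.0
    return both_count, confidence


# Score every ordered pair of distinct items, then rank by confidence.
items = sorted(set().union(*transactions))
rules = {
    (p, c): rule_stats(transactions, p, c)
    for p in items
    for c in items
    if p != c
}
ranked = sorted(rules.items(), key=lambda kv: kv[1][1], reverse=True)

for (premise, conclusion), (support, confidence) in ranked[:3]:
    print(f"{premise} -> {conclusion}: support={support}, confidence={confidence:.2f}")
```

Ranking by confidence alone can favor rules seen only a handful of times, so in practice a minimum support threshold is usually applied first.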

2. Classifying with scikit-learn Estimators

scikit-learn estimators
Nearest neighbors
Distance metrics
Loading the dataset
Moving towards a standard workflow
Running the algorithm
Setting parameters
Preprocessing using pipelines
An example
Standard preprocessing
Putting it all together
Pipelines
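
As a rough illustration of the nearest-neighbors and distance-metric topics above, here is a pure-Python sketch. It deliberately does not use scikit-learn, which the chapter itself is about; the function names `euclidean` and `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster dataset.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

Because the vote depends on raw distances, features on very different scales can dominate the result, which is one motivation for the preprocessing step listed in this chapter.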

3. Predicting Sports Winners with Decision Trees

Loading the dataset
Collecting the data
Cleaning up the dataset
Extracting new features
Decision trees
Parameters in decision trees
Using decision trees
Glossary for expanded standings
Extra: Model Training Using GridSearch
Random forests
How do ensembles work?
Parameters in Random forests
Applying Random forests
Engineering new features (a guide)

4. Recommending Movies Using Affinity Analysis

Affinity analysis
Algorithms for affinity analysis
Choosing parameters
The movie recommendation problem
Obtaining the dataset
Sparse data formats
The Apriori implementation
The Apriori algorithm
Implementation
Extracting association rules
Evaluation
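
The Apriori topics above can be sketched in a few lines of plain Python. This is a simplified illustration, not the notebook's implementation: the function name `apriori`, the count-based `min_support` parameter, and the toy data (sets of movies each user liked) are all assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Build size-k candidates by joining frequent (k-1)-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune candidates that have any infrequent (k-1)-subset.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical data: the set of movies each of five users liked.
liked = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent = apriori(liked, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, which keeps the candidate list small as `k` grows.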

5. Extracting Features with Transformers
