Repo includes notes, projects, and tutorials for academic and self-learning purposes.

beauvilerobed/data-mining-101-with-python

Notes on Data Mining

1. Getting Started with Data Mining

  • Introducing data mining
  • A simple affinity analysis example
  • What is affinity analysis?
  • Product recommendations
  • Implementing a simple ranking of rules
  • Support
  • Confidence
  • Ranking to find the best rules
  • A simple classification example
  • What is classification?
  • Loading and preparing the dataset
  • Implementing the OneR algorithm
  • The algorithm
  • Testing the algorithm
  • The rule

2. Classifying with scikit-learn Estimators

  • scikit-learn estimators
  • Nearest neighbors
  • Distance metrics
  • Loading the dataset
  • Moving towards a standard workflow
  • Running the algorithm
  • Setting parameters
  • Preprocessing using pipelines
  • An example
  • Standard preprocessing
  • Putting it all together
  • Pipelines

3. Predicting Sports Winners with Decision Trees

  • Loading the dataset
  • Collecting the data
  • Cleaning up the dataset
  • Extracting new features
  • Decision trees
  • Parameters in decision trees
  • Using decision trees
  • Glossary for expanded standings
  • Extra: Model Training Using GridSearch
  • Random forests
  • How do ensembles work?
  • Parameters in Random forests
  • Applying Random forests
  • Engineering new features (a guide)

4. Recommending Movies Using Affinity Analysis

  • Affinity analysis
  • Algorithms for affinity analysis
  • Choosing parameters
  • The movie recommendation problem
  • Obtaining the dataset
  • Sparse data formats
  • The Apriori implementation
  • The Apriori algorithm
  • Implementation
  • Extracting association rules
  • Evaluation

5. Extracting Features with Transformers
