zieglerk/data-science-tutorials

data-science-tutorials

TODO cluster by tasks (instead of algorithms)

This repository contains Jupyter notebooks that accompany the tutorials for our data science lectures. The following topics are covered, each in a separate folder.

  1. Dataset Visualization (Boston Housing minus the linear regression; also other datasets like Flower, MNIST-digits, 20newsgroups): working with and visualizing one dataset (incl. Matplotlib; the .describe() method; box plot; min-max normalization; Boston Housing; linear regression c/o dsP)
  2. Clustering
  3. Association Rule Learning (dataset yet to be determined; preferably from scikit learn)
  4. Regression (linear regression from Boston Housing and Car Prices)
  5. Bayes Learning (for spam filtering/text classification)
  6. Classification with Decision Trees (start with small 5-line dataset)
  7. Neural Networks (use keras.io to build a neural network for MNIST-digit classification; here's a tutorial); OPTIONAL: use gensim (for word2vec; pick a dataset from tensorflow); then an auto-encoder for representation learning
  8. OPTIONAL MapReduce
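
The summary statistics and min-max normalization mentioned under Dataset Visualization can be sketched in a few lines of pandas. This is only an illustration: the toy values and column names are hypothetical stand-ins for a real dataset, and `min_max_normalize` is a helper defined here, not a function from the notebooks.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset such as Iris.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
    "petal_length": [1.4, 1.4, 6.0, 5.1, 5.9],
})

# Summary statistics per column (count, mean, std, min, quartiles, max).
summary = df.describe()


def min_max_normalize(frame):
    """Rescale every numeric column linearly to the range [0, 1]."""
    return (frame - frame.min()) / (frame.max() - frame.min())


normalized = min_max_normalize(df)
print(summary)
print(normalized)
```

After normalization each column has minimum 0 and maximum 1, which keeps features with large numeric ranges from dominating distance-based methods such as clustering or KNN.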

Packages

The main packages are pandas and scikit-learn.

See our python-tutorials for instructions on how to set this up on your machine.

required

  • Python (>= 2.7 or >= 3.3)
  • NumPy (>= 1.6.1)
  • SciPy (>= 0.9)
  • scikit-learn (>= 0.18.1); documentation, also as pdf with Quick Start and Tutorials
  • Matplotlib (>= 2.1.1)
  • Pandas; documentation, also as pdf
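
One way to check an existing installation against the minimum versions listed above is the short script below. It is a sketch under stated assumptions: it needs Python 3.8 or newer for `importlib.metadata`, it assumes the distributions are published under the names `numpy`, `scipy`, `scikit-learn`, `matplotlib` and `pandas`, and the helpers `as_tuple` and `check` are illustrative names, not part of this repository.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above; pandas has no stated minimum.
MINIMUMS = {
    "numpy": "1.6.1",
    "scipy": "0.9",
    "scikit-learn": "0.18.1",
    "matplotlib": "2.1.1",
    "pandas": None,
}


def as_tuple(version):
    """Turn '1.6.1' or '2.1.1rc1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check(name, minimum):
    """Return 'missing', 'too old', or 'ok' for one distribution."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return "missing"
    if minimum is not None and as_tuple(installed) < as_tuple(minimum):
        return "too old"
    return "ok"


for name, minimum in MINIMUMS.items():
    print(name, check(name, minimum))
```

The version comparison is deliberately simple (leading integers only) so that the script needs nothing beyond the standard library.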

Table of contents

  • 0-Intro

    • Scikit-learn-overview.ipynb
    • Web Mining Project .ipynb
  • 1-Datasets_Visualization_and_preprocessing

    • 1-IRIS.ipynb
    • 2-Boston_house_dataset.ipynb
    • 3-MNIST.ipynb
    • 4-UCI_CAR.ipynb
    • 5-20newsgroups.ipynb
    • 6-KDD_cup_2000_data_set.ipynb
    • Crawling_twitter_with_python.ipynb
    • MDS_projection.ipynb (IRIS)
    • PCA_projection.ipynb (IRIS)
    • scikit-learn-overview-and-preprocessing.ipynb (IRIS)
    • VA-InformationVisualisation-with-JavaScript-and-3DJs.ipynb
    • TODO try visualization with Orange (available through the conda-forge channel)
  • 2-Clustering

    • Clustering_overview.ipynb (IRIS) (MNIST)
    • Tutorial_clustering_for_outlier_detection_3D.ipynb (Kddcup 1999)
    • Tutorial_clustering_for_outlier_detection.ipynb (Kddcup 1999)
  • 3-Association-Rules

    • Apriori_asaini.ipynb (MBE_dataset)
    • Apriori.ipynb (Boston house)
    • Apriori_server.ipynb (Mango_dataset)
    • Assignment_Association_rule_learning.ipynb
    • Tutorial_association_rule_learning_shopping_basket.ipynb (KDDcup 2000)
  • 4-Linear_regression_and_logistic_regression

    • Assignment_Linear_Regression.ipynb
    • Assignment_Logistic_regression.ipynb (UCI_car)
    • Boston_house_Linear_Regression.ipynb (Boston house)
    • Linear_regression_diabetes_dataset.ipynb
    • Linear-Regression.ipynb (Boston house)
    • Logistic_regression.ipynb (IRIS)
    • Small_scale_linear_regression.ipynb (KDDcup)
    • Supervised_Learning_with_Linear_Models.ipynb (Boston house)
  • 5-KNN_classification

    • KNN_classification.ipynb (IRIS)
    • Metrics.ipynb (IRIS)
  • 6-Bayes-Learning

  • 7-Decision-Trees.ipynb (UCI_car)

  • 8-Neural-Networks

    • keras-mnist.ipynb (MNIST)
    • Simple-NN.ipynb (make_moons)
    • Stacked-Denoising-Autoencoders.ipynb
    • INFO Software Comparison
      • keras.io (high-level, running on top of TensorFlow (default) or Theano) c/o François Chollet (written in Python)
      • Theano c/o Université de Montréal (written in Python; tightly integrated with NumPy)
      • TensorFlow c/o Google Brain (written in Python/C++)
  • 9-SVM

    • Assignment_SVM_for_OCR.ipynb (MNIST)
    • Support_Vector_Machines.ipynb (IRIS)
  • A-Advanced_modules

    • NLP-with-NLTK-Short-Intro.ipynb
  • B-Scripts

Links

Cheat Sheets

Other Collections

Module Specific

(should be listed at the module)
