mhjcodes/sagemaker-flight-prices-prediction

End-to-End Machine Learning using AWS SageMaker

Overview

  • Step 1: Introduction to AWS SageMaker

    • Overview of SageMaker, S3, EC2 & IAM features and capabilities
    • Setting up AWS environment and SageMaker instance
  • Step 2: GitHub Setup & Data Cleaning

    • Setting up local & remote repository using GitHub
    • Data cleaning with NumPy and pandas, following best practices
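
As a rough illustration of the data cleaning covered in Step 2, the sketch below normalizes text, coerces types, and removes missing and duplicate rows with pandas. The toy data and the column names (`airline`, `price`, `date_of_journey`) are assumptions made for this example and are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical raw data; values and column names are illustrative only.
raw = pd.DataFrame({
    "airline": [" alpha air", "Alpha Air ", "Beta Jet", "Beta Jet"],
    "price": ["3897", "3897", None, "7662"],
    "date_of_journey": ["24/03/2019", "24/03/2019", "01/05/2019", "09/06/2019"],
})

clean = (
    raw
    .assign(
        # Strip stray whitespace and unify capitalization.
        airline=lambda d: d["airline"].str.strip().str.title(),
        # Convert price strings to numbers; unparseable values become NaN.
        price=lambda d: pd.to_numeric(d["price"], errors="coerce"),
        # Parse day/month/year strings into datetime values.
        date_of_journey=lambda d: pd.to_datetime(
            d["date_of_journey"], format="%d/%m/%Y"
        ),
    )
    .dropna(subset=["price"])   # drop rows with no usable price
    .drop_duplicates()          # drop rows that are identical after normalization
    .reset_index(drop=True)
)
```

Chaining the steps keeps the raw frame untouched, which makes it easier to rerun or reorder individual cleaning steps.
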
  • Step 3: Exploratory Data Analysis

    • Understanding the workflow of systematically analyzing datasets
    • Understanding the various plots, statistical measures and hypothesis tests to analyze datasets
    • Exploring a custom EDA module that makes analyzing datasets more convenient and significantly less complex
    • Performing in-depth analysis of various kinds of numeric, categorical and date-time variables
    • Leveraging statistical measures, hypothesis tests, and univariate, bivariate and multivariate plots
  • Step 4: Feature Engineering and Data Preprocessing

    • Understanding feature engineering techniques for different types of variables
    • Creating scikit-learn compatible custom classes and functions
    • Using advanced scikit-learn features for feature engineering and data preprocessing such as:
      • Pipeline
      • FeatureUnion
      • FunctionTransformer
      • ColumnTransformer
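
The four scikit-learn utilities listed in Step 4 can be combined roughly as follows. This is a minimal sketch, not the project's actual preprocessing code: the toy data, the column names (`airline`, `duration`), and the choice of transformations are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "airline": ["A", "B", "A", "C"],
    "duration": [90, 150, 300, 45],
})

# FeatureUnion: derive two views of the same numeric column and
# stack them side by side (standardized value and log-transformed value).
duration_features = FeatureUnion([
    ("scaled", StandardScaler()),
    ("log", FunctionTransformer(np.log1p)),  # wraps a plain function as a transformer
])

# ColumnTransformer: route each column to its own transformer.
preprocessor = ColumnTransformer([
    ("airline", OneHotEncoder(handle_unknown="ignore"), ["airline"]),
    ("duration", duration_features, ["duration"]),
])

# Pipeline: chain the steps; an estimator could be appended as a final step.
pipeline = Pipeline([("preprocess", preprocessor)])
features = pipeline.fit_transform(df)
```

Here `features` has one row per input row and five columns: three one-hot columns for `airline` plus the scaled and log-transformed `duration`.
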
  • Step 5: Model Training and Deployment

    • Training and Tuning a machine learning model on SageMaker
    • Using S3 buckets for storage and EC2 for computing purposes
    • Creating a web application from scratch with Streamlit and deploying it to the cloud