
Store Sales Forecasting with Time Series Models and CI/CD Pipeline

Project Overview

This project applies advanced time series forecasting techniques to predict store sales for Favorita stores. It implements both Amazon SageMaker's DeepAR algorithm and a custom deep learning time series model, and sets up a CI/CD pipeline in AWS for model deployment and monitoring.

Data Source

The data used in this project is the Store Sales Time Series Forecasting dataset from Kaggle. It is stored in S3 and contains the following files:

train.csv

The training data, comprising time series of the features store_nbr and onpromotion, as well as the target sales.

store_nbr identifies the store at which the products are sold.

sales gives the total sales for a product family at a particular store at a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).

onpromotion gives the total number of items in a product family that were being promoted at a store at a given date.

stores.csv

Store metadata, including city, state, type, and cluster (cluster is a grouping of similar stores).

oil.csv

Daily oil price. Includes values during both the train and test data timeframes. (Ecuador is an oil-dependent country, and its economic health is highly vulnerable to shocks in oil prices.)

holidays_events.csv

Holidays and Events, with metadata.

Table of Contents

  1. Environment Setup
  2. Data Loading
  3. Exploratory Data Analysis
  4. Data Preprocessing
  5. Feature Engineering
  6. Model Development
  7. Model Deployment
  8. CI/CD Pipeline Setup
  9. Tools and Technologies

Environment Setup

To set up the project environment:

  1. Open the 0-Environment_Setup.ipynb notebook.
  2. Run all cells to install necessary libraries and set up AWS credentials.
  3. This notebook also defines global variables and functions used throughout the project; a sketch of the typical session setup follows the list.
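As a minimal sketch of the kind of session setup this typically involves (the bucket prefix below is a hypothetical placeholder, not the notebook's actual value):

```python
import sagemaker

# Create a SageMaker session and resolve the execution role.
# get_execution_role() works inside SageMaker notebooks; elsewhere,
# pass an IAM role ARN explicitly.
session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hypothetical names -- the real values are defined in
# 0-Environment_Setup.ipynb.
bucket = session.default_bucket()
prefix = "store-sales-forecasting"
region = session.boto_region_name
```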

Data Loading

To load the project data:

  1. Open the 1 - Load Data.ipynb notebook.
  2. Run all cells to load data from the local environment into the AWS S3 datalake.
  3. The notebook uses the AWS CLI to copy the CSV files to the specified S3 bucket; a boto3 equivalent is sketched below.
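As a sketch, a boto3 equivalent of that CLI step might look like this; the bucket name is a hypothetical placeholder, and the file names are those listed under Data Source:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket -- substitute the bucket configured during
# environment setup. This mirrors what the notebook does via `aws s3 cp`.
bucket = "store-sales-datalake"
for name in ["train.csv", "stores.csv", "oil.csv", "holidays_events.csv"]:
    s3.upload_file(name, bucket, f"raw/{name}")
```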

Exploratory Data Analysis

The 2 - Exploratory_Data_Analysis_w_expanded_EDA.ipynb notebook contains a comprehensive analysis of the dataset (a sample aggregation is sketched after the list), including:

  • Sales trends by store characteristics
  • Impact of holidays on sales
  • Effect of promotions
  • Relationship between transactions and sales
  • Influence of oil prices on sales
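As a sketch of the kind of aggregation behind these analyses (the column names date, store_nbr, sales, and type follow the dataset description; the notebook's plots go well beyond this):

```python
import pandas as pd

train = pd.read_csv("train.csv", parse_dates=["date"])
stores = pd.read_csv("stores.csv")

# Join store metadata onto the sales series, then compare mean daily
# sales across store types.
merged = train.merge(stores, on="store_nbr", how="left")
daily = merged.groupby(["type", "date"])["sales"].sum()
print(daily.groupby("type").mean().sort_values(ascending=False))
```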

Data Preprocessing

The 3 - Data Preprocessing.ipynb notebook covers the following steps, two of which are sketched below:

  • Data cleaning
  • Handling missing values
  • Date/time feature extraction
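A hedged sketch of two of these steps, assuming the Kaggle column name dcoilwtico for the oil price; the notebook's exact imputation strategy may differ:

```python
import pandas as pd

# Oil prices are only quoted on trading days, so the daily series has
# gaps; reindexing to daily frequency and interpolating is one common fix.
oil = pd.read_csv("oil.csv", parse_dates=["date"])
oil = oil.set_index("date").asfreq("D")
oil["dcoilwtico"] = oil["dcoilwtico"].interpolate()

# Date/time feature extraction on the training data.
train = pd.read_csv("train.csv", parse_dates=["date"])
train["year"] = train["date"].dt.year
train["month"] = train["date"].dt.month
train["dayofweek"] = train["date"].dt.dayofweek
```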

Feature Engineering

The 5 - Feature Engineering.ipynb notebook details the creation of the following feature groups (a lag/rolling sketch follows the list):

  • Time-based features
  • Store characteristic features
  • Economic indicators
  • Holiday and promotion features
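As an illustration of the time-based features, a minimal lag/rolling sketch; the notebook's actual feature set may differ, and the real data is keyed by product family as well as store (grouping by store alone keeps the sketch short):

```python
import pandas as pd

train = pd.read_csv("train.csv", parse_dates=["date"])
train = train.sort_values(["store_nbr", "date"])

# Per-store lag and trailing-window features, shifted so that no
# feature leaks information from the day being predicted.
grouped = train.groupby("store_nbr")["sales"]
train["sales_lag_7"] = grouped.shift(7)
train["sales_roll_28"] = grouped.transform(
    lambda s: s.shift(1).rolling(28).mean()
)
```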

Model Development

Baseline Linear Regression

The 7.0 - Baseline Model Linear Regression Development.ipynb notebook establishes a baseline model for comparison.
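A minimal sketch of such a baseline, with placeholder arrays standing in for the engineered features and sales target (the notebook fits on the real data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-ins for the engineered feature matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=1000)

# Chronological split -- never shuffle a time series.
X_train, X_valid = X[:800], X[800:]
y_train, y_valid = y[:800], y[800:]

model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_valid, model.predict(X_valid)) ** 0.5
print(f"baseline RMSE: {rmse:.3f}")
```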

Custom Time Series Model

The 7.1 - Custom Time Series Model Development.ipynb notebook covers the development of a custom deep learning model for time series forecasting.
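The notebook's architecture is not reproduced here; as an assumption-laden sketch, a small Keras LSTM of the kind commonly used for this task (window length and feature count are placeholders):

```python
import tensorflow as tf

WINDOW, N_FEATURES = 28, 8  # placeholder input shape

# Sliding windows of engineered features in, next-day sales out.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```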

DeepAR Model

The 7.2 - Model Training DeepAR.ipynb notebook demonstrates the use of Amazon SageMaker's DeepAR algorithm for forecasting.
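A hedged sketch of the standard DeepAR training setup in the SageMaker Python SDK; the instance type, hyperparameter values, and S3 paths are placeholders, not the notebook's actual settings:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Resolve the built-in DeepAR container for the current region.
image = image_uris.retrieve("forecasting-deepar", session.boto_region_name)

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)
estimator.set_hyperparameters(
    time_freq="D",         # daily series
    context_length=28,     # placeholder values
    prediction_length=14,
    epochs=50,
)

# DeepAR expects its JSON Lines training format in S3 (paths hypothetical).
estimator.fit({
    "train": "s3://store-sales-datalake/deepar/train/",
    "test": "s3://store-sales-datalake/deepar/test/",
})
```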

Model Deployment

  • Custom Model: 8.1 - Deploy Custom Time Series Model.ipynb
  • DeepAR Model: 8.2 - Deploy model DeepAR.ipynb

These notebooks cover the process of deploying the trained models to Amazon SageMaker endpoints.
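In the SDK, deployment reduces to estimator.deploy(initial_instance_count=1, instance_type=...). A hedged sketch of invoking the resulting endpoint, with a hypothetical endpoint name and a toy payload in DeepAR's JSON request format:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# DeepAR requests carry one or more series plus an inference configuration.
payload = {
    "instances": [{"start": "2017-01-01", "target": [12.0, 15.5, 9.0]}],
    "configuration": {"num_samples": 50, "output_types": ["mean"]},
}
response = runtime.invoke_endpoint(
    EndpointName="store-sales-deepar",  # hypothetical name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```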

CI/CD Pipeline Setup

  • Custom Model: 10.1 - CICD Pipeline Custom Time Series Model.ipynb
  • DeepAR Model: 10.2 - CICD Pipeline DeepAR.ipynb

These notebooks detail the setup of CI/CD pipelines for automated model training, evaluation, and deployment using AWS services.
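As a skeleton of what such a pipeline looks like in the SageMaker Pipelines SDK (the single training step, names, and S3 paths are illustrative assumptions; the notebooks chain evaluation and deployment stages as well):

```python
import sagemaker
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

role = sagemaker.get_execution_role()

# `estimator` is a configured Estimator, e.g. the DeepAR one above.
train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput("s3://store-sales-datalake/deepar/train/")},
)

pipeline = Pipeline(name="store-sales-pipeline", steps=[train_step])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # launch a run
```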

Tools and Technologies

This project utilizes a range of AWS services and machine learning tools:

  • Amazon SageMaker: For model training, deployment, and pipeline orchestration
  • Amazon S3: Data storage and model artifacts
  • Amazon Athena: For querying data in S3
  • AWS Glue: Data catalog and ETL jobs
  • Amazon CloudWatch: For monitoring and logging
  • Pandas, NumPy, Scikit-learn: Data manipulation and preprocessing
  • TensorFlow: Custom model development
  • Matplotlib, Seaborn: Data visualization

Getting Started

  1. Clone this repository.
  2. Ensure you have the necessary AWS permissions and credentials set up.
  3. Follow the notebooks in order, starting with 0-Environment_Setup.ipynb.
  4. Each notebook contains detailed instructions and explanations for each step of the process.

Contributors

@t4ai / Tyler Foreman

@julietlawton / Juliet Lawton (commits from "root" are also Juliet)

@Yoha02 / Eyoha Gir

License

Apache License, Version 2.0 (January 2004): http://www.apache.org/licenses/