Rossmann Sales Prediction

💾 Problem Statement and Project Description

Rossmann operates over 3,000 drug stores in 7 European countries. Currently, Rossmann store managers are tasked with predicting their daily sales for up to six weeks in advance. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. With thousands of individual managers predicting sales based on their unique circumstances, the accuracy of results can be quite varied. You are provided with historical sales data for 1,115 Rossmann stores. The task is to forecast the "Sales" column for the test set. Note that some stores in the dataset were temporarily closed for refurbishment.

💾 Table of Contents

Problem Statement and Project Description
Project Files Description
Goal
Dataset Information
Exploratory Data Analysis
Random Forest
XGB Regressor
Results
Technologies Used
References
Credits

💾 Project Files Description

This project contains the following executable file:

Executable Files:

Rossmann_Sales_Prediction_Capstone_Project.ipynb - Google Colab notebook containing the data summary, exploration, visualisations, modelling, hyperparameter tuning, model performance evaluation, and conclusion.

Source Directory:

Data & Resources link : https://drive.google.com/drive/folders/1qnxqMxy8_gI-siwVhUaQOBW1QbM_07g0

📖 Goal:

Customer interest in a product changes over time. No business can plan its financial growth without accurately assessing customer interest and future demand for its products. Sales forecasting is the process of estimating demand for, or sales of, a particular product over a specific period of time. This project addresses a real-world sales forecasting problem by building a machine learning model for it.

Our goal is to forecast six weeks of sales for each store, identify the factors that influence sales, and recommend ways to improve the numbers.

📖 Dataset information:

Features in the dataset: Most of the fields are self-explanatory. The following are descriptions for those that aren't.

Id - an Id that represents a (Store, Date) pair within the test set
Store - a unique Id for each store
Sales - the turnover for any given day (this is what you are predicting)
Customers - the number of customers on a given day
Open - an indicator for whether the store was open: 0 = closed, 1 = open
StateHoliday - indicates a state holiday. Normally all stores, with few exceptions, are closed on state holidays. Note that all schools are closed on public holidays and weekends. a = public holiday, b = Easter holiday, c = Christmas, 0 = None
SchoolHoliday - indicates if the (Store, Date) was affected by the closure of public schools
StoreType - differentiates between 4 different store models: a, b, c, d
Assortment - describes an assortment level: a = basic, b = extra, c = extended
CompetitionDistance - distance in meters to the nearest competitor store
CompetitionOpenSince[Month/Year] - gives the approximate year and month of the time the nearest competitor was opened
Promo - indicates whether a store is running a promo on that day
Promo2 - Promo2 is a continuing and consecutive promotion for some stores: 0 = store is not participating, 1 = store is participating
Promo2Since[Year/Week] - describes the year and calendar week when the store started participating in Promo2
PromoInterval - describes the consecutive intervals at which Promo2 is started, naming the months in which the promotion starts anew. E.g. "Feb,May,Aug,Nov" means each round starts in February, May, August, November of any given year for that store
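The PromoInterval field is easier to use once it is turned into a per-day flag. The following is a minimal sketch, assuming the month abbreviations look like the example above; the function name `promo2_round_starts` and the "Sept" alias are illustrative assumptions, not taken from the notebook.

```python
from datetime import date

# Month abbreviations as they appear in the PromoInterval example above.
# "Sept" is a defensive alias; treating it as valid is an assumption about the raw data.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Sept": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def promo2_round_starts(promo_interval, day):
    """Return True if `day` falls in a month where a Promo2 round starts anew."""
    if not promo_interval:  # stores not participating in Promo2 have no interval
        return False
    start_months = {MONTHS[m.strip()] for m in promo_interval.split(",")}
    return day.month in start_months


print(promo2_round_starts("Feb,May,Aug,Nov", date(2015, 8, 14)))  # True
print(promo2_round_starts("Feb,May,Aug,Nov", date(2015, 9, 1)))   # False
```

A flag like this could be added as an extra feature column next to Promo and Promo2.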

📈 Exploratory Data Analysis

Sales were highest on Mondays, probably because most shops are closed on Sundays, which had the lowest sales of the week. Store type b, although few in number, had the highest average sales. Likely reasons are that type b stores carry all three assortment levels, including assortment level b, which is available only at type b stores, and that they are open on Sundays. The outliers in the dataset were justifiable: they either belonged to store type b or coincided with a promotion, which increased sales.

Store type b was open on all seven days of the week and had higher sales than any other store type. Promotions had a positive effect on sales across all store types.
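Comparisons like these come down to grouped averages. The sketch below shows the idea on a handful of made-up rows; the values are illustrative only and are not taken from the Rossmann data, and the helper `mean_sales_by` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Made-up rows for illustration only: (Date, StoreType, Open, Sales).
rows = [
    (date(2015, 7, 5), "a", 0, 0),      # Sunday, store closed
    (date(2015, 7, 5), "b", 1, 9000),   # Sunday, type b store open
    (date(2015, 7, 6), "a", 1, 8000),   # Monday
    (date(2015, 7, 6), "b", 1, 11000),  # Monday
    (date(2015, 7, 7), "a", 1, 6000),   # Tuesday
]


def mean_sales_by(rows, key):
    """Average the Sales value of each row, grouped by key(row)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of sales, row count]
    for row in rows:
        group = key(row)
        totals[group][0] += row[3]
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


by_weekday = mean_sales_by(rows, key=lambda r: WEEKDAYS[r[0].weekday()])
by_store_type = mean_sales_by(rows, key=lambda r: r[1])
print(by_weekday)
print(by_store_type)
```

Closed days are kept in the averages here, which is why Sunday comes out low in this toy example.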

📖 Random Forest

Random forest is a supervised learning algorithm. It creates a "forest" out of an ensemble of decision trees, which are commonly trained using the "bagging" method. The bagging method's basic premise is that combining different learning models improves the overall output. Simply said, random forest combines many decision trees to produce a more accurate and stable prediction.

Furthermore, random forest is efficient, can handle a large number of input variables, and gives accurate predictions in many cases. It is a strong tool that needs very little code to implement with common machine learning libraries.
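The bagging idea can be sketched in plain Python. This is a toy illustration, not the model used in the notebook: it bags one-split regression trees ("stumps") on a single made-up feature, and it leaves out the random feature selection that a full random forest adds. The names `fit_stump` and `fit_bagged` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by choosing the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # every x in this sample is identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)


# Made-up data: sales are higher on days when the Promo flag is 1.
promo = [0, 0, 0, 0, 1, 1, 1, 1]
sales = [5000, 5200, 4900, 5100, 8000, 8300, 7900, 8100]
model = fit_bagged(promo, sales)
print(model(0), model(1))
```

Averaging over many bootstrap-trained trees is what makes the combined prediction more stable than any single tree.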

📖 XGB Regressor

XGB Regressor is the regression implementation of XGBoost, an optimized version of gradient boosting. It is particularly useful for large datasets and high-dimensional data, and is often used in Kaggle competitions and other machine learning challenges. It uses an ensemble of decision trees as its base learners and is trained by minimizing a loss function: each new tree is fit using the gradients of the loss with respect to the current predictions. XGBoost is a powerful algorithm that can be used for both regression and classification tasks.
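The boosting loop can be sketched in plain Python. This is a simplified illustration of gradient boosting with squared-error loss, where the negative gradient is just the residual; it is not XGBoost itself, which adds regularization and other optimizations. The data and the names `fit_stump` and `fit_boosted` are made up for the example.

```python
from statistics import mean


def fit_stump(xs, targets):
    """Fit a one-split regression tree by choosing the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [v for x, v in zip(xs, targets) if x <= t]
        right = [v for x, v in zip(xs, targets) if x > t]
        lm, rm = mean(left), mean(right)
        err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # no valid split: predict the mean
        m = mean(targets)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Add stumps one at a time, each fit to the residuals of the current model."""
    base = mean(ys)  # start from a constant prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2, the negative gradient w.r.t. p is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(ys, preds):
    return mean((y - p) ** 2 for y, p in zip(ys, preds))


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5.0, 5.2, 4.9, 5.1, 8.0, 8.3, 7.9, 8.1]
model = fit_boosted(xs, ys)
baseline_mse = mse(ys, [mean(ys)] * len(ys))
boosted_mse = mse(ys, [model(x) for x in xs])
print(baseline_mse, boosted_mse)
```

Unlike bagging, where trees are trained independently, each tree here depends on the errors left by the trees before it.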

📈 Results

The Random Forest model has a Test_R2 score of 0.9527, a relative improvement of 3.49% over the Decision Tree model's score of 0.9206. This suggests that the Random Forest model makes better predictions than the Decision Tree model.

The tuned XGB Regressor model has a Test_R2 score of 0.955427, a relative improvement of 0.29% over the Random Forest model's score of 0.9527. This suggests that the tuned XGB Regressor makes slightly better predictions than the Random Forest model. However, the difference is small and may not be significant for all use cases. Other performance metrics and the trade-offs between the models should therefore be analyzed before deciding which one is best suited to a particular task.
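For reference, the R2 score and the relative improvements quoted above can be computed as in the sketch below. The helper names `r2_score` and `relative_gain_pct` are illustrative; the scores passed in are the ones reported in this section.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def relative_gain_pct(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


# Random Forest vs. Decision Tree, then tuned XGB Regressor vs. Random Forest.
print(round(relative_gain_pct(0.9527, 0.9206), 2))    # 3.49
print(round(relative_gain_pct(0.955427, 0.9527), 2))  # 0.29
```

Note that these are relative gains; the absolute differences in Test_R2 are about 0.032 and 0.003 respectively.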

📖 Technologies Used:

📚 References

Andrew Udell, 'Predicting E-Commerce Sales with Random Forest'. [Online]. Available: https://towardsdatascience.com/predicting-e-commerce-sales-with-a-random-forest-regression-3f3c8783e49b
ChatGPT. [Online]. Available: https://chat.openai.com/chat
Builtin.com, 'Random Forest'. [Online]. Available: https://builtin.com/data-science/random-forest-algorithm
Machine Learning Mastery, 'Random Forest for Time Series Prediction'. [Online]. Available: https://machinelearningmastery.com/random-forest-for-time-series-forecasting/

📜 Credits

Mohd Zahid Ansari | Avid Learner | Data Scientist | Machine Learning Engineer | Deep Learning enthusiast

Contact me for Data Science Project Collaborations
