Instacart Market Basket Analysis

Abstract

Instacart has emerged as a leading grocery delivery platform in North America and it's growth is fast accelerating due to COVID-19. Given the increase in online grocery shopping and as avid food enthusiasts, we wanted to understand what people are ordering and how their behaviour differs. After exploring the dataset and the different data points it contains, we identified the opportunity to focus on user and order behaviour when buying produce (fruits, vegetables, etc.).

Research Questions

1. What is the breakdown of customers that buy organic versus ones that never buy organic when given the option?

2. Does purchasing behavior differ between organic and never organic users?

3. Does user purchasing behavior change as they make more orders on Instacart? And is buying “organic” a stable or dynamic behavior?

Project Outcomes

In conclusion, given the analysis of our three research questions, we believe that the following insights can help us understand the purchasing behavior of produce among Instacart users:

Organic customers make up a large portion of users (>50% of purchased produce is organic)
Users that never buy organic produce (i.e., Never Organics) seem to shop differently than users that buy organic; they order less, more sporadically and when they do order, they buy less produce than organic buyers
- Super Organic users also order less produce per order, we hypothesize that they might be “pickier” buyers
As users make more orders, they seem to include produce items more often and to shift towards organic produce as well
Users in the Super Organic and Never Organic segments appear less stable than those in the Moderate Organic and Organic Taster segments. Super Organics migrate towards Moderate while Never Organics migrate towards Tasters

Tools Used

Languages:

Python

Libraries:

numpy
pandas
matplotlib

Research Dataset

The source dataset leveraged was from a machine-learning based Instacart Kaggle competition that contained a relational set of files describing Instacart customers' orders over time. For the purpose of the competition, the ordered products were split into the prior, train and test datasets, of which the test dataset was not publicly available. Source datasets: Instacart Market Basket Analysis (available on Kaggle)

Datasets leveraged for this analysis - aisles.csv, departments.csv, order_products__prior.csv, order_products__train.csv, orders.csv, products.csv

Note that the orders.csv contains information pertaining to each order as a whole, while each order_products__*.csv contains information about the basket of products within each order (one order can have multiple products).

Division of Labor

Credit for this project also goes to my team members Greg Tully and Faris Haddad of the UC Berkeley Masters of Information and Data Science program.

For the purposes of refraining from taking credit for my other dedicated group members work, each notebook is labelled by last name to denote contribution to the overall research project. Additionally, the work contributed by the other members of the research team has been seperately stored within the Other_analysis directory. The following notebooks have been labelled as my own contributions to the project:

Data_cleaning_Temlock.ipynb
Preprocressing_organic_segmentation_Temlock.ipynb
Question3_Temlock.ipynb

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
Other_analysis		Other_analysis
src		src
.gitignore		.gitignore
Project_2_Final_Report_Haddad_Tully_Temlock.pdf		Project_2_Final_Report_Haddad_Tully_Temlock.pdf
Project_2_Presentation_Haddad_Tully_Temlock.pdf		Project_2_Presentation_Haddad_Tully_Temlock.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Instacart Market Basket Analysis

Abstract

Research Questions

Project Outcomes

Tools Used

Languages:

Libraries:

Research Dataset

Division of Labor

About

Releases

Packages

Languages

stemlock/Instacart_Market_Basket_Data_Analysis

Folders and files

Latest commit

History

Repository files navigation

Instacart Market Basket Analysis

Abstract

Research Questions

Project Outcomes

Tools Used

Languages:

Libraries:

Research Dataset

Division of Labor

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages