- Grocery Dataset
- Online Retail
Market Basket Analysis is the analysis of customers' past buying behaviour to find out which products are bought together, that is, to find the associations between products. If a retailer's management knows these associations, the associated products can be placed together in the shop, or a salesperson who sees a customer buying one product can offer the associated product.
We find these associations with association rule learning, a rule-based machine learning approach that discovers relationships between variables in a dataset. It is widely applied in the retail industry, including e-commerce.
The objective is to determine the associations between products in a basket by analysing customers' purchase patterns across multiple items.
Grocery data
Each row of data represents a transaction and each attribute represents a product. A value of 0 means the item was not purchased in that transaction; a value of 1 means it was purchased.
Online Retail
Each row of data represents a transaction for a particular item and the attributes correspond to the following:
InvoiceNo : Unique identifier for transaction
StockCode : Unique identifier for the stock item being purchased
Description : Description of item
Quantity : Number of units purchased
InvoiceDate : Date of purchase
UnitPrice : Cost of one unit of the item
CustomerID : Unique Identifier for customer
Country : Country of transaction
- Importing Necessary Dependencies
- Loading Data
- Data Exploration and Visualization
- Data Processing
  - Data Cleaning
  - Transforming data to one transaction per row
  - One Hot Encoding of purchases made
- Generating Association Rules
- Refining the rules
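The data processing steps in the outline can be sketched roughly as follows. This is a minimal, library-free illustration and not the notebook's actual code: the toy rows, the cleaning rules (dropping rows with a missing description or a non-positive quantity) and the helper names `clean`, `to_baskets` and `one_hot` are all assumptions made for the example.

```python
from collections import defaultdict

# Toy rows in the Online Retail layout; the values are invented for illustration.
rows = [
    {"InvoiceNo": "536365", "Description": " WHITE MUG ", "Quantity": 6},
    {"InvoiceNo": "536365", "Description": "RED PLATE", "Quantity": 2},
    {"InvoiceNo": "536366", "Description": "WHITE MUG", "Quantity": 1},
    {"InvoiceNo": "536367", "Description": "RED PLATE", "Quantity": -1},
    {"InvoiceNo": "536368", "Description": None, "Quantity": 3},
]


def clean(rows):
    """Assumed cleaning rules: drop rows with no description or quantity <= 0."""
    cleaned = []
    for row in rows:
        description = (row["Description"] or "").strip()
        if not description or row["Quantity"] <= 0:
            continue
        cleaned.append({**row, "Description": description})
    return cleaned


def to_baskets(rows):
    """Group item descriptions by invoice, giving one transaction per invoice."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["InvoiceNo"]].add(row["Description"])
    return dict(baskets)


def one_hot(baskets):
    """Return the sorted item names and a 0/1 vector per invoice."""
    items = sorted({item for basket in baskets.values() for item in basket})
    encoded = {
        invoice: [1 if item in basket else 0 for item in items]
        for invoice, basket in baskets.items()
    }
    return items, encoded


baskets = to_baskets(clean(rows))
items, encoded = one_hot(baskets)
print(items)
print(encoded)
```

The resulting 0/1 layout has the same shape as the Grocery data described earlier: one row per transaction and one column per item.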
To establish association rules between items we use the Apriori algorithm, which takes a bottom-up approach: frequent itemsets (groups of items bought together) are extended one item at a time, and the resulting candidates are tested against the available dataset. The process continues until no further extensions are found. It uses the concepts of support, confidence and lift to establish the rules.
Only rules whose support and confidence exceed the predefined minimum support and confidence are taken into account.
Support, confidence and lift are given by:
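The level-wise search can be illustrated with a small pure-Python sketch. It is a simplified teaching version under assumed names (`apriori`, `min_support`) and invented toy transactions, not the implementation used in the notebook, and it makes no attempt at efficiency.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Test the surviving candidates against the dataset.
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, value in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

With a minimum support of 0.5, the pair milk and butter appears in only one of four transactions and is discarded, so the three-item candidate is pruned without being counted and the search stops.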
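For a rule $A \Rightarrow B$ over $N$ transactions, the commonly used definitions are written out below. The notation $\sigma(X)$, meaning the number of transactions that contain every item in $X$, is introduced here for convenience and may differ from the notation used elsewhere in this project.

```latex
\begin{align*}
\mathrm{support}(A \Rightarrow B) &= \frac{\sigma(A \cup B)}{N} \\
\mathrm{confidence}(A \Rightarrow B) &= \frac{\sigma(A \cup B)}{\sigma(A)} \\
\mathrm{lift}(A \Rightarrow B) &= \frac{\mathrm{confidence}(A \Rightarrow B)}{\sigma(B)/N}
  = \frac{N \, \sigma(A \cup B)}{\sigma(A)\, \sigma(B)}
\end{align*}
```

A lift above 1 means $A$ and $B$ occur together more often than would be expected if they were independent.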
Top Sold Items
Association Rules
Processed Data
Association Rules
Selected Rules
Grocery Dataset
The rule states that people who bought other vegetables are likely to also purchase root vegetables. The confidence of the rule is 46%, which means that in 46% of the transactions containing other vegetables, root vegetables were bought as well. The lift of the rule is 2.24, which means root vegetables are 2.24 times as likely to appear in transactions containing other vegetables as they would be if the two items were not associated. A lift value of 1 indicates the absence of association between the two items.
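To make the reading of confidence and lift concrete, the sketch below computes both for a single rule on a handful of invented transactions. The function name `rule_metrics` and the data are assumptions for illustration only, so the numbers it produces do not reproduce the 46% and 2.24 reported above.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(antecedent <= t for t in transactions)
    count_b = sum(consequent <= t for t in transactions)
    count_ab = sum((antecedent | consequent) <= t for t in transactions)
    support = count_ab / n
    confidence = count_ab / count_a      # how often B appears when A is present
    lift = confidence / (count_b / n)    # relative to how often B appears overall
    return support, confidence, lift


# Invented toy transactions, not taken from the Grocery dataset.
toy = [
    {"other vegetables", "root vegetables"},
    {"other vegetables", "root vegetables"},
    {"other vegetables"},
    {"other vegetables"},
    {"root vegetables"},
    {"whole milk"},
    {"whole milk"},
    {"whole milk"},
]
support, confidence, lift = rule_metrics(toy, {"other vegetables"}, {"root vegetables"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data root vegetables appear in 3 of 8 transactions overall but in 2 of the 4 transactions containing other vegetables, so the lift is greater than 1.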
Online Dataset
For this dataset, the rule linking SET/20 RED RETROSPOT PAPER NAPKINS to SET/6 RED SPOTTY PAPER PLATES has a confidence of 80% and a lift of 6.03, which means that in 80% of the transactions in which the napkins were bought, SET/6 RED SPOTTY PAPER PLATES was also bought.