Data scientist in Kalbe Nutritionals got a new project from :
- Inventory Team to predict sum of quantity from all products, so they could create sufficient daily inventory.
- Marketing Team to create cluster or segment of customer to get personalized promotion and sales treatment
The challenges in this project include :
- Perform Data Ingestion into dbeaver and PostgreSQL.
- Perform Exploratory Data Analysis (EDA) to know the average age of customer based on marital status and gender, also the best-selling store and product name.
- Visualizing Data Dashboard using Tableau
- Make Machine Learning Regression Model (Time Series) with ARIMA
- Make Machine Learning Clustering Model with KMeans Algorithm
- Estimate the quantity of products sold for daily stock inventory
- Create a segment of customers for marketing promotion
On DBeaver we import dataset in Postgres public schema. There are 4 csv file that is used :
The challenge results are :
Query 1 : Average customer age based on their marital status
Query 2 : Average customer age based on their gender
Query 3 : Store name with the highest total quantity
Query 4 : The best-selling product with the highest total amount
Click here to get on Dashboard Visualization
The dashboard visualization on entire worksheet :
Worksheet 1 : Total quantity from month to month
Insight :
- Sales trends fluctuate slightly, and show a gradual decline starting from June.
- The highest total quantity sold is in March 2022 with 1,753 items
- The lowest total quantity sold is in December 2022 with 1,409 items
Worksheet 2 : Total amount from day to day
Insight :
- Daily revenue trends fluctuate heavily
- The highest total amount is in March 2022 with Rp 976,500
- The lowest total amount is in August 2022 with Rp 123,600
Worksheet 3 : Total quantity by product
Insight :
- The highest selling product in 2022 is Thai Tea with 2,853 items sold.
- The lowest selling product in 2022 is Cashew with 627 items sold.
Worksheet 4 : Total sales amount by store name
Insight :
- The best-selling store in 2022 is Lingga with sales revenue reached Rp 25,294,100.
- The lowest-selling store in 2022 is Buana Indah with sales revenue reached Rp 10,629,900.
Click here to get on ARIMA and KMeans Clustering
Fix data type error
Fix data type error for 'Latitude', 'Longitude', and 'Income' from coma (,) to point (.)
Before cleaning :
After Cleaning :
Drop Missing Values :
Drop missing values on 'Marital Status' for 3 row
Merging all dataset into one data :
Create new dataset by grouping date and total quantity :
Here is the plot of new dataset :
According to trend, seasonality, and residual, the sales trend are quite fluctuative and indicate downtrend.
Based on ADF statistic, the p-value is 0.0 < 0.05, then reject H0 and accept H1. Therefore data is stationary.
Splitting the data with 80% training and 20% testing. Blue line is data training, and green line is data testing.
Model 1 - Auto-fit ARIMA
Get result with p,d,q = 1,0,1. ARIMA (1,0,1) means there is no Differencing (0) because it is stationary, with Autoregression for 1 lag and 1 order Moving Average.
Model 2 - ACF & PACF Plot
Get result with p,d,q = 28,0,28 from ACF and PACF plot.
Model 3 - Autocorrelation Plot
Get result with p,d,q = 44,0,44 from Autocorrelation plot.
ARIMA Modelling Plot
Plot Data Train, Test, and Model Prediction
Forecast Quantity Sales With The Best Parameter
Model 2 with p,d,q (28,0,28) show the best metric evaluation because has the lowest MAE, MSE, RMSE, and MAPE.
Prediction for quantity on January 2023 is 50 pcs/day
Find k =4 in Elbow line plot, and input cluster to dataset.
There are 4 cluster of segmentation :
Conclusion
- Cluster 0 is the cluster with the most largest number of customers, but has the second lowest average of quantity and total amount. The strategy is give special offering and discount for new member
- Cluster 1 is the cluster with the second fewest number of customers, and the lowest average of quantity and total amount. One of the strategy is collaborate with influencers to promote products.
- Cluster 2 is the customer that valuable to the business. the strategy is offer loyalty membership
- Cluster 3 has the second largest average of quantity and total amount and has potential of upselling.