---
title: "Consumer Brand Preference Predictions"
author: "Jennifer Brosnahan"
date: "9/12/2020"
output:
  html_document:
    keep_md: true
    theme: lumen
    highlight: haddock
---

Background

An Electronics Company would like us to predict which computer brand customers from a new region will prefer, Acer or Sony. Doing so will help determine inventory needs for the new clientele and will inform individual customer brand targeting for robust in-store and online marketing efforts.

Objective

The objective is to build predictive models and choose one that can predict consumer computer brand preference with at least 80% accuracy on test data. The ideal goal is a model that can predict brand preference with at least a 90% level of certainty.

Load libraries

# Loading packages
library(tidyverse)    # data wrangling and plotting (includes ggplot2)
library(caret)        # model training, resampling, and evaluation
library(ggplot2)      # attached by tidyverse; loaded explicitly for clarity
library(corrplot)     # correlation plots
library(openxlsx)     # reading/writing Excel files
library(h2o)          # scalable machine learning framework
library(knitr)        # report rendering and kable() tables
library(kableExtra)   # kable() table styling

Import data

# complete survey responses; used to train and test models
complete <- read.csv(file.path('C:/Users/jlbro/OneDrive/C3T2/C3T2', 'complete.csv'), stringsAsFactors = TRUE)

# new customer survey responses with mostly unanswered 'brand'; to be predicted
incomplete <- read.csv(file.path('C:/Users/jlbro/OneDrive/C3T2/C3T2', 'incomplete.csv'), stringsAsFactors = TRUE)

Understand data

# check structure of training data
str(complete)
## 'data.frame':	9898 obs. of  7 variables:
##  $ salary : num  119807 106880 78021 63690 50874 ...
##  $ age    : int  45 63 23 51 20 56 24 62 29 41 ...
##  $ elevel : int  0 1 0 3 3 3 4 3 4 1 ...
##  $ car    : int  14 11 15 6 14 14 8 3 17 5 ...
##  $ zipcode: int  4 6 2 5 4 3 5 0 0 4 ...
##  $ credit : num  442038 45007 48795 40889 352951 ...
##  $ brand  : int  0 1 0 1 0 1 1 1 0 1 ...

Observations

  • 9898 total observations with 7 total features
  • Six features are customer description variables; the seventh, 'brand', is the consumer brand preference to be predicted
  • The data description tells us the codes for brand: Acer = 0 and Sony = 1 (an optional relabeling sketch follows this list)
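For reporting, those numeric codes can be relabeled with the brand names. A minimal sketch (the brand_label column is illustrative only; the analysis below keeps the original 0/1 coding):

# optional relabeling sketch; 'brand_label' is illustrative and not used below
complete$brand_label <- factor(complete$brand, levels = c(0, 1),
                               labels = c('Acer', 'Sony'))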
# check descriptive stats
summary(complete)
##      salary            age            elevel           car       
##  Min.   : 20000   Min.   :20.00   Min.   :0.000   Min.   : 1.00  
##  1st Qu.: 52082   1st Qu.:35.00   1st Qu.:1.000   1st Qu.: 6.00  
##  Median : 84950   Median :50.00   Median :2.000   Median :11.00  
##  Mean   : 84871   Mean   :49.78   Mean   :1.983   Mean   :10.52  
##  3rd Qu.:117162   3rd Qu.:65.00   3rd Qu.:3.000   3rd Qu.:15.75  
##  Max.   :150000   Max.   :80.00   Max.   :4.000   Max.   :20.00  
##     zipcode          credit           brand       
##  Min.   :0.000   Min.   :     0   Min.   :0.0000  
##  1st Qu.:2.000   1st Qu.:120807   1st Qu.:0.0000  
##  Median :4.000   Median :250607   Median :1.0000  
##  Mean   :4.041   Mean   :249176   Mean   :0.6217  
##  3rd Qu.:6.000   3rd Qu.:374640   3rd Qu.:1.0000  
##  Max.   :8.000   Max.   :500000   Max.   :1.0000
summary(complete$brand)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  1.0000  0.6217  1.0000  1.0000
# bar chart of 'brand' class counts
ggplot(complete) +
  geom_bar(aes(x = brand))

The binary classes are slightly imbalanced, roughly 62% Sony to 38% Acer, as the quick check below confirms.
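A one-line proportion check (a small sketch using base R):

# class balance: ~0.38 Acer (0) vs ~0.62 Sony (1)
round(prop.table(table(complete$brand)), 2)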

Preprocessing

# check for NAs
sum(is.na(complete))
## [1] 0
# convert target 'brand' to a factor for classification
complete$brand <- as.factor(complete$brand)

str(complete)
## 'data.frame':	9898 obs. of  7 variables:
##  $ salary : num  119807 106880 78021 63690 50874 ...
##  $ age    : int  45 63 23 51 20 56 24 62 29 41 ...
##  $ elevel : int  0 1 0 3 3 3 4 3 4 1 ...
##  $ car    : int  14 11 15 6 14 14 8 3 17 5 ...
##  $ zipcode: int  4 6 2 5 4 3 5 0 0 4 ...
##  $ credit : num  442038 45007 48795 40889 352951 ...
##  $ brand  : Factor w/ 2 levels "0","1": 1 2 1 2 1 2 2 2 1 2 ...

Modeling

set.seed(123)
# createDataPartition() splits the data 75% train / 25% test, stratified by brand
index1 <- createDataPartition(complete$brand, p=0.75, list = FALSE)
train1 <- complete[ index1,]
test1 <- complete[-index1,]

# Check structure of trainSet
str(train1)
## 'data.frame':	7424 obs. of  7 variables:
##  $ salary : num  119807 106880 78021 63690 130813 ...
##  $ age    : int  45 63 23 51 56 24 62 29 48 52 ...
##  $ elevel : int  0 1 0 3 3 4 3 4 4 1 ...
##  $ car    : int  14 11 15 6 14 8 3 17 16 6 ...
##  $ zipcode: int  4 6 2 5 3 5 0 0 5 0 ...
##  $ credit : num  442038 45007 48795 40889 135943 ...
##  $ brand  : Factor w/ 2 levels "0","1": 1 2 1 2 2 2 2 1 2 1 ...
# set cross validation
control <- trainControl(method = 'repeatedcv', 
                        number=10, 
                        repeats = 1)

Random Forest using automatic tuning

set.seed(123)

# train algorithm
rf1 <- train(brand~.,
             data = train1,
             method = 'rf',
             trControl = control,
             tuneLength = 1) 

rf1
## Random Forest 
## 
## 7424 samples
##    6 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times) 
## Summary of sample sizes: 6681, 6681, 6681, 6682, 6681, 6683, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9172925  0.8246223
## 
## Tuning parameter 'mtry' was held constant at a value of 2

Random Forest using manual tuning

set.seed(123)
# train 
rf2 <- train(brand~.,
             data = train1,
             method = 'rf',
             trControl=control,
             tuneLength = 5) 

rf2
## Random Forest 
## 
## 7424 samples
##    6 predictor
##    2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times) 
## Summary of sample sizes: 6681, 6681, 6681, 6682, 6681, 6683, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   2     0.9176981  0.8252764
##   3     0.9176985  0.8252930
##   4     0.9171598  0.8240410
##   5     0.9155429  0.8203682
##   6     0.9132536  0.8155551
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.

Observations

  • Both Random Forest models perform about the same. We will use Random Forest 1, with an mtry of 2 and 91.8% accuracy, as it is the simpler and faster model and meets the client goal of at least 80% accuracy.
  • Kappa is 82.5%. Kappa is a more useful measure than raw accuracy when the predicted classes are imbalanced, because it corrects for agreement expected by chance (a worked check follows this list).
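As a sanity check on the reported Kappa, a back-of-the-envelope sketch: assuming the model's predictions mirror the observed 62/38 class split, chance agreement is p_e = 0.62^2 + 0.38^2 ≈ 0.529, and Kappa = (p_o - p_e) / (1 - p_e):

# worked Kappa check using figures from the resampling output above
p_o <- 0.9177            # observed accuracy (mtry = 3)
p_e <- 0.62^2 + 0.38^2   # chance agreement, assuming predictions mirror the class split
(p_o - p_e) / (1 - p_e)  # ~0.825, matching the reported Kappa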
# variable importance using ggplot
ggplot(varImp(rf1, scale=FALSE)) +
  geom_bar(stat = 'identity') +
  ggtitle('Variable Importance of Top Random Forest Model')

Predict on test dataset using optimal model

rfPreds <- predict(rf1, newdata = test1)
# type = 'prob' returns class probabilities for each observation
rfProbs <- predict(rf1, newdata = test1, type = 'prob')
head(rfProbs, 10)
##        0     1
## 5  0.600 0.400
## 10 0.092 0.908
## 14 0.918 0.082
## 30 0.040 0.960
## 31 0.004 0.996
## 38 0.172 0.828
## 39 0.496 0.504
## 45 0.646 0.354
## 55 0.194 0.806
## 56 0.926 0.074

Confusion matrix of top model

confusionMatrix(data = rfPreds, test1$brand)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0  861  108
##          1   75 1430
##                                         
##                Accuracy : 0.926         
##                  95% CI : (0.915, 0.936)
##     No Information Rate : 0.6217        
##     P-Value [Acc > NIR] : < 2e-16       
##                                         
##                   Kappa : 0.8438        
##                                         
##  Mcnemar's Test P-Value : 0.01801       
##                                         
##             Sensitivity : 0.9199        
##             Specificity : 0.9298        
##          Pos Pred Value : 0.8885        
##          Neg Pred Value : 0.9502        
##              Prevalence : 0.3783        
##          Detection Rate : 0.3480        
##    Detection Prevalence : 0.3917        
##       Balanced Accuracy : 0.9248        
##                                         
##        'Positive' Class : 0             
## 

Observation

  • The confusion matrix shows 92.6% accuracy, 92% sensitivity (the share of actual positives we are catching), and 93% specificity (the share of actual negatives we are catching). This is a good model; the sketch below reproduces these metrics from the raw counts.
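These metrics follow directly from the confusion matrix counts (the positive class is 0, Acer):

# recompute key metrics from the confusion matrix counts above (positive class = 0)
tp <- 861; fp <- 108; fn <- 75; tn <- 1430
tp / (tp + fn)  # sensitivity ~0.9199
tn / (tn + fp)  # specificity ~0.9298
tp / (tp + fp)  # positive predictive value ~0.8885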
# postResample shows test-set accuracy and kappa; a large drop versus the CV estimates would signal overfitting
postResample(rfPreds, test1$brand)
##  Accuracy     Kappa 
## 0.9260307 0.8438283

rf1 is not overfitting; test-set accuracy and kappa are comparable to the cross-validated estimates, and both are high.

# compare predictions to actual within same data frame
compare_rf <- data.frame(test1,rfPreds) 
head(compare_rf, 100)
##        salary age elevel car zipcode     credit brand rfPreds
## 5    50873.62  20      3  14       4 352951.498     0       0
## 10   37803.33  41      1   5       4 493219.269     1       1
## 14   82474.58  33      4  13       3 424657.497     0       0
## 30  107710.32  75      0  16       2 209002.407     1       1
## 31  148495.44  62      2   9       1 377495.295     1       1
## 38   49692.10  42      4   8       2 462826.149     1       1
## 39  101454.63  26      0  18       2  22960.800     0       1
## 45   87634.05  59      2   5       6 206711.940     1       0
## 55   81015.26  63      0   7       7 250036.000     1       1
## 56   88405.44  36      4   8       7 195042.678     0       0
## 63   83185.07  62      4   2       8 163108.438     1       0
## 64   31430.24  47      4  10       2 177297.720     1       1
## 70  131957.81  65      4  11       1 238253.007     1       1
## 74  131927.71  63      1   8       1 424565.963     1       1
## 75  110539.06  64      4  11       6 257476.590     1       1
## 76  133162.37  69      0   7       8 413996.477     1       1
## 81  137368.94  57      4  12       8 415156.282     1       1
## 82   43377.96  30      0   6       2 190336.160     1       1
## 83   85919.99  75      3   7       1  43412.866     1       1
## 89   47399.24  21      4   2       3 362188.027     1       1
## 97   34759.48  58      4  16       1 396818.591     1       1
## 98  141628.04  75      2  18       3 326686.025     1       1
## 109  27102.74  46      0  16       2 293842.154     1       1
## 116  96839.45  53      1  18       4 393853.363     0       0
## 118 100820.01  70      4   4       6 463076.230     1       1
## 119 129274.99  29      2  12       0  98820.083     1       1
## 124  43727.81  43      3  19       1 199837.255     1       1
## 130  23964.83  57      1  17       4 165441.501     1       1
## 132 134663.82  51      1   2       1 476990.362     1       1
## 133  27705.56  41      4  16       5 213908.798     1       1
## 134  68420.38  26      4  14       2 485034.815     0       0
## 136  45257.89  40      2  11       4 374850.874     1       1
## 140 109830.17  24      1   9       8 144156.978     1       1
## 143  58089.91  45      2   9       5 460439.314     1       1
## 144 138483.51  34      4  17       7 158932.977     1       1
## 147 137794.61  24      0  13       2 216481.904     1       1
## 151 111872.72  41      0   8       1  13674.562     0       0
## 153 133279.78  41      2  15       6 260987.137     1       1
## 158 106841.00  78      3   2       5 386222.361     1       1
## 161  92422.59  52      2   8       2 308886.506     0       0
## 163  66938.52  47      1  11       8  15198.151     1       1
## 169 119888.63  41      0   3       6 328779.040     0       0
## 170 128751.15  66      0   2       0 394010.991     1       1
## 175 132530.65  25      1   5       4 145812.348     1       1
## 176  23520.41  72      2   8       7 313318.661     0       1
## 177  29819.80  57      4   4       4 186745.141     1       1
## 179  68054.64  61      1  20       3 277300.510     0       0
## 182 130455.16  56      4  20       1  92299.373     1       1
## 190 101445.01  23      2  15       2 243688.702     1       1
## 192  54661.20  32      0  18       7 168218.382     0       0
## 196 103158.63  44      1   5       0 244790.504     0       0
## 197  58555.51  62      2  19       3 389687.140     0       0
## 204  48777.81  28      0  15       7 500000.000     0       1
## 206 127367.22  78      0  11       2 252011.588     1       1
## 207 121775.83  30      1  11       8      0.000     1       1
## 208 138178.88  53      4  20       3 102848.193     1       1
## 216  68366.52  43      4  14       2 355413.199     1       1
## 219 130119.76  46      0   1       1 277130.032     1       1
## 229  96160.32  61      2   6       3 328170.531     1       0
## 230  61926.03  62      2   2       2 353903.981     0       0
## 231  89248.67  42      4  15       0 453443.387     0       0
## 233  73906.88  32      4  17       8 499164.343     0       0
## 234  30284.09  22      4  13       7 280717.295     1       1
## 238  75378.27  49      0  19       6  92529.983     1       0
## 244  51667.39  70      2   1       8 347967.023     0       0
## 245 127191.59  39      0  18       1 166839.029     1       1
## 250  31327.02  37      3  14       5 244495.552     1       1
## 253  59217.18  58      0  10       2 173339.583     0       1
## 259 104663.80  77      1  19       2 309124.886     1       1
## 261 140924.81  80      2   3       0 299265.761     1       1
## 264 106795.67  58      2   4       8 137897.615     1       0
## 270 115429.48  57      0   1       4 145421.506     0       0
## 274  52866.81  73      1   6       5  61389.592     0       0
## 276  45116.39  28      4  16       6 281700.230     1       1
## 278 127504.25  47      4   6       5 422383.878     1       1
## 279  35821.32  58      2  11       3 327520.885     1       1
## 281 150000.00  80      1  19       3 249547.659     1       1
## 287  65666.11  77      0   8       4  11105.343     0       0
## 292  86242.63  22      4  10       4 206786.427     0       0
## 296 100577.60  35      2   8       7 288933.139     1       1
## 305  79986.35  49      3   2       6 500000.000     0       0
## 306  44697.96  70      3   3       2 195563.429     0       0
## 307  32394.89  77      4   7       5 383085.772     0       0
## 308 143742.35  80      4  11       8 129507.011     1       1
## 310 138335.93  50      1   6       6      0.000     1       1
## 313 111228.29  75      2   9       7 390871.986     1       1
## 315  43943.42  78      3  10       2 140100.814     0       0
## 316 103863.52  44      4  17       5 120606.688     0       0
## 320  35888.78  59      2  14       0  16701.637     0       0
## 327  43465.95  79      3  11       4 305635.722     0       0
## 334 126044.39  26      2   2       1 113170.066     1       1
## 336 116184.32  57      4  20       5 472757.667     0       0
## 339 142443.02  80      3  14       2 217056.567     1       1
## 342 141371.40  57      2  20       3   6758.351     1       1
## 345  41864.34  60      3  15       5  58115.384     1       0
## 353 135945.74  50      3  13       1 238705.912     1       1
## 354  96689.22  54      2  10       3 487481.529     0       0
## 355  68762.77  23      3   5       2 114843.394     0       0
## 356 122551.72  49      4   6       7 302519.882     0       0
## 357  75542.85  43      2  20       4 282457.597     0       0
# summarize and plot
summary(rfPreds)
##    0    1 
##  969 1505
ggplot(compare_rf) +
  geom_bar(aes(x = rfPreds)) +
  xlab('Brand Preference Predictions') +
  ggtitle('Distribution of Brand Preference Predictions')

On the test data, the model predicted a class ratio very close to the actual brand preference split (about 38% Acer to 62% Sony), as the quick comparison below shows.
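A side-by-side sketch of predicted versus actual class proportions on the test set:

# predicted vs actual class proportions on the test set
round(prop.table(table(Predicted = rfPreds)), 3)   # ~0.392 / 0.608
round(prop.table(table(Actual = test1$brand)), 3)  # ~0.378 / 0.622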

Predict brand preference on new customer dataset: 'incomplete'

# check structure of new dataset
str(incomplete)
## 'data.frame':	5000 obs. of  7 variables:
##  $ salary : num  150000 82524 115647 141443 149211 ...
##  $ age    : int  76 51 34 22 56 26 64 50 26 46 ...
##  $ elevel : int  1 1 0 3 0 4 3 3 2 3 ...
##  $ car    : int  3 8 10 18 5 12 1 9 3 18 ...
##  $ zipcode: int  3 3 2 2 3 1 2 0 4 6 ...
##  $ credit : num  377980 141658 360980 282736 215667 ...
##  $ brand  : int  1 0 1 1 1 1 1 1 1 0 ...

Preprocess

# check for NAs
sum(is.na(incomplete))
## [1] 0
# change data type
incomplete$brand <- as.factor(incomplete$brand)

# check structure and summary
str(incomplete)
## 'data.frame':	5000 obs. of  7 variables:
##  $ salary : num  150000 82524 115647 141443 149211 ...
##  $ age    : int  76 51 34 22 56 26 64 50 26 46 ...
##  $ elevel : int  1 1 0 3 0 4 3 3 2 3 ...
##  $ car    : int  3 8 10 18 5 12 1 9 3 18 ...
##  $ zipcode: int  3 3 2 2 3 1 2 0 4 6 ...
##  $ credit : num  377980 141658 360980 282736 215667 ...
##  $ brand  : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 1 ...
summary(incomplete)
##      salary            age            elevel           car      
##  Min.   : 20000   Min.   :20.00   Min.   :0.000   Min.   : 1.0  
##  1st Qu.: 52590   1st Qu.:35.00   1st Qu.:1.000   1st Qu.: 6.0  
##  Median : 86221   Median :50.00   Median :2.000   Median :11.0  
##  Mean   : 85794   Mean   :49.94   Mean   :2.009   Mean   :10.6  
##  3rd Qu.:118535   3rd Qu.:65.00   3rd Qu.:3.000   3rd Qu.:16.0  
##  Max.   :150000   Max.   :80.00   Max.   :4.000   Max.   :20.0  
##     zipcode          credit       brand   
##  Min.   :0.000   Min.   :     0   0:4937  
##  1st Qu.:2.000   1st Qu.:122311   1:  63  
##  Median :4.000   Median :250974           
##  Mean   :4.038   Mean   :249546           
##  3rd Qu.:6.000   3rd Qu.:375653           
##  Max.   :8.000   Max.   :500000

Observation

  • Summary reveals that some 'brand' values are already answered (63 rows are coded 1)
  • A deeper dive into the dataset reveals that only the first 102 rows of 'brand' have been filled in
  • The rest (rows 103:5000) are unanswered and need to be predicted (a quick check is sketched after this list)
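A verification sketch, assuming unanswered rows are simply coded 0 (so only the first 102 rows can be compared against real answers):

# confirm that only the first 102 rows carry real answers
table(incomplete$brand[1:102])          # mix of 0s and 1s; all 63 1s fall here
all(incomplete$brand[103:5000] == '0')  # TRUE: remaining rows are placeholder 0s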
# make brand predictions on new customer data 'incomplete'
incompletePreds <- data.frame(predict(rf1, newdata = incomplete))
names(incompletePreds)[1] <- 'Predictions'
str(incompletePreds)
## 'data.frame':	5000 obs. of  1 variable:
##  $ Predictions: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 1 ...
# postResample on the first 102 (answered) observations to gauge how well the model does on the new data
subset_incomplete <- incomplete %>% slice(1:102)
subset_PR <- incompletePreds %>% slice(1:102)
postResample(subset_PR, subset_incomplete$brand)
##  Accuracy     Kappa 
## 0.8529412 0.6808511

Observation

  • Accuracy is high at 85.3%; kappa (0.68) is lower, perhaps because of the small sample size (see the interval sketch below)
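To gauge how noisy a 102-row estimate is, a quick exact binomial confidence interval sketch (87 correct out of 102, from 0.8529 x 102):

# 95% CI for accuracy on the 102 answered rows
binom.test(87, 102)$conf.int  # a fairly wide interval at this sample size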
# compare first 102 actual observations to predictions
bind_cols(subset_incomplete$brand, subset_PR)
## New names:
## * NA -> ...1
##     ...1 Predictions
## 1      1           1
## 2      0           0
## 3      1           1
## 4      1           1
## 5      1           1
## 6      1           1
## 7      1           1
## 8      1           1
## 9      1           1
## 10     0           0
## 11     0           0
## 12     0           0
## 13     1           1
## 14     0           1
## 15     1           1
## 16     1           1
## 17     0           0
## 18     1           1
## 19     1           1
## 20     0           0
## 21     0           0
## 22     1           1
## 23     0           1
## 24     0           1
## 25     1           1
## 26     1           1
## 27     1           0
## 28     1           1
## 29     1           1
## 30     1           1
## 31     1           1
## 32     0           0
## 33     1           1
## 34     1           1
## 35     1           1
## 36     1           1
## 37     0           1
## 38     1           1
## 39     1           1
## 40     1           1
## 41     1           1
## 42     0           1
## 43     1           1
## 44     1           1
## 45     1           1
## 46     1           1
## 47     1           1
## 48     0           0
## 49     0           0
## 50     0           0
## 51     0           0
## 52     1           1
## 53     0           1
## 54     1           1
## 55     1           1
## 56     1           1
## 57     0           0
## 58     1           1
## 59     0           0
## 60     0           1
## 61     1           1
## 62     1           1
## 63     1           1
## 64     1           0
## 65     1           1
## 66     0           0
## 67     1           1
## 68     0           1
## 69     0           0
## 70     0           0
## 71     1           1
## 72     1           0
## 73     1           1
## 74     0           0
## 75     0           0
## 76     1           1
## 77     1           1
## 78     1           1
## 79     1           0
## 80     0           0
## 81     0           0
## 82     1           1
## 83     0           0
## 84     1           1
## 85     1           0
## 86     0           1
## 87     1           1
## 88     0           1
## 89     1           1
## 90     0           0
## 91     0           0
## 92     0           0
## 93     0           0
## 94     0           0
## 95     0           0
## 96     1           1
## 97     1           1
## 98     0           0
## 99     1           1
## 100    1           1
## 101    1           1
## 102    1           1
# summarize and plot predictions
summary(incompletePreds)
##  Predictions
##  0:1893     
##  1:3107
ggplot(incompletePreds) +
  geom_bar(aes(x = Predictions))

Observation

  • The distribution of predictions is very similar to the training and testing ratios: 38% predicted to prefer Acer and 62% predicted to prefer Sony
# predictions in new data
compare_incomplete <- data.frame(incomplete,incompletePreds) 
head(compare_incomplete, 150)
##        salary age elevel car zipcode    credit brand Predictions
## 1   150000.00  76      1   3       3 377980.10     1           1
## 2    82523.84  51      1   8       3 141657.61     0           0
## 3   115646.64  34      0  10       2 360980.36     1           1
## 4   141443.39  22      3  18       2 282736.32     1           1
## 5   149211.27  56      0   5       3 215667.29     1           1
## 6    46202.25  26      4  12       1 150419.40     1           1
## 7   125821.24  64      3   1       2 173429.39     1           1
## 8    20141.14  50      3   9       0 447716.53     1           1
## 9   135261.85  26      2   3       4 223821.22     1           1
## 10   83273.93  46      3  18       6 213961.37     0           0
## 11   51475.90  66      0  15       6  82059.50     0           0
## 12   60787.32  76      0  18       4 249276.46     0           0
## 13   61494.52  44      0  14       4 205624.59     1           1
## 14  125904.77  42      3   7       4 459119.17     0           1
## 15   87232.20  80      2   3       4 199156.53     1           1
## 16   28966.72  22      3  16       4 147625.70     1           1
## 17   82393.18  30      3  18       2 418293.00     0           0
## 18   27724.52  37      0   9       1 431552.11     1           1
## 19   94998.07  73      1  20       8 352723.33     1           1
## 20   56005.12  35      0  14       3 205351.36     0           0
## 21   70241.27  62      0  12       8 330532.14     0           0
## 22   91852.88  69      4   9       5 193700.98     1           1
## 23   47920.93  40      4  17       2 258034.24     0           1
## 24  125879.93  50      0  10       2 123780.89     0           1
## 25  105885.11  62      2  10       4  37183.54     1           1
## 26  140824.94  72      4  20       0 411726.02     1           1
## 27   28483.93  70      3  14       4 340911.01     1           0
## 28   58353.19  46      1  19       1 125476.82     1           1
## 29  116269.45  67      1   6       0 273318.21     1           1
## 30   36971.40  56      4  10       7  16307.77     1           1
## 31   29183.09  33      4   3       3  20704.64     1           1
## 32   40807.21  63      4  11       8 382144.47     0           0
## 33   89954.32  67      2  14       1 350530.05     1           1
## 34   23478.99  57      4   5       2 200986.73     1           1
## 35  118374.98  29      4  10       6 105954.54     1           1
## 36  138330.49  20      0   4       1 446821.54     1           1
## 37   21345.30  58      2  17       1 229348.26     0           1
## 38  149459.65  67      0  17       0 396511.17     1           1
## 39  103910.39  70      2  20       7 229000.75     1           1
## 40   24663.77  61      4  15       6 407435.55     1           1
## 41   23043.77  55      0   2       6 471847.02     1           1
## 42  104313.06  39      2  14       6 449116.26     0           1
## 43   86609.84  72      0   7       1 381696.70     1           1
## 44  136232.75  77      4  16       8 133090.01     1           1
## 45  116754.00  61      1  13       2 430749.29     1           1
## 46  129412.07  54      2   6       8 305531.03     1           1
## 47  125083.79  68      1  17       6 258844.54     1           1
## 48   78782.08  45      4   1       5   6240.85     0           0
## 49   60337.24  75      1  19       3 326574.84     0           0
## 50   80167.41  50      3  17       4 157123.72     0           0
## 51   86721.10  56      4  17       5  43340.83     0           0
## 52   56468.84  48      2   8       2 243646.34     1           1
## 53   69616.74  58      2   9       5 419945.18     0           1
## 54  147166.38  61      4   2       0 397447.39     1           1
## 55  130543.49  37      3   8       3 239709.04     1           1
## 56   83138.88  66      3  12       4 489417.94     1           1
## 57   35455.15  74      3  16       5 134287.44     0           0
## 58  121516.56  22      0  20       7 150662.23     1           1
## 59   88058.88  30      1   1       6 397398.48     0           0
## 60   73174.23  44      1   1       3 225058.20     0           1
## 61   99548.13  24      4  20       0 258620.92     1           1
## 62   57385.88  48      1  17       1  97202.55     1           1
## 63  130638.11  43      1  19       4 472569.38     1           1
## 64  103270.80  38      0   6       0  82446.66     1           0
## 65   53190.17  51      2   5       7      0.00     1           1
## 66   85553.38  34      0   7       0 146998.36     0           0
## 67  110068.36  66      3   4       1  17685.91     1           1
## 68  124031.25  52      3   5       6      0.00     0           1
## 69   95841.68  54      1  19       8 469014.49     0           0
## 70   66201.72  38      2  18       6 423150.07     0           0
## 71  111837.75  73      4  16       0 327286.27     1           1
## 72   94928.86  20      1  19       8 337952.90     1           0
## 73  145613.88  74      0  11       1 386122.71     1           1
## 74   54744.78  72      2  18       2 238827.56     0           0
## 75  116250.86  41      2  20       8  63761.49     0           0
## 76  128098.86  76      4  16       2  84307.29     1           1
## 77  131505.30  51      2   6       3 424148.52     1           1
## 78  103473.38  75      0  11       0  62123.99     1           1
## 79  118932.14  41      3  16       6 465578.02     1           0
## 80   46620.67  72      4  19       3 252582.28     0           0
## 81   80746.76  33      0  13       4 420476.92     0           0
## 82  111334.90  32      4   3       7 384233.39     1           1
## 83   71933.45  63      0   1       1 433606.15     0           0
## 84  112042.06  68      3   1       8  80403.31     1           1
## 85   74222.61  52      1  11       1 348048.09     1           0
## 86   21070.43  67      2  16       2  22896.27     0           1
## 87  115647.13  64      2  17       3 411999.49     1           1
## 88  124255.99  43      3  11       4 235693.47     0           1
## 89   24228.85  72      3   5       0 347027.23     1           1
## 90   42176.58  63      3  20       2 185901.91     0           0
## 91   47377.00  67      2   2       3 455074.87     0           0
## 92   51763.11  75      3   8       1  56667.39     0           0
## 93  103896.05  46      0   9       3 425032.80     0           0
## 94   79496.43  34      0  13       0 363107.54     0           0
## 95   83138.10  22      4  13       4 159897.04     0           0
## 96  143929.62  22      0  12       4 259574.36     1           1
## 97   23235.85  75      4  17       6 375881.49     1           1
## 98   92428.27  37      2   5       3 255023.91     0           0
## 99   92539.75  80      4   3       7  45027.70     1           1
## 100  84296.09  72      2   8       7 131900.47     1           1
## 101 129891.41  32      1   2       7 213871.10     1           1
## 102 112032.61  77      1  13       4      0.00     1           1
## 103  64788.45  79      4   5       1  77918.10     0           0
## 104 135639.88  27      4  19       6  19186.18     0           1
## 105 105017.64  78      0   8       0  60901.15     0           1
## 106  30651.28  70      3   4       6 274973.34     0           0
## 107 124914.89  20      3  14       0 422490.72     0           1
## 108  51554.60  43      1   6       8 403873.21     0           1
## 109  41756.59  30      0   9       8 261237.01     0           1
## 110  40777.18  24      3  18       2 389602.49     0           1
## 111  22852.18  20      3  19       3 471134.00     0           1
## 112  46972.72  58      4   1       3 202696.61     0           1
## 113  81360.09  57      2   1       5 464026.01     0           0
## 114 118399.67  52      3   2       2 349837.50     0           0
## 115  97798.00  51      1  17       0 322701.69     0           0
## 116  34013.23  37      4   2       7  35509.87     0           1
## 117  39825.38  67      0   8       4 449697.25     0           0
## 118  42887.81  68      0   8       5 311664.92     0           0
## 119 136958.06  62      3   1       2 471262.04     0           1
## 120  73966.94  28      1   4       6 427243.06     0           0
## 121  58996.46  43      2   9       4 327786.82     0           1
## 122 147256.23  54      0   7       1 494074.74     0           1
## 123 137604.47  80      2  17       1 129093.43     0           1
## 124  67246.30  70      1   7       4 241223.87     0           0
## 125 111248.54  53      1  18       4 464879.92     0           0
## 126 138762.62  27      3  18       7 270868.15     0           1
## 127 130104.22  57      0   3       0      0.00     0           1
## 128  66419.40  70      1   1       6 430196.55     0           0
## 129 105905.17  78      1  14       4 221805.03     0           1
## 130  66718.69  22      3   3       0 359392.59     0           0
## 131 105371.38  28      0  17       3 118819.02     0           1
## 132 150000.00  20      3   8       1  52783.80     0           1
## 133  78824.05  70      1  13       6  36615.61     0           1
## 134 104203.67  20      0   5       4 472282.73     0           1
## 135  81031.09  20      3   7       8 281226.60     0           0
## 136  35394.28  26      1  18       4 461495.78     0           1
## 137  93010.35  44      4  14       8 500000.00     0           0
## 138  89977.25  29      2  13       7 119006.38     0           0
## 139  71357.39  62      0  18       3 413141.14     0           0
## 140  86564.45  33      2   9       5 477016.95     0           0
## 141 141029.82  22      2  10       8 166924.91     0           1
## 142 104962.64  45      4  10       1 344391.50     0           0
## 143  91708.12  50      3  11       8 446718.26     0           0
## 144  79017.63  24      1  11       1 182439.18     0           0
## 145 126987.44  77      3  10       8 276517.61     0           1
## 146  50959.12  69      2  11       4 158345.53     0           0
## 147  97361.54  52      2   2       1 472890.20     0           0
## 148  57871.07  27      4  18       5  40516.26     0           0
## 149 114719.99  35      2   1       6 371147.62     0           1
## 150 127173.60  33      2   5       7 278513.53     0           1

View the first 200 predictions of consumer brand preference on the new dataset 'incomplete'

kable(compare_incomplete[1:200,], format = 'html', caption = 'Brand Preference Predictions on New Customer Dataset', digits=3) %>% kable_styling(bootstrap_options = 'striped', full_width = FALSE)
Brand Preference Predictions on New Customer Dataset
salary age elevel car zipcode credit brand Predictions
150000.00 76 1 3 3 377980.10 1 1
82523.84 51 1 8 3 141657.61 0 0
115646.64 34 0 10 2 360980.36 1 1
141443.39 22 3 18 2 282736.32 1 1
149211.27 56 0 5 3 215667.29 1 1
46202.25 26 4 12 1 150419.40 1 1
125821.24 64 3 1 2 173429.39 1 1
20141.14 50 3 9 0 447716.53 1 1
135261.85 26 2 3 4 223821.22 1 1
83273.93 46 3 18 6 213961.37 0 0
51475.90 66 0 15 6 82059.50 0 0
60787.32 76 0 18 4 249276.46 0 0
61494.52 44 0 14 4 205624.59 1 1
125904.77 42 3 7 4 459119.17 0 1
87232.20 80 2 3 4 199156.53 1 1
28966.72 22 3 16 4 147625.70 1 1
82393.18 30 3 18 2 418293.00 0 0
27724.52 37 0 9 1 431552.11 1 1
94998.07 73 1 20 8 352723.33 1 1
56005.12 35 0 14 3 205351.36 0 0
70241.27 62 0 12 8 330532.14 0 0
91852.88 69 4 9 5 193700.98 1 1
47920.93 40 4 17 2 258034.24 0 1
125879.93 50 0 10 2 123780.89 0 1
105885.11 62 2 10 4 37183.54 1 1
140824.94 72 4 20 0 411726.02 1 1
28483.93 70 3 14 4 340911.01 1 0
58353.19 46 1 19 1 125476.82 1 1
116269.45 67 1 6 0 273318.21 1 1
36971.40 56 4 10 7 16307.77 1 1
29183.09 33 4 3 3 20704.64 1 1
40807.21 63 4 11 8 382144.47 0 0
89954.32 67 2 14 1 350530.05 1 1
23478.99 57 4 5 2 200986.73 1 1
118374.98 29 4 10 6 105954.54 1 1
138330.49 20 0 4 1 446821.54 1 1
21345.30 58 2 17 1 229348.26 0 1
149459.65 67 0 17 0 396511.17 1 1
103910.39 70 2 20 7 229000.75 1 1
24663.77 61 4 15 6 407435.55 1 1
23043.77 55 0 2 6 471847.02 1 1
104313.06 39 2 14 6 449116.26 0 1
86609.84 72 0 7 1 381696.70 1 1
136232.75 77 4 16 8 133090.01 1 1
116754.00 61 1 13 2 430749.29 1 1
129412.07 54 2 6 8 305531.03 1 1
125083.79 68 1 17 6 258844.54 1 1
78782.08 45 4 1 5 6240.85 0 0
60337.24 75 1 19 3 326574.84 0 0
80167.41 50 3 17 4 157123.72 0 0
86721.10 56 4 17 5 43340.83 0 0
56468.84 48 2 8 2 243646.34 1 1
69616.74 58 2 9 5 419945.18 0 1
147166.38 61 4 2 0 397447.39 1 1
130543.49 37 3 8 3 239709.04 1 1
83138.88 66 3 12 4 489417.94 1 1
35455.15 74 3 16 5 134287.44 0 0
121516.56 22 0 20 7 150662.23 1 1
88058.88 30 1 1 6 397398.48 0 0
73174.23 44 1 1 3 225058.20 0 1
99548.13 24 4 20 0 258620.92 1 1
57385.88 48 1 17 1 97202.55 1 1
130638.11 43 1 19 4 472569.38 1 1
103270.80 38 0 6 0 82446.66 1 0
53190.17 51 2 5 7 0.00 1 1
85553.38 34 0 7 0 146998.36 0 0
110068.36 66 3 4 1 17685.91 1 1
124031.25 52 3 5 6 0.00 0 1
95841.68 54 1 19 8 469014.49 0 0
66201.72 38 2 18 6 423150.07 0 0
111837.75 73 4 16 0 327286.27 1 1
94928.86 20 1 19 8 337952.90 1 0
145613.88 74 0 11 1 386122.71 1 1
54744.78 72 2 18 2 238827.56 0 0
116250.86 41 2 20 8 63761.49 0 0
128098.86 76 4 16 2 84307.29 1 1
131505.30 51 2 6 3 424148.52 1 1
103473.38 75 0 11 0 62123.99 1 1
118932.14 41 3 16 6 465578.02 1 0
46620.67 72 4 19 3 252582.28 0 0
80746.76 33 0 13 4 420476.92 0 0
111334.90 32 4 3 7 384233.39 1 1
71933.45 63 0 1 1 433606.15 0 0
112042.06 68 3 1 8 80403.31 1 1
74222.61 52 1 11 1 348048.09 1 0
21070.43 67 2 16 2 22896.27 0 1
115647.13 64 2 17 3 411999.49 1 1
124255.99 43 3 11 4 235693.47 0 1
24228.85 72 3 5 0 347027.23 1 1
42176.58 63 3 20 2 185901.91 0 0
47377.00 67 2 2 3 455074.87 0 0
51763.11 75 3 8 1 56667.39 0 0
103896.05 46 0 9 3 425032.79 0 0
79496.43 34 0 13 0 363107.54 0 0
83138.10 22 4 13 4 159897.04 0 0
143929.62 22 0 12 4 259574.36 1 1
23235.85 75 4 17 6 375881.49 1 1
92428.27 37 2 5 3 255023.91 0 0
92539.75 80 4 3 7 45027.70 1 1
84296.09 72 2 8 7 131900.47 1 1
129891.41 32 1 2 7 213871.10 1 1
112032.61 77 1 13 4 0.00 1 1
64788.45 79 4 5 1 77918.10 0 0
135639.88 27 4 19 6 19186.18 0 1
105017.64 78 0 8 0 60901.15 0 1
30651.28 70 3 4 6 274973.34 0 0
124914.89 20 3 14 0 422490.72 0 1
51554.60 43 1 6 8 403873.21 0 1
41756.59 30 0 9 8 261237.01 0 1
40777.18 24 3 18 2 389602.49 0 1
22852.18 20 3 19 3 471134.00 0 1
46972.72 58 4 1 3 202696.61 0 1
81360.10 57 2 1 5 464026.01 0 0
118399.67 52 3 2 2 349837.50 0 0
97798.00 51 1 17 0 322701.69 0 0
34013.23 37 4 2 7 35509.87 0 1
39825.38 67 0 8 4 449697.25 0 0
42887.81 68 0 8 5 311664.92 0 0
136958.05 62 3 1 2 471262.04 0 1
73966.94 28 1 4 6 427243.06 0 0
58996.46 43 2 9 4 327786.82 0 1
147256.23 54 0 7 1 494074.74 0 1
137604.47 80 2 17 1 129093.43 0 1
67246.30 70 1 7 4 241223.87 0 0
111248.54 53 1 18 4 464879.92 0 0
138762.62 27 3 18 7 270868.15 0 1
130104.22 57 0 3 0 0.00 0 1
66419.40 70 1 1 6 430196.54 0 0
105905.17 78 1 14 4 221805.03 0 1
66718.69 22 3 3 0 359392.59 0 0
105371.38 28 0 17 3 118819.02 0 1
150000.00 20 3 8 1 52783.80 0 1
78824.05 70 1 13 6 36615.61 0 1
104203.66 20 0 5 4 472282.73 0 1
81031.09 20 3 7 8 281226.60 0 0
35394.28 26 1 18 4 461495.78 0 1
93010.35 44 4 14 8 500000.00 0 0
89977.25 29 2 13 7 119006.38 0 0
71357.39 62 0 18 3 413141.14 0 0
86564.45 33 2 9 5 477016.95 0 0
141029.82 22 2 10 8 166924.91 0 1
104962.64 45 4 10 1 344391.50 0 0
91708.12 50 3 11 8 446718.26 0 0
79017.63 24 1 11 1 182439.18 0 0
126987.44 77 3 10 8 276517.61 0 1
50959.12 69 2 11 4 158345.54 0 0
97361.54 52 2 2 1 472890.20 0 0
57871.07 27 4 18 5 40516.26 0 0
114719.99 35 2 1 6 371147.62 0 1
127173.60 33 2 5 7 278513.53 0 1
22550.37 78 3 20 2 358223.86 0 1
41046.87 46 2 13 3 261612.27 0 1
124898.08 41 4 10 8 425863.76 0 1
126582.41 54 1 8 8 230828.11 0 1
102972.17 51 0 7 2 157549.97 0 0
28104.27 79 0 8 3 212529.63 0 0
91758.29 55 0 5 4 95280.30 0 0
64470.54 72 0 4 5 72924.57 0 0
70008.34 44 4 17 2 369048.20 0 1
43081.32 41 1 6 5 191982.40 0 1
81168.48 34 4 10 3 35099.86 0 0
37764.30 25 1 8 2 284129.43 0 1
150000.00 23 4 18 3 500000.00 0 1
30760.75 78 3 4 1 224080.10 0 0
115765.86 71 4 20 6 394825.63 0 1
50168.51 34 2 18 4 187646.35 0 0
108117.31 44 2 9 5 304615.65 0 0
117407.26 23 1 5 0 302195.48 0 1
47941.26 24 0 3 7 83615.72 0 0
98186.31 76 3 8 1 199271.62 0 1
123561.03 48 2 19 8 76055.51 0 0
85061.25 32 1 12 6 474664.16 0 0
42224.98 60 1 5 1 166459.62 0 0
118559.78 72 3 3 2 287256.55 0 1
120060.69 21 1 17 2 241135.20 0 1
62275.60 49 2 17 8 428596.18 0 1
48687.53 41 2 2 8 201617.39 0 1
124449.27 76 1 1 5 124080.94 0 1
102587.43 20 1 20 6 291874.29 0 1
37922.36 44 2 10 0 411478.22 0 1
20000.00 69 2 14 2 203408.69 0 1
20863.04 53 4 3 0 91408.71 0 1
97753.98 48 1 18 1 228509.30 0 0
76230.21 70 2 18 0 239003.97 0 1
61490.09 69 1 19 7 122582.23 0 0
71638.61 26 1 7 6 173247.38 0 0
119640.78 33 2 20 3 331233.65 0 1
85002.33 62 4 7 8 79004.52 0 1
126499.07 78 4 10 4 477178.64 0 1
78407.74 23 2 15 3 198359.20 0 0
78825.21 41 2 7 4 276219.76 0 0
84333.91 69 3 17 1 227553.51 0 1
141174.42 71 3 3 5 248924.01 0 1
36326.11 57 2 5 4 416774.03 0 1
57481.25 80 2 20 7 87090.64 0 0
76974.38 70 0 17 5 235786.55 0 1
122464.62 35 4 4 7 474694.34 0 1
115800.11 30 4 13 0 462889.38 0 1
43973.42 49 4 9 3 436988.91 0 1
138893.08 51 4 4 8 247379.14 0 1

Actionable Insights

  • Maintain a product inventory ratio of roughly 62% Sony to 38% Acer for the new customer clientele
  • Target the new customer base with direct-to-consumer email and mail promotions based on each customer's predicted brand preference
  • Deploy the algorithm on the online platform to drive upselling and recommendation strategies across all client online shopping experiences