The dataset contains customer transactions; each transaction records the item(s) sold and the quantity of each item.
Parameter | Value |
---|---|
Support | 0.015 |
Confidence | 0.9 |
Algorithm | apriori |
Time required | 0.20s - 0.24s |
A summary of the rules (Pruned)
Description | Value |
---|---|
minimum support | 0.018 |
maximum support | 0.040 |
minimum confidence | 0.9 |
maximum confidence | 1.0 |
minimum lift | 11.18 |
maximum lift | 19.61 |
The rules we would show to the client are those with high support, confidence and lift.
The client can build bundled promotions from the rules discovered.
The flavor rules show that customers who buy a blackberry item together with a single espresso almost always also buy a coffee-flavored item ({Blackberry,Single} => {Coffee}, confidence 0.96). This suggests that customers enjoy this combination of flavors, so we recommend that the client build bundles around flavor combinations on the menu. In addition, Chocolate Tart, Walnut Cookie and Vanilla Frappucino are frequently bought together: every receipt that contains Chocolate Tart and Walnut Cookie also contains Vanilla Frappucino (confidence 1.0), so the client can sell these three items as a bundle. The client can also offer discounts and promotions on items that are frequently bought together. For instance, customers who buy a coffee drink could get a discounted price on an eclair, pie or twist.
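To make the three measures concrete, the sketch below computes them in base R for one rule on a handful of made-up baskets. The `baskets` list and the helper names `support` and `rule_measures` are illustrative assumptions, not part of the analysis code.

```r
# Toy baskets (hypothetical data, not taken from 1000i.csv)
baskets <- list(
  c("Hot Coffee", "Apple Pie", "Almond Twist"),
  c("Hot Coffee", "Apple Pie"),
  c("Green Tea", "Lemon Cookie"),
  c("Hot Coffee", "Almond Twist", "Apple Pie"),
  c("Apple Pie")
)

# Support: fraction of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Support, confidence and lift of the rule lhs => rhs
rule_measures <- function(lhs, rhs, baskets) {
  supp <- support(c(lhs, rhs), baskets)
  conf <- supp / support(lhs, baskets)   # P(rhs | lhs)
  lift <- conf / support(rhs, baskets)   # confidence relative to the baseline P(rhs)
  c(support = supp, confidence = conf, lift = lift)
}

m <- rule_measures(c("Almond Twist", "Hot Coffee"), "Apple Pie", baskets)
print(m)
```

A lift above 1 means the right-hand item appears more often with the left-hand items than it does overall, which is why lift is used alongside support and confidence when ranking rules.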
library(arules)
library(arulesViz)
library(ggplot2)
Load the dataset and assign header names to each column
receipt_df <- read.csv("1000i.csv", header = FALSE)
names(receipt_df) <- c("Receipt_Number","Quantity","Food")
Before preprocessing
head(receipt_df)
## Receipt_Number Quantity Food
## 1 1 3 7
## 2 1 4 15
## 3 1 2 49
## 4 1 5 44
## 5 2 1 1
## 6 2 2 19
Create a dataframe containing each item and its corresponding item_ID
id <- c(0:49)
food <- c("Chocolate Cake","Lemon Cake","Casino Cake","Opera Cake", "Strawberry Cake", "Truffle Cake", "Chocolate Eclair", "Coffee Eclair", "Vanilla Eclair", "Napolean Cake", "Almond Tart", "Apple Pie", "Apple Tart","Apricot Tart", "Berry Tart", "Blackberry Tart", "Blueberry Tart", "Chocolate Tart", "Cherry Tart", "Lemon Tart", "Pecan Tart", "Ganache Cookie", "Gongolais Cookie", "Raspberry Cookie", "Lemon Cookie", "Chocolate Meringue", "Vanilla Meringue", "Marzipan Cookie", "Tuile Cookie", "Walnut Cookie", "Almond Croissant", "Apple Croissant", "Apricot Croissant", "Cheese Croissant", "Chocolate Croissant", "Apricot Danish", "Apple Danish", "Almond Twist", "Almond Bear_Claw", "Blueberry Danish", "Lemon Lemonade", "Raspberry Lemonade", "Orange Juice", "Green Tea", "Bottled Water", "Hot Coffee", "Chocolate Coffee", "Vanilla Frappucino", "Cherry Soda", "Single Espresso")
df <- data.frame(id, food)
Map item_ID to its text representation
receipt_df$Food <- df$food[match(receipt_df$Food,df$id)]
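The lookup goes through `match()` because the item IDs start at 0 while R vectors are indexed from 1. The sketch below illustrates the difference on a shortened lookup table; the `codes` vector is a made-up example.

```r
# Shortened lookup table with the same shape as df (first four items only)
lookup <- data.frame(
  id   = 0:3,
  food = c("Chocolate Cake", "Lemon Cake", "Casino Cake", "Opera Cake"),
  stringsAsFactors = FALSE
)
codes <- c(1, 3, 0)  # hypothetical item codes as they would appear in the CSV

# match() finds the row whose id equals each code
by_match <- lookup$food[match(codes, lookup$id)]
# Direct indexing treats the codes as 1-based positions and silently drops index 0
by_index <- lookup$food[codes]

print(by_match)  # "Lemon Cake" "Opera Cake" "Chocolate Cake"
print(by_index)  # "Chocolate Cake" "Casino Cake"
```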
Separate each food name into “Flavor” and “Type”
ft <- matrix(unlist(strsplit(as.character(receipt_df$Food), ' ')) , ncol=2, byrow=TRUE)
receipt_df <- data.frame(receipt_df, ft)
names(receipt_df) <- c("Receipt_Number","Quantity","Food", "Flavor", "Type")
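The `matrix(..., ncol=2, byrow=TRUE)` split relies on every item name consisting of exactly two space-separated words, which holds for the menu above (note the underscore in "Almond Bear_Claw"). As a hedged alternative, the sketch below splits on the first space only, so a longer name would not shift the columns; the three-word item is hypothetical.

```r
# The last name is a hypothetical three-word item, added only to show the behavior
food <- c("Coffee Eclair", "Almond Bear_Claw", "Hot Chocolate Coffee")

flavor <- sub(" .*$", "", food)     # everything before the first space
type   <- sub("^[^ ]+ ", "", food)  # everything after the first space

parts <- data.frame(Food = food, Flavor = flavor, Type = type,
                    stringsAsFactors = FALSE)
print(parts)
```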
After preprocessing
head(receipt_df)
## Receipt_Number Quantity Food Flavor Type
## 1 1 3 Coffee Eclair Coffee Eclair
## 2 1 4 Blackberry Tart Blackberry Tart
## 3 1 2 Single Espresso Single Espresso
## 4 1 5 Bottled Water Bottled Water
## 5 2 1 Lemon Cake Lemon Cake
## 6 2 2 Lemon Tart Lemon Tart
Convert into basket format to run in apriori
test_df <- receipt_df[,c("Receipt_Number","Food", "Flavor", "Type")]
df_trans <- as(split(test_df$Food, test_df$Receipt_Number), "transactions")
df_trans_Flavor <- as(split(test_df$Flavor, test_df$Receipt_Number), "transactions")
df_trans_Type <- as(split(test_df$Type, test_df$Receipt_Number), "transactions")
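`split()` groups the item labels by receipt number, producing one character vector per basket. At the flavor or type level a basket can contain the same label twice (receipt 2 above holds both Lemon Cake and Lemon Tart). The sketch below shows the grouping in base R and one way to drop such duplicates before coercion. Whether this step is needed depends on how the installed arules version treats duplicated items, which is an assumption worth checking.

```r
# Toy rows modeled on the first two receipts shown above
toy <- data.frame(
  Receipt_Number = c(1, 1, 1, 1, 2, 2),
  Flavor = c("Coffee", "Blackberry", "Single", "Bottled", "Lemon", "Lemon"),
  stringsAsFactors = FALSE
)

# One character vector of flavors per receipt
baskets <- split(toy$Flavor, toy$Receipt_Number)

# Keep each flavor at most once per receipt
baskets_unique <- lapply(baskets, unique)

print(baskets)
print(baskets_unique)
```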
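# Start timer (the elapsed time covers rule mining and pruning)
ptm <- proc.time()
rules <- apriori(df_trans,
                 control = list(verbose = FALSE),
                 parameter = list(supp = 0.015, conf = 0.9))
# Flag redundant rules: entry [i, j] is TRUE when the items of rule i are a subset of the items of rule j
subset.matrix <- is.subset(rules, rules)
# Keep only the upper triangle so that each rule is compared with the rules listed before it
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
# Remove redundant rules
rules.pruned <- rules[!redundant]
rules <- rules.pruned
# End timer
proc.time() - ptm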
## user system elapsed
## 0.06 0.00 0.06
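A single `proc.time()` difference is one measurement and can vary from run to run. The sketch below shows one way to repeat a timed block and report the range of elapsed times; the sorting workload is a hypothetical stand-in for the `apriori()` call and pruning steps, and `time_once` is an illustrative helper name.

```r
# Time one run of a workload and return the elapsed seconds.
# The sort() call is a stand-in; in the analysis it would be apriori() plus pruning.
time_once <- function() {
  system.time({
    x <- sort(runif(1e5))
  })[["elapsed"]]
}

# Repeat the measurement and report the fastest and slowest run
elapsed <- replicate(5, time_once())
print(range(elapsed))
```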
## rules support confidence lift
## 24 {Almond Twist,Hot Coffee} => {Apple Pie} 0.024 0.9600000 14.11765
## 5 {Green Tea,Lemon Lemonade} => {Lemon Cookie} 0.019 0.9047619 13.70851
## 25 {Coffee Eclair,Hot Coffee} => {Apple Pie} 0.024 0.9230769 13.57466
## 1 {Chocolate Tart,Walnut Cookie} => {Vanilla Frappucino} 0.018 1.0000000 13.51351
## 18 {Green Tea,Lemon Cookie} => {Raspberry Lemonade} 0.019 0.9500000 13.19444
## 21 {Green Tea,Raspberry Cookie} => {Raspberry Lemonade} 0.019 0.9500000 13.19444
## 7 {Green Tea,Lemon Lemonade} => {Raspberry Lemonade} 0.019 0.9047619 12.56614
## 10 {Lemon Cookie,Lemon Lemonade} => {Raspberry Lemonade} 0.028 0.9032258 12.54480
## 30 {Apricot Croissant,Hot Coffee} => {Blueberry Tart} 0.032 1.0000000 12.34568
## 36 {Apple Danish,Cherry Soda} => {Apple Tart} 0.031 0.9393939 11.89106
## 37 {Apple Croissant,Cherry Soda} => {Apple Tart} 0.031 0.9393939 11.89106
## 32 {Lemon Cookie,Raspberry Lemonade} => {Raspberry Cookie} 0.029 0.9666667 11.78862
## 15 {Lemon Lemonade,Raspberry Lemonade} => {Raspberry Cookie} 0.028 0.9655172 11.77460
## 34 {Apricot Danish,Opera Cake} => {Cherry Tart} 0.038 0.9743590 11.59951
## 19 {Green Tea,Lemon Cookie} => {Raspberry Cookie} 0.019 0.9500000 11.58537
## 28 {Casino Cake,Chocolate Cake} => {Chocolate Coffee} 0.038 0.9500000 11.17647
## 8 {Green Tea,Lemon Lemonade} => {Raspberry Cookie} 0.019 0.9047619 11.03368
## 13 {Lemon Cookie,Lemon Lemonade} => {Raspberry Cookie} 0.028 0.9032258 11.01495
## 38 {Apple Danish,Apple Tart} => {Apple Croissant} 0.040 0.9756098 10.72099
## 41 {Apple Danish,Cherry Soda} => {Apple Croissant} 0.031 0.9393939 10.32301
## 26 {Almond Twist,Hot Coffee} => {Coffee Eclair} 0.024 0.9600000 10.32258
## 3 {Blackberry Tart,Single Espresso} => {Coffee Eclair} 0.023 0.9583333 10.30466
## 22 {Almond Twist,Apple Pie} => {Coffee Eclair} 0.027 0.9310345 10.01112
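The pruning step used above treats a rule as redundant when an earlier rule in the list uses a subset of its items. The sketch below reproduces that logic in base R on three made-up item sets, without arules; `rule_items` and the rule names are hypothetical.

```r
# Items (LHS and RHS combined) of three hypothetical rules, in listing order
rule_items <- list(
  r1 = c("Hot Coffee", "Apple Pie"),
  r2 = c("Hot Coffee", "Apple Pie", "Almond Twist"),
  r3 = c("Green Tea", "Lemon Cookie")
)
n <- length(rule_items)

# subset_matrix[i, j] is TRUE when the items of rule i are all contained in rule j
is_sub <- Vectorize(function(i, j) all(rule_items[[i]] %in% rule_items[[j]]))
subset_matrix <- outer(seq_len(n), seq_len(n), is_sub)
dimnames(subset_matrix) <- list(names(rule_items), names(rule_items))

# Blank out the diagonal and lower triangle: only earlier rules can make a rule redundant
subset_matrix[lower.tri(subset_matrix, diag = TRUE)] <- NA

# A rule is redundant if at least one earlier rule is a subset of it
redundant <- colSums(subset_matrix, na.rm = TRUE) >= 1
print(redundant)  # r1 FALSE, r2 TRUE, r3 FALSE
```

Because only earlier rules are considered, the outcome depends on the order of the rules, so sorting them (for example by lift) before pruning changes which rules survive.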
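# Start timer (the elapsed time covers rule mining and pruning)
ptm <- proc.time()
rules2 <- apriori(df_trans_Flavor,
                  control = list(verbose = FALSE),
                  parameter = list(supp = 0.005, conf = 0.7))
# Flag redundant flavor rules: entry [i, j] is TRUE when the items of rule i are a subset of the items of rule j
subset.matrix <- is.subset(rules2, rules2)
# Keep only the upper triangle so that each rule is compared with the rules listed before it
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
# Remove redundant rules
rules2.pruned <- rules2[!redundant]
rules2 <- rules2.pruned
# End timer
proc.time() - ptm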
## user system elapsed
## 0.04 0.00 0.05
## rules support confidence lift
## 1 {Blackberry,Single} => {Coffee} 0.023 0.9583333 10.304659
## 10 {Cheese,Strawberry} => {Napolean} 0.005 0.8333333 9.259259
## 11 {Marzipan,Vanilla} => {Tuile} 0.008 0.8888889 8.714597
## 21 {Apricot,Chocolate,Marzipan} => {Tuile} 0.005 0.8333333 8.169935
## 12 {Coffee,Hot} => {Almond} 0.025 0.9615385 5.623032
## 8 {Cherry,Opera} => {Apricot} 0.038 0.8636364 4.406308
## 17 {Blueberry,Hot} => {Apricot} 0.034 0.8500000 4.336735
## 14 {Coffee,Hot} => {Apple} 0.024 0.9230769 3.978780
## 4 {Green,Raspberry} => {Lemon} 0.020 0.8333333 3.858025
## 5 {Vanilla,Walnut} => {Chocolate} 0.023 0.8846154 3.442083
## 15 {Almond,Coffee} => {Apple} 0.031 0.7948718 3.426172
## 19 {Almond,Hot} => {Apple} 0.026 0.7878788 3.396029
## 28 {Apple,Cherry,Vanilla} => {Chocolate} 0.008 0.8000000 3.112840
## 27 {Apple,Blueberry,Cherry} => {Chocolate} 0.006 0.7500000 2.918288
## 7 {Casino,Gongolais} => {Chocolate} 0.005 0.7142857 2.779322
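Support and confidence can be turned back into receipt counts, which is often easier to explain to a client. The sketch below does this for the first flavor rule, {Blackberry,Single} => {Coffee}. The total of 1000 receipts is an assumption inferred from the file name `1000i.csv`, not something stated in the output.

```r
n_receipts <- 1000       # assumption: inferred from the file name 1000i.csv
supp <- 0.023            # support of {Blackberry,Single} => {Coffee}
conf <- 0.9583333        # confidence of the same rule

# Receipts that contain Blackberry, Single and Coffee together
rule_count <- supp * n_receipts
# Receipts that contain Blackberry and Single (the left-hand side)
lhs_count <- rule_count / conf

print(round(c(rule_count = rule_count, lhs_count = lhs_count)))
```

Under that assumption, 23 of the 24 receipts containing both left-hand flavors also contain a coffee-flavored item.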
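# Start timer (the elapsed time covers rule mining and pruning)
ptm <- proc.time()
rules3 <- apriori(df_trans_Type,
                  control = list(verbose = FALSE),
                  parameter = list(supp = 0.010, conf = 0.8))
# Flag redundant type rules: entry [i, j] is TRUE when the items of rule i are a subset of the items of rule j
subset.matrix <- is.subset(rules3, rules3)
# Keep only the upper triangle so that each rule is compared with the rules listed before it
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
# Remove redundant rules
rules3.pruned <- rules3[!redundant]
rules3 <- rules3.pruned
# End timer
proc.time() - ptm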
## user system elapsed
## 0.03 0.00 0.03
## rules support confidence lift
## 3 {Pie,Twist} => {Eclair} 0.028 0.9655172 6.034483
## 9 {Coffee,Twist} => {Eclair} 0.024 0.8888889 5.555556
## 10 {Coffee,Pie} => {Eclair} 0.024 0.8000000 5.000000
## 6 {Pie,Twist} => {Coffee} 0.024 0.8275862 4.675628
## 11 {Danish,Soda} => {Croissant} 0.034 0.8717949 2.830503
## 2 {Lemonade,Tea} => {Cookie} 0.022 0.8800000 2.162162
## 1 {Eclair,Espresso} => {Tart} 0.026 0.9285714 1.673102
## 12 {Danish,Soda} => {Tart} 0.035 0.8974359 1.617002
## 13 {Croissant,Soda} => {Tart} 0.036 0.8181818 1.474201
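The rules ending in {Tart} have high confidence but low lift. Since lift is confidence divided by the support of the right-hand side, that support can be recovered from the table. The sketch below does so for five of the type rules; the data frame is typed in by hand from the output above, and `rhs_support` is an illustrative column name.

```r
# Five of the type rules, copied from the output above
type_rules <- data.frame(
  rule = c("{Pie,Twist} => {Eclair}", "{Pie,Twist} => {Coffee}",
           "{Lemonade,Tea} => {Cookie}", "{Eclair,Espresso} => {Tart}",
           "{Danish,Soda} => {Tart}"),
  confidence = c(0.9655172, 0.8275862, 0.8800000, 0.9285714, 0.8974359),
  lift = c(6.034483, 4.675628, 2.162162, 1.673102, 1.617002),
  stringsAsFactors = FALSE
)

# lift = confidence / support(rhs), so support(rhs) = confidence / lift
type_rules$rhs_support <- round(type_rules$confidence / type_rules$lift, 3)

# Highest lift first
print(type_rules[order(-type_rules$lift), ])
```

Tarts appear in roughly 55% of receipts, so a rule that predicts a tart adds little over the baseline, whereas eclairs appear in about 16% of receipts and the eclair rules carry much more information.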
Some of the plots from Code.r are shown here; refer to the Shiny app for interactive versions.
reorder_size <- function(x) {
factor(x, levels = names(sort(table(x))))
}
ggplot(data = receipt_df, aes(x = reorder_size(Food), fill = as.factor(Quantity))) + geom_bar(colour = "black") + coord_flip()
ggplot(data = receipt_df, aes(x = reorder_size(Food), fill = as.factor(Quantity))) + geom_bar(colour = "black") + facet_grid(as.factor(Quantity)~.) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
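`reorder_size()` turns a vector into a factor whose levels are ordered by frequency, so the bars are drawn from least to most common. A small sketch on a made-up vector shows the resulting level order:

```r
# Same helper as above: order factor levels by how often each value occurs
reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x))))
}

# Hypothetical item types: Tart x3, Cake x2, Cookie x1
x <- c("Tart", "Cake", "Tart", "Cookie", "Tart", "Cake")
f <- reorder_size(x)

print(levels(f))  # "Cookie" "Cake" "Tart"
```

With `coord_flip()`, the last level is drawn at the top, so the most frequently sold item ends up as the topmost bar.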
# Scatter plot of the item-level rules: support vs. lift, shaded by confidence
plot(rules, measure=c("support","lift"), shading="confidence")
# Same scatter plot for the flavor rules (rules2)
plot(rules2, measure=c("support","lift"), shading="confidence")