forked from smithjd/sql-pet
-
Notifications
You must be signed in to change notification settings - Fork 0
/
085-sales-forecasting.Rmd
140 lines (102 loc) · 4.23 KB
/
085-sales-forecasting.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
# Sales Forecasting {#chapter_sales-forecasting}
> This chapter demonstrates how to:
>
> * Use [`tidyverts`](https://tidyverts.org/) packages to explore and forecast sales data.
## Setup
The following packages are used in this chapter:
```{r package list, echo=TRUE, message=FALSE, warning=FALSE}
library(tidyverse)
library(DBI)
library(RPostgres)
require(knitr)
library(bookdown)
library(sqlpetr)
library(tsibble)
library(fable)
library(zoo)
library(connections)
sleep_default <- 3
```
Analyzing sales time series, in particular determining seasonality and forecasting future sales, is a common activty in business management. A collection of packages called `tidyverst` is designed to do this in a tidy data framework.
First, we make sure the Docker container is ready and connect to the `adventureworks` database.
```{r connect_to adventureworks}
sqlpetr::sp_docker_start("adventureworks")
Sys.sleep(sleep_default)
```
```{r connect_to adventureworks}
# con <- connection_open( # use in an interactive session
con <- dbConnect( # use in other settings
RPostgres::Postgres(),
# without the previous and next lines, some functions fail with bigint data
# so change int64 to integer
bigint = "integer",
host = "localhost",
user = Sys.getenv("DEFAULT_POSTGRES_USER_NAME"),
password = Sys.getenv("DEFAULT_POSTGRES_PASSWORD"),
dbname = "adventureworks",
port = 5432
)
```
Next, we retrieve the "sales order header" table from the database, close the connection and stop the container.
```{r retrieve_sales_order_header}
dbExecute(con, "set search_path to sales;")
salesorderheader_tibble <- DBI::dbReadTable(con, "salesorderheader")
DBI::dbDisconnect(con)
sqlpetr::sp_docker_stop("adventureworks")
```
## Exploring the sales data
Some assumptions:
1. The business requirement is to analyze / forecast revenue.
2. The revenue figures we care about are those in the `subtotal` column. The shipping and tax numbers are costs, not revenue.
3. The values in column `subtotal` have been converted to the common currency of the Adventureworks headquarters.
In a real-world setting, the analyst would need to validate these assumptions. Given that, our first task is to create a `tsibble` - a time series tibble - of monthly revenue figures.
### Creating a tsibble
```{r create_raw_tsibble}
monthly_tsibble <- salesorderheader_tibble %>%
dplyr::mutate(
origin = ifelse(onlineorderflag, "online", "sales_rep"),
month = lubridate::floor_date(shipdate, unit = "months") %>%
as.yearmon()
) %>%
dplyr::group_by(origin, month) %>%
dplyr::summarize(orders = n(), total_revenue = sum(subtotal)) %>%
dplyr::ungroup()
# Note that there are two more months - June and July of 2014 - for the
# online data, and the revenue values are suspiciously low.
# We remove them for consistency.
monthly_tsibble <- monthly_tsibble %>%
dplyr::filter(month < '2014-06-01') %>%
tsibble::as_tsibble(key = origin, index = month)
```
### Exploring the data
First, let's look at orders for online and sales representative sales:
```{r}
monthly_tsibble %>% autoplot(orders)
```
Disconnect from the database:
```{r }
dbDisconnect(con)
```
## Cleaning up
Always have R disconnect from the database when you're done.
```{r}
dbDisconnect(con)
```
```
Wow! Online orders really took off in the late spring - early summer of 2013! How about revenues?
```{r}
monthly_tsibble %>% autoplot(total_revenue)
```
There's an increase, but the sales representatives always brought in more revenue than the online platform. And there's a pronounced variation in the revenue from sales representatives on a month-to-month basis.
Before moving on, let's look at revenue per order.
```{r}
monthly_tsibble %>% autoplot(total_revenue / orders)
```
For the sales representatives, there's still a month-to-month variation but the revenue per order appears to be bounded both below and above. However, the online revenue per order is decreasing. Note that this decline appears to be in steps between May and June each year; that could mean it's an artifact of the database creation process and not a "natural" phenomenon.
## Disconnect from the database and stop Docker
```{r}
dbDisconnect(con)
# or if using the connections package, use:
# connection_close(con)
sp_docker_stop("adventureworks")
```