---
title: "Linear Regression"
output: html_notebook
---
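This notebook fits a simple linear regression of concrete compressive strength on cement content.

The data come from the UCI Machine Learning Repository. The file read below, `Concrete_Data.xls`, and its column names correspond to the repository's "Concrete Compressive Strength" data set, which contains nine variables: Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age, and Concrete compressive strength. Only the first (Cement) and ninth (Concrete compressive strength) columns are used here.

The repository also hosts a related data set by the same author under the title "Concrete Slump Test Data Set", with different variables (Cement, Slag, Fly ash, Water, SP, Coarse Aggr., Fine Aggr., SLUMP, FLOW, 28-day Compressive Strength).

<b>Citation:</b>
Yeh, I-Cheng, "Modeling slump flow of concrete using second-order regressions and artificial neural networks," Cement and Concrete Composites, Vol. 29, No. 6, pp. 474-480, 2007.
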
```{r}
# Data preprocessing: import the dataset
# install.packages("readxl")
library(readxl)  # provides read_excel()
dataset <- read_excel("Concrete_Data.xls")
dataset
```
```{r}
# Choose data for simple linear regression:
# column 1 is cement content, column 9 is compressive strength
# install.packages("tidyverse")
library(tidyverse)  # provides %>% and select()
dataset_simpleLR <- dataset %>% select(1, 9)
dataset_simpleLR
```
```{r}
# Splitting the dataset into the training set and test set
# install.packages("caTools")
library(caTools)  # provides sample.split()
set.seed(123)  # fix the random split so the results are reproducible
split <- sample.split(dataset_simpleLR$`Concrete compressive strength(MPa, megapascals)`, SplitRatio = 0.8)
training_set <- subset(dataset_simpleLR, split == TRUE)
test_set <- subset(dataset_simpleLR, split == FALSE)
```
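
For comparison, a plain random 80/20 split can also be sketched in base R with `sample()`. This is a swapped-in alternative rather than what `sample.split()` does internally, and the data frame `toy` and the names `toy_train` and `toy_test` are hypothetical, introduced only for this illustration.

```{r}
# Illustrative base-R split on toy data (not the concrete measurements)
set.seed(42)  # arbitrary seed, chosen only to make the example repeatable
toy <- data.frame(x = 1:10, y = (1:10) * 2)

# Draw 80% of the row indices at random for training
n_train <- floor(0.8 * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)

# Rows not drawn for training form the test set
toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]
c(train = nrow(toy_train), test = nrow(toy_test))
```

Indexing by row numbers keeps the two subsets disjoint by construction, which is the property that matters when the test set is later used to judge the model.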
```{r}
print(training_set)
```
```{r}
print(test_set)
```
```{r}
#fitting simple linear regression to the training set
regressor = lm(formula = `Concrete compressive strength(MPa, megapascals)` ~ `Cement (component 1)(kg in a m^3 mixture)`, data = training_set)
summary(regressor)
```
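
For reference, the model fitted by `lm()` above can be written out as follows. The notation is introduced here and is not part of the original notebook: $x_i$ is the cement content and $y_i$ the compressive strength of the $i$-th training observation, and $\varepsilon_i$ is the error term.

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
$$

The least-squares estimates reported by `summary(regressor)` as the intercept and slope are

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
$$

where $\bar{x}$ and $\bar{y}$ are the training-set means.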
```{r}
#Predicting test results
y_pred = predict(regressor, newdata = test_set)
```
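
The predictions above can be compared with the observed test values numerically. The following is a minimal sketch, assuming the usual definitions of RMSE, MAE, and out-of-sample R squared. The helper `regression_metrics()` and the toy vectors are hypothetical and not part of the original analysis.

```{r}
# Hypothetical helper: summarise prediction error for a numeric response
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- actual - predicted
  rmse <- sqrt(mean(err^2))   # root mean squared error
  mae <- mean(abs(err))       # mean absolute error
  # R squared relative to always predicting the mean of `actual`
  r2 <- 1 - sum(err^2) / sum((actual - mean(actual))^2)
  c(RMSE = rmse, MAE = mae, R2 = r2)
}

# Toy values for illustration only (not taken from the concrete data)
toy_actual <- c(30, 42, 25, 51)
toy_predicted <- c(28, 45, 27, 48)
regression_metrics(toy_actual, toy_predicted)

# In this notebook the same helper could be applied to the test set, e.g.
# regression_metrics(test_set[[2]], y_pred)
```

Both RMSE and MAE are in the units of the response (MPa here), so they can be read directly against the scale of the strength values.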
```{r}
# Visualizing the training set results
library(ggplot2)
# Observed points
a <- ggplot()
b <- a + geom_point(aes(x = training_set$`Cement (component 1)(kg in a m^3 mixture)`, y = training_set$`Concrete compressive strength(MPa, megapascals)`), colour = 'red')
# Fitted linear regression line
c <- b + geom_line(aes(x = training_set$`Cement (component 1)(kg in a m^3 mixture)`, y = predict(regressor, newdata = training_set)), colour = 'blue')
# Labels
d <- c + ggtitle('Concrete compressive strength (MPa) vs. Cement (kg/m^3), training set') + xlab('Cement (kg/m^3)') + ylab('Concrete compressive strength (MPa)')
d
```
```{r}
# Visualizing the test set results
# Observed test points
a1 <- ggplot()
b1 <- a1 + geom_point(aes(x = test_set$`Cement (component 1)(kg in a m^3 mixture)`, y = test_set$`Concrete compressive strength(MPa, megapascals)`), colour = 'red')
# Linear regression line (same fitted model, drawn over the training-set cement values)
c1 <- b1 + geom_line(aes(x = training_set$`Cement (component 1)(kg in a m^3 mixture)`, y = predict(regressor, newdata = training_set)), colour = 'blue')
# Labels
d1 <- c1 + ggtitle('Concrete compressive strength (MPa) vs. Cement (kg/m^3), test set') + xlab('Cement (kg/m^3)') + ylab('Concrete compressive strength (MPa)')
d1
```
```{r}
# Test set: observed points (red) and predicted points (black)
a2 <- ggplot()
b2 <- a2 + geom_point(aes(x = test_set$`Cement (component 1)(kg in a m^3 mixture)`, y = test_set$`Concrete compressive strength(MPa, megapascals)`), colour = 'red')
# Linear regression line, plus the test-set predictions that lie on it
c2 <- b2 + geom_line(aes(x = training_set$`Cement (component 1)(kg in a m^3 mixture)`, y = predict(regressor, newdata = training_set)), colour = 'blue') + geom_point(aes(x = test_set$`Cement (component 1)(kg in a m^3 mixture)`, y = y_pred), colour = 'black')
# Labels
d2 <- c2 + ggtitle('Concrete compressive strength (MPa) vs. Cement (kg/m^3), test set with predictions') + xlab('Cement (kg/m^3)') + ylab('Concrete compressive strength (MPa)')
d2
```
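
The plots above build each layer from `training_set$...` and `test_set$...` vectors. A more compact pattern, sketched here on toy data, passes a data frame to `ggplot()` once and lets `geom_smooth(method = "lm")` draw the least-squares line. The data frame `toy_df` and its values are hypothetical and serve only to keep the example self-contained.

```{r}
library(ggplot2)

# Toy data for illustration only (not the concrete measurements)
toy_df <- data.frame(
  cement = c(150, 200, 250, 300, 350, 400),
  strength = c(18, 26, 31, 38, 47, 52)
)

# Supplying the data frame once lets aes() refer to columns by name;
# geom_smooth(method = "lm") fits y ~ x to the plotted data and draws the line
toy_plot <- ggplot(toy_df, aes(x = cement, y = strength)) +
  geom_point(colour = "red") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  labs(
    title = "Compressive strength (MPa) vs. Cement (kg/m^3), toy data",
    x = "Cement (kg/m^3)",
    y = "Compressive strength (MPa)"
  )
toy_plot
```

Note that `geom_smooth()` refits the line to whatever data is plotted, so on a test-set plot it would not reproduce the line of the model trained on `training_set`. The explicit `predict(regressor, ...)` approach used above remains the right choice there.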