---
title: "Supervised Learning Part 2"
author: "Yatish"
date: "October 14, 2015"
output: html_document
---
Load the required libraries:
```{r}
library(ggplot2)
library(C50)
library(gmodels)
library(rpart)
library(RColorBrewer)
library(tree)
library(party)
```
Reading the cars data:
```{r}
# UCI car evaluation data: 1728 instances, six categorical attributes plus a class label
car_data <- read.table("car.data", sep = ",")
colnames(car_data) <- c("buying", "maint", "doors", "persons", "lug_boot", "safety", "class")
table(car_data$class)
str(car_data)
```
Randomizing the data to create training and test subsets:
```{r}
set.seed(12345)
# Shuffle the rows, then take the first 1600 for training and the remaining 128 for testing
car_rand <- car_data[order(runif(1728)), ]
car_train <- car_rand[1:1600, ]
car_test <- car_rand[1601:1728, ]  # start at 1601 so the two sets do not overlap
prop.table(table(car_train$class))
prop.table(table(car_test$class))
```
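As a side note, an equivalent way to draw a random split is with `sample()`. The sketch below is only an illustration, not part of the original assignment, and the `car_train2`/`car_test2` names are hypothetical:
```{r}
# Hypothetical alternative split: draw 1600 row indices without replacement
set.seed(12345)
train_idx <- sample(nrow(car_data), 1600)
car_train2 <- car_data[train_idx, ]   # training rows
car_test2  <- car_data[-train_idx, ]  # remaining 128 rows form the test set
```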
Training a C5.0 model and making a cross table (not asked in the question, but good to have, just to see additional information):
```{r}
# Train a C5.0 decision tree on the six predictors (column 7 is the class label)
model <- C5.0(car_train[-7], car_train$class)
model
summary(model)
# Predict on the held-out test set and cross-tabulate actual vs. predicted class
car_type_pred <- predict(model, car_test)
CrossTable(car_test$class, car_type_pred,
           prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
           dnn = c('actual class', 'predicted class'))
```
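The cross table shows agreement cell by cell; if a single summary number is wanted, overall accuracy can be read off the same predictions (a minimal extra step, not part of the original code):
```{r}
# Fraction of test instances whose predicted class matches the actual class
mean(car_type_pred == car_test$class)
```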
Fitting a classification tree and a regression tree:
```{r}
formula <- class ~ buying + maint + doors + persons + lug_boot + safety
# Classification tree
fit <- rpart(formula, method = "class", data = car_train)
printcp(fit)   # cross-validated error at each complexity-parameter (cp) value
plotcp(fit)
summary(fit)
# Regression tree on the same formula
fitr <- rpart(formula, method = "anova", data = car_train)
printcp(fitr)
plotcp(fitr)
summary(fitr)
```
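The `printcp`/`plotcp` output is usually followed by pruning the tree back to the complexity level with the lowest cross-validated error. A minimal sketch of that step with `rpart::prune` (an addition, not in the original code):
```{r}
# Prune the classification tree at the cp value with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
fit_pruned <- prune(fit, cp = best_cp)
printcp(fit_pruned)
```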
We use the summary function to see the set of rules generated by the algorithm.
Plotting the classification tree and the regression tree:
```{r}
plot(fitr, uniform = TRUE,
     main = "Regression Tree for cars 'class'")
text(fitr, use.n = TRUE, all = TRUE, cex = 0.8)
plot(fit, uniform = TRUE, main = "Classification Tree for cars 'class'")
text(fit, use.n = TRUE, all = TRUE, cex = 0.8)
```
Creating another classification tree using the tree package:
```{r}
# Same classification task with the tree package, for comparison
tr <- tree(formula, data = car_train)
summary(tr)
plot(tr)
text(tr)
```
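The `party` package is loaded at the top but never used; for completeness, its conditional inference tree (`ctree`) could be fit on the same formula as one more comparison. A minimal sketch, again an addition rather than part of the original assignment:
```{r}
# Conditional inference tree on the same training data
ct <- ctree(formula, data = car_train)
plot(ct)
# Test-set accuracy for the ctree model
mean(predict(ct, newdata = car_test) == car_test$class)
```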
## Answers:
### 1:
Yes, the size of the data set makes a difference for two reasons:
1. The larger the data set, the larger the training set we can afford to keep, which improves the fitted model.
2. The larger the data set, the more reliable the rules the algorithm can derive, and the more thoroughly we can test them.
### 2:
Yes, the rules make sense: they classify the data into different branches based on the probability at each node and the expected loss.
The algorithm generated this set of rules to decide which instance goes into which branch.
### 3:
In the above example the attributes are categorical, so scaling and normalizing should not make a difference. Scaling and normalization normally help when the data is numerical, because they map values into a common range and keep extreme values from dominating. In our case the data takes fixed values; for example, the safety attribute can only have three values: low, medium, and high.
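To confirm that the attributes are categorical with a small, fixed set of levels, a quick check (not required by the question) could be:
```{r}
# Each column is a factor; list the safety levels and the level count per attribute
levels(car_data$safety)
sapply(car_data, nlevels)
```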