The following equation expresses the expected out-of-sample error in terms of $\bar{g}(\mathbf{x})$, the 'average function':

$$\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}\left(g^{(\mathcal{D})}\right)\right] = \mathbb{E}_{\mathbf{x}}\left[\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - \bar{g}(\mathbf{x})\right)^2\right] + \left(\bar{g}(\mathbf{x}) - f(\mathbf{x})\right)^2\right]$$

The average function can be interpreted as follows: generate many data sets $\mathcal{D}_1, \ldots, \mathcal{D}_K$ and apply the learning algorithm to each data set to produce the final hypotheses $g_1, \ldots, g_K$. We can then estimate the average function for any $\mathbf{x}$ by $\bar{g}(\mathbf{x}) \approx \frac{1}{K} \sum_{k=1}^{K} g_k(\mathbf{x})$. Essentially, we are viewing $g^{(\mathcal{D})}(\mathbf{x})$ as a random variable, with the randomness coming from the randomness in the data set; $\bar{g}(\mathbf{x})$ is the expected value of this random variable (for a particular $\mathbf{x}$), and $\bar{g}$ is a function, the average function, composed of these expected values.
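As a quick illustration of this averaging view, here is a minimal Python sketch. The target function, data-set size, and `learn` routine below are placeholder assumptions chosen for illustration, not something fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Stand-in target function f (an assumption for this sketch).
    return np.sin(np.pi * x)

def make_dataset(n=5):
    # Sample one data set D of n points, with x uniform in [-1, 1].
    x = rng.uniform(-1.0, 1.0, n)
    return x, target(x)

def learn(x, y):
    # Placeholder learning algorithm: least-squares fit of a line.
    return np.polyfit(x, y, 1)

# Learn a final hypothesis g_k from each of K independent data sets ...
K = 10_000
hypotheses = [learn(*make_dataset()) for _ in range(K)]

# ... then estimate g_bar at a particular point x by averaging g_k(x) over k.
x0 = 0.5
g_bar_at_x0 = np.mean([np.polyval(g, x0) for g in hypotheses])
print(f"estimated g_bar({x0}) = {g_bar_at_x0:.3f}")
```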
The term $\left(\bar{g}(\mathbf{x}) - f(\mathbf{x})\right)^2$ measures how much the average function that we would learn using different data sets deviates from the target function that generated these data sets. This term is called the bias,

$$\text{bias}(\mathbf{x}) = \left(\bar{g}(\mathbf{x}) - f(\mathbf{x})\right)^2,$$

as it measures how much our learning model is biased away from the target function. This is because $\bar{g}$ has the benefit of learning from an unlimited number of data sets, so its ability to approximate $f$ is constrained only by the limitations of the learning model itself.
The term $\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - \bar{g}(\mathbf{x})\right)^2\right]$ is the variance of the random variable $g^{(\mathcal{D})}(\mathbf{x})$,

$$\text{var}(\mathbf{x}) = \mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - \bar{g}(\mathbf{x})\right)^2\right],$$

which measures the variation in the final hypothesis, depending on the data set. We thus arrive at the bias-variance decomposition of the expected out-of-sample error:

$$\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}\left(g^{(\mathcal{D})}\right)\right] = \mathbb{E}_{\mathbf{x}}\left[\text{bias}(\mathbf{x}) + \text{var}(\mathbf{x})\right] = \text{bias} + \text{var}.$$
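For completeness, the decomposition follows by adding and subtracting $\bar{g}(\mathbf{x})$ inside the squared error; the cross term vanishes because $\mathbb{E}_{\mathcal{D}}\left[g^{(\mathcal{D})}(\mathbf{x})\right] = \bar{g}(\mathbf{x})$:

$$\begin{align*}
\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - f(\mathbf{x})\right)^2\right]
&= \mathbb{E}_{\mathcal{D}}\left[\left(\left(g^{(\mathcal{D})}(\mathbf{x}) - \bar{g}(\mathbf{x})\right) + \left(\bar{g}(\mathbf{x}) - f(\mathbf{x})\right)\right)^2\right] \\
&= \mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(\mathbf{x}) - \bar{g}(\mathbf{x})\right)^2\right] + \left(\bar{g}(\mathbf{x}) - f(\mathbf{x})\right)^2 \\
&= \text{var}(\mathbf{x}) + \text{bias}(\mathbf{x}).
\end{align*}$$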
Consider the target function $f(x) = \sin(\pi x)$ and a data set of size $N = 2$. We sample $x$ uniformly in $[-1, 1]$ to generate a data set $(x_1, y_1), (x_2, y_2)$.
Fit the model using:
$\mathcal{H}_0$: Set of all lines of the form $h(x) = b$
For $\mathcal{H}_0$, we choose the constant hypothesis that best fits the data (the horizontal line at the midpoint, $b = \frac{y_1 + y_2}{2}$).
Now consider the same target function $f(x) = \sin(\pi x)$ and a data set of size $N = 2$, again sampling $x$ uniformly in $[-1, 1]$ to generate a data set $(x_1, y_1), (x_2, y_2)$.
Fit the model using:
$\mathcal{H}_1$: Set of all lines of the form $h(x) = ax + b$
For $\mathcal{H}_1$, we choose the line that passes through the two data points $(x_1, y_1)$ and $(x_2, y_2)$. With $\mathcal{H}_1$, the learned hypothesis is wilder and varies extensively depending on the data set.
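This experiment is easy to reproduce numerically. Below is a minimal simulation sketch; the number of simulated data sets and the uniform test grid are arbitrary stand-ins for the expectations over $\mathcal{D}$ and $\mathbf{x}$:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)        # target function f(x) = sin(pi x)

n_datasets = 10_000                    # stand-in for the expectation over D
x_test = np.linspace(-1.0, 1.0, 1001)  # stand-in for the expectation over x

# Each row is one data set of size N = 2, with x uniform in [-1, 1].
x = rng.uniform(-1.0, 1.0, (n_datasets, 2))
y = f(x)

# H0: the constant hypothesis, the horizontal line at the midpoint b = (y1 + y2) / 2.
b0 = y.mean(axis=1, keepdims=True)
g0 = np.broadcast_to(b0, (n_datasets, x_test.size))

# H1: the line a*x + b through both data points (x1 == x2 has probability zero).
a1 = (y[:, 1] - y[:, 0]) / (x[:, 1] - x[:, 0])
b1 = y[:, 0] - a1 * x[:, 0]
g1 = a1[:, None] * x_test[None, :] + b1[:, None]

for name, g in (("H0", g0), ("H1", g1)):
    g_bar = g.mean(axis=0)                    # average function g_bar(x)
    bias = np.mean((g_bar - f(x_test)) ** 2)  # E_x[(g_bar(x) - f(x))^2]
    var = np.mean((g - g_bar) ** 2)           # E_x[E_D[(g(x) - g_bar(x))^2]]
    print(f"{name}: bias = {bias:.2f}  var = {var:.2f}  bias + var = {bias + var:.2f}")
```

Running this reproduces the conclusion reported by Abu-Mostafa et al. (2012): $\mathcal{H}_0$ has the larger bias (about 0.50 versus 0.21) but a far smaller variance (about 0.25 versus 1.69), so for $N = 2$ the simpler model wins in expected out-of-sample error.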
- Abu-Mostafa, Y. S., Magdon-Ismail, M., & Lin, H.-T. (2012). *Learning from Data*. AMLBook.