First of all: I'm not a mathematician by any means, so apologies if something is incorrect, wrong, or not proper from the perspective of math. This is just error analysis from the perspective of developers like me. Because the last time I tried to figure out a math problem involving 200 candies, I got nothing in my mind other than that the guy who bought those candies must have diabetes.
A brief walkthrough of the error (loss) derivation in a linear model, from both the scalar and matrix perspectives.
Linearity is a pretty common model to encounter in the estimation realm. I mean... it's, like, everywhere. During the computation it's pretty common to use the matrix/vector formulation, but why is it computed in such a way? Do we really understand it? Implementing/solving the model using a tool such as TensorFlow tends to be easy, but most individuals don't really understand why exactly it's computed that way.
This is the common model, having the coefficients $c_{1}$ (slope) and $c_{2}$ (intercept):

$$\hat{y}_{n} = c_{1} x_{n} + c_{2}$$

and the loss for a single datapoint would be described as

$$(y_{n} - (c_{1} x_{n} + c_{2}))^{2}$$

but what is needed is the loss for all observed values ($N$ datapoints), that yields

$$L = \frac{1}{N} \sum_{n=1}^{N} (y_{n} - (c_{1} x_{n} + c_{2}))^{2}$$
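To make the notation concrete, here is a minimal sketch of that loss in plain NumPy (the names `mse_loss`, `c1`, and `c2` are my own choices, not from any library):

```python
import numpy as np

def mse_loss(c1, c2, x, y):
    """Mean squared error of the line c1*x + c2 over all N datapoints."""
    residuals = y - (c1 * x + c2)   # y_n - (c1*x_n + c2) for every n
    return np.mean(residuals ** 2)  # (1/N) * sum of squared residuals
```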
Now, the gradient needs to be zero (with respect to both $c_{1}$ and $c_{2}$). I'm not a mathematician by any means, but I think it's a valid point to mention why zeroing the gradient works here: the loss is quadratic (convex) in the coefficients, so the point where the gradient vanishes is the minimum of the function.
With the intention of simplification, the equation has to be shortened. The following are the steps taken:
- distribute that $1/N$ inside the summation
- expand the squared term and group the result by coefficient
And then, the left side simplifies into

$$\frac{c_{1}^{2}}{N} \left( \sum_{n=1}^{N} x_{n}^{2} \right)$$

and the right side simplifies into

$$\frac{2 c_{1} c_{2}}{N} \sum_{n=1}^{N} x_{n} - \frac{2 c_{1}}{N} \sum_{n=1}^{N} x_{n} y_{n} + c_{2}^{2} - \frac{2 c_{2}}{N} \sum_{n=1}^{N} y_{n} + \frac{1}{N} \sum_{n=1}^{N} y_{n}^{2}$$

Hence,

$$L = \frac{c_{1}^{2}}{N} \sum_{n=1}^{N} x_{n}^{2} + \frac{2 c_{1} c_{2}}{N} \sum_{n=1}^{N} x_{n} - \frac{2 c_{1}}{N} \sum_{n=1}^{N} x_{n} y_{n} + c_{2}^{2} - \frac{2 c_{2}}{N} \sum_{n=1}^{N} y_{n} + \frac{1}{N} \sum_{n=1}^{N} y_{n}^{2}$$
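If the expansion above feels error-prone (it did to me), it's cheap to sanity-check it numerically. This is just a sketch with made-up numbers; every name in it is mine:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + 1.5 + rng.normal(scale=0.1, size=50)
c1, c2 = 2.0, -0.5  # arbitrary test coefficients
N = len(x)

# Expanded form of L, term by term, exactly as in the equation above.
expanded = (c1**2 / N * np.sum(x**2)
            + 2 * c1 * c2 / N * np.sum(x)
            - 2 * c1 / N * np.sum(x * y)
            + c2**2
            - 2 * c2 / N * np.sum(y)
            + 1 / N * np.sum(y**2))

# The original (1/N) * sum of squared residuals.
direct = np.mean((y - (c1 * x + c2)) ** 2)
assert np.isclose(expanded, direct)
```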
Taking the partial derivative with respect to $c_{1}$ gives

$$\frac{\partial L}{\partial c_{1}} = \frac{2 c_{1}}{N} \sum_{n=1}^{N} x_{n}^{2} + \frac{2 c_{2}}{N} \sum_{n=1}^{N} x_{n} - \frac{2}{N} \sum_{n=1}^{N} x_{n} y_{n} \tag{1.1}$$

and the same with respect to $c_{2}$:

$$\frac{\partial L}{\partial c_{2}} = \frac{2 c_{1}}{N} \sum_{n=1}^{N} x_{n} + 2 c_{2} - \frac{2}{N} \sum_{n=1}^{N} y_{n} \tag{1.2}$$

The expressions need to be zero to extract the coefficients $c_{1}$ and $c_{2}$,
starting from equation (1.2) set to zero:

$$\frac{2 c_{1}}{N} \sum_{n=1}^{N} x_{n} + 2 c_{2} - \frac{2}{N} \sum_{n=1}^{N} y_{n} = 0$$

The average of $x$ is $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_{n}$ and the average of $y$ is $\bar{y} = \frac{1}{N} \sum_{n=1}^{N} y_{n}$, so

$$c_{2} = \bar{y} - c_{1} \bar{x} \tag{2}$$
Substituting (2) into (1.1), then doing some rearrangement, yields

$$\frac{\partial L}{\partial c_{1}} = \frac{2 c_{1}}{N} \sum_{n=1}^{N} x_{n}^{2} + \frac{2 (\bar{y} - c_{1} \bar{x})}{N} \sum_{n=1}^{N} x_{n} - \frac{2}{N} \sum_{n=1}^{N} x_{n} y_{n}$$

Using the expression $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_{n}$ once more, and gathering the $c_{1}$ terms together,

$$\frac{\partial L}{\partial c_{1}} = 2 c_{1} \left( \frac{1}{N} \sum_{n=1}^{N} x_{n}^{2} - \bar{x}^{2} \right) - 2 \left( \frac{1}{N} \sum_{n=1}^{N} x_{n} y_{n} - \bar{x} \bar{y} \right)$$

Setting this partial derivative to zero gives

$$c_{1} \left( \frac{1}{N} \sum_{n=1}^{N} x_{n}^{2} - \bar{x}^{2} \right) = \frac{1}{N} \sum_{n=1}^{N} x_{n} y_{n} - \bar{x} \bar{y}$$
Again, because the topic is about averages (just like $\bar{x}$ and $\bar{y}$ earlier), the remaining sums can be written in the same notation. Denote

$$\overline{x^{2}} = \frac{1}{N} \sum_{n=1}^{N} x_{n}^{2}$$

and

$$\overline{xy} = \frac{1}{N} \sum_{n=1}^{N} x_{n} y_{n}$$

therefore

$$c_{1} = \frac{\overline{xy} - \bar{x} \bar{y}}{\overline{x^{2}} - \bar{x}^{2}}$$
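Translating the result into code is almost mechanical at this point. A minimal sketch (the function name `fit_line` is my own choice):

```python
import numpy as np

def fit_line(x, y):
    """Closed-form simple linear regression: returns (c1, c2)."""
    x_bar, y_bar = np.mean(x), np.mean(y)
    xy_bar = np.mean(x * y)   # \overline{xy}
    x2_bar = np.mean(x ** 2)  # \overline{x^2}
    c1 = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar ** 2)
    c2 = y_bar - c1 * x_bar   # equation (2)
    return c1, c2
```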
The basic idea is the following: pack the same derivation into vectors and matrices. Let's represent the coeffs. as a vector

$$c = \begin{bmatrix} c_{1} \\ c_{2} \end{bmatrix}$$

also with the input extended by a constant $1$ (so the intercept $c_{2}$ gets absorbed into the product), that is,

$$\mathbf{x}_{n} = \begin{bmatrix} x_{n} \\ 1 \end{bmatrix}, \qquad \hat{y}_{n} = \mathbf{x}_{n}^{T} c = c_{1} x_{n} + c_{2}$$
Again, the loss exposure should cover the entire ($N$) datapoints, so stack the inputs into a matrix

$$X = \begin{bmatrix} x_{1} & 1 \\ x_{2} & 1 \\ \vdots & \vdots \\ x_{N} & 1 \end{bmatrix}$$

and the targets into a vector

$$y = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{bmatrix}$$

and therefore the loss would be written as (*notice that T (transpose), otherwise the multiplication doesn't make sense: a $1 \times N$ row times an $N \times 1$ column yields a scalar*)

$$L = \frac{1}{N} (y - Xc)^{T} (y - Xc)$$

So overall,

$$L = \frac{1}{N} \left( y^{T} y - 2 c^{T} X^{T} y + c^{T} X^{T} X c \right)$$
Now what is required is the matrix (vector) derivative of $L$ with respect to $c$. At this point, there are still several steps involved in differentiating the $y^{T} y$, $c^{T} X^{T} y$, and $c^{T} X^{T} X c$ terms one by one. Fortunately there are shortcuts that can be applied when differentiating with respect to a vector. In short,

$$\frac{\partial}{\partial c} \left( c^{T} a \right) = a \qquad \text{and} \qquad \frac{\partial}{\partial c} \left( c^{T} A c \right) = 2 A c \quad \text{for symmetric } A$$

Since $X^{T} X$ is symmetric and $y^{T} y$ does not depend on $c$ at all,

$$\frac{\partial L}{\partial c} = \frac{1}{N} \left( 2 X^{T} X c - 2 X^{T} y \right)$$
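Those shortcuts are easy to distrust the first time you meet them, so here is a sketch that compares the analytic gradient against a finite-difference estimate (all names and numbers here are made up by me):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=50), np.ones(50)])  # rows are [x_n, 1]
y = X @ np.array([3.0, 1.5]) + rng.normal(scale=0.1, size=50)
c = np.array([2.0, -0.5])
N = len(y)

loss = lambda c: (y - X @ c) @ (y - X @ c) / N
analytic = 2.0 / N * (X.T @ X @ c - X.T @ y)  # the gradient derived above

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (loss(c + eps * e) - loss(c - eps * e)) / (2 * eps)
    for e in np.eye(len(c))
])
assert np.allclose(analytic, numeric, atol=1e-5)
```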
The final step is to solve the normal equation obtained by setting this gradient to zero:

$$X^{T} X c = X^{T} y$$

Finally,

$$c = \left( X^{T} X \right)^{-1} X^{T} y$$
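And the matrix route in code, as a minimal sketch (`fit_line_matrix` is again my own name; in practice `np.linalg.solve` or `np.linalg.lstsq` is preferable to forming the inverse explicitly):

```python
import numpy as np

def fit_line_matrix(x, y):
    """Normal-equation solution c = (X^T X)^{-1} X^T y."""
    X = np.column_stack([x, np.ones_like(x)])  # design matrix with bias column
    c = np.linalg.solve(X.T @ X, X.T @ y)      # solve X^T X c = X^T y
    return c                                   # c[0] = c1 (slope), c[1] = c2

# It agrees with the scalar derivation from earlier:
#   c1, c2 = fit_line(x, y)
#   np.allclose(fit_line_matrix(x, y), [c1, c2])  # -> True
```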