First of all, I'm not a mathematician by any means, so apologies if something is incorrect, wrong, or improper from a math perspective. This is just error analysis from the perspective of developers like me.
Because the last time I tried to figure out a math problem involving 200 candies, the only thing that came to mind was that whoever bought those candies must have diabetes.
A brief walkthrough of the error workflow in a linear model (from the scalar & matrix perspectives).
A linear model is a pretty common one to encounter in the estimation realm. I mean, it's basically everywhere. During computation it's pretty common to use the matrix/vector formulation, but why is it computed that way? Do we really understand it? Implementing/solving the model with a tool such as Tensorflow tends to be easy, but most of us don't really understand why exactly it's computed the way it is.
This is the common model, with the coefficient $c_{1}$ for the gradient (slope) and $c_{0}$ for the intercept.
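For reference (and reading it back from the expansion further down), the model in scalar form, together with the squared error for a single observation $(x_{n}, t_{n})$, is:

$$t = c_{1}x + c_{0}, \qquad L_{n} = \big(t_{n} - (c_{1}x_{n} + c_{0})\big)^{2}$$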
but what is needed is the loss over all observed values ($N$ data points), hence (after accounting for both $c_{0}$ & $c_{1}$) we come to the conclusion that what we need is the average loss.
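Spelled out (still assuming the plain squared-error setup), that average loss is:

$$L = \frac{1}{N}\sum_{n=1}^{N}\big(t_{n} - (c_{1}x_{n} + c_{0})\big)^{2}$$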
Now, the gradient needs to be zero (with respect to $c$), which we get to by differentiating (the goal is the minimum, isn't it?).
I'm not a mathematician by any means, but I think it's a valid point to mention that the minimum of this function is what we're after.
$$\frac{1}{N}\sum_{n=1}^{N}[c_{1}^{2}x_{n}^{2}+2c_{1}x_{n}c_{0}-2c_{1}x_{n}t_{n}]$$
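That bracket collects the terms of the expanded loss that actually involve $c_{1}$; everything else disappears once we differentiate with respect to it. Doing exactly that and setting the result to zero gives (just to show where the scalar route ends up):

$$\frac{\partial L}{\partial c_{1}} = \frac{1}{N}\sum_{n=1}^{N}\big[2c_{1}x_{n}^{2} + 2x_{n}c_{0} - 2x_{n}t_{n}\big] = 0
\quad\Longrightarrow\quad
c_{1} = \frac{\frac{1}{N}\sum_{n}x_{n}t_{n} \;-\; c_{0}\,\frac{1}{N}\sum_{n}x_{n}}{\frac{1}{N}\sum_{n}x_{n}^{2}}$$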
With the intention of simplification, the equation has to be shortened. The following are the steps taken.
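One way to do that shortening (and, as far as I can tell, what these steps boil down to) is to stack everything into matrices: the targets into a vector $t$, the inputs into a matrix $X$ with a column of ones to carry the intercept, and the two coefficients into a vector $c$. The whole averaged loss then collapses into a single expression:

$$t = \begin{bmatrix} t_{1} \\ \vdots \\ t_{N} \end{bmatrix},\quad
X = \begin{bmatrix} x_{1} & 1 \\ \vdots & \vdots \\ x_{N} & 1 \end{bmatrix},\quad
c = \begin{bmatrix} c_{1} \\ c_{0} \end{bmatrix},\qquad
L = \frac{1}{N}\,(t - Xc)^{T}(t - Xc)$$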
Now what is required is the vector $c$ corresponding to the minimum of that $L$. This can be achieved with the partial derivative of $L$ with respect to $c$.
At this point, there are still several steps involved in differentiating $L$, but I think that's a topic for another time (or rather self-explanatory if you're that math guy), as that would be too much detail.
Fortunately, there are shortcuts that can be applied when differentiating with respect to a vector. In short:
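The two identities doing the heavy lifting (stated here without proof, with $A$ symmetric, which $X^{T}X$ conveniently is), and the gradient they produce when applied to the matrix form of $L$:

$$\frac{\partial}{\partial c}\big(a^{T}c\big) = a,
\qquad
\frac{\partial}{\partial c}\big(c^{T}Ac\big) = 2Ac$$

$$\frac{\partial L}{\partial c} = \frac{1}{N}\big(2X^{T}Xc - 2X^{T}t\big) = 0
\quad\Longrightarrow\quad
X^{T}Xc = X^{T}t$$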
The final step is to solve for $c$ (but it can't be done directly, since there is no such thing as division for matrices). So the idea is to multiply both sides by the inverse $(X^{T}X)^{-1}$, which turns $X^{T}X$ into an identity matrix, and the property of the identity tells us that $Ic = c$; therefore $c = (X^{T}X)^{-1}X^{T}t$.
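To make that last step tangible, here's a minimal NumPy sketch (the toy data, the chosen true coefficients, and the variable names are all just made up for the example): it builds the design matrix, solves the normal equation $X^{T}Xc = X^{T}t$, and cross-checks against NumPy's own least-squares solver.

```python
import numpy as np

# Toy data: a noisy line t = c1*x + c0, with c1=2.0 and c0=1.0 picked for the example
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
t = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix X: the inputs in one column, a column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])

# Normal equation (X^T X) c = X^T t, solved as a linear system
# (solving is preferred over explicitly forming the inverse of X^T X)
c = np.linalg.solve(X.T @ X, X.T @ t)
print(f"normal equation: c1 = {c[0]:.4f}, c0 = {c[1]:.4f}")

# Cross-check with NumPy's built-in least-squares routine
c_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)
print(f"np.linalg.lstsq: c1 = {c_lstsq[0]:.4f}, c0 = {c_lstsq[1]:.4f}")
```

In practice a library routine (or whatever Tensorflow does under the hood) is the safer choice, since explicitly forming $X^{T}X$ can be numerically fragile, but the math is the same $c = (X^{T}X)^{-1}X^{T}t$.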