$\newcommand\array[1]{\begin{bmatrix}#1\end{bmatrix}}$
Norm of a vector
Norm: answers the question "how big is a vector?"
- $\|x\|_l \equiv \ell\text{-norm} := \left(\sum_{i=1}^{size(x)} |x_i|^l\right)^{1/l}$
- Julia (requires using LinearAlgebra):
norm(x)
- NumPy:
numpy.linalg.norm(x)
If l is not specified, it is assumed to be 2, i.e. the Euclidean norm. It is also known as the "length", "2-norm", "l2 norm", ...
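As a quick check of the definition above, here is a minimal sketch comparing an explicit computation of the l-norm with NumPy's built-in function (the vector values are arbitrary, chosen just for illustration).
In Python:
import numpy as np
x = np.array([3.0, -4.0, 12.0])                 # arbitrary example vector
def lnorm(x, l):                                # explicit l-norm: (sum_i |x_i|^l)^(1/l)
    return np.sum(np.abs(x)**l)**(1/l)
print(lnorm(x, 2), np.linalg.norm(x))           # 2-norm (the default): 13.0
print(lnorm(x, 1), np.linalg.norm(x, ord=1))    # 1-norm: 19.0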
Dot product of vectors
Aka "scalar product" or "inner product".
It is related to how the two vectors are oriented relative to each other.
- Algebraic definition:
$x \cdot y \equiv x' y := \sum_{i=1}^n x_i y_i$
- Geometric definition:
$x \cdot y := |x| \, |y| \cos(\theta)$ (where $\theta$ is the angle between the two vectors and $|x|$ is the 2-norm)
- Julia:
dot(x,y)
- Numpy:
np.dot(x,y)
Note that, using the two definitions together with arccos (the inverse function of the cosine), you can retrieve the angle between two vectors as angle_x_y = arccos(dot(x,y)/(norm(x)*norm(y))).
- Julia:
angle_x_y = acos(dot(x,y)/(norm(x)*norm(y)))
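For completeness, the NumPy counterpart of the Julia line above (assuming numpy is imported as np and x, y are NumPy arrays):
- NumPy:
angle_x_y = np.arccos(np.dot(x,y)/(np.linalg.norm(x)*np.linalg.norm(y)))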
Geometric interpretation of a vector
Geometrically, the elements of a vector can be seen as the coordinates of the arrival point relative to the departure point; they hence represent the shift from the departure point.
For example, the vector [-2,0] could refer to the vector going from the point (4,2) to the point (2,2), but could also represent the vector going from (6,4) to (4,4).
Vectors whose starting point is the origin are called position vectors, and their elements are the coordinates in n-space of the points where they arrive.
Projection of a vector on another vector
Let a and b be two (not necessarily unit) vectors. We want to compute the vector c, the projection of a on b, and its 2-norm (its length):
Let's start from the length. From basic trigonometry we know that $|c| = |a| \cos(\theta)$, where $\theta$ is the angle between a and b.
But we also know that the dot product is $a \cdot b = |a| \, |b| \cos(\theta)$, so that $\cos(\theta) = \frac{a \cdot b}{|a| \, |b|}$.
By substitution we find that $|c| = \frac{a \cdot b}{|b|}$.
To find the vector c we now simply multiply this length by the unit vector in the direction of b: $c = |c| \frac{b}{|b|} = \frac{a \cdot b}{|b|^2} b$.
If b is already a unit vector, the above equations reduce to $|c| = a \cdot b$ and $c = (a \cdot b) \, b$.
In Julia:
using LinearAlgebra
a = [4,1]
b = [2,3]
normC = dot(a,b)/norm(b)         # length of the projection of a on b
c = (dot(a,b)/norm(b)^2) * b     # the projection vector itself
In Python:
import numpy as np
a = np.array([4,1])
b = np.array([2,3])
normC = np.dot(a,b)/np.linalg.norm(b)        # length of the projection of a on b
c = (np.dot(a,b)/np.linalg.norm(b)**2) * b   # the projection vector itself
Hyperplanes
A (hyper)plane in n dimensions is any (n−1)-dimensional affine subspace defined by a linear relation. For example, in 3 dimensions hyperplanes span 2 dimensions (and they are just called "planes") and can be defined by the vector formed by the coefficients {A,B,C,D} in the equation $Ax + By + Cz + D = 0$.
As hyperplanes separate the space into two sides, we can use (hyper)planes to set boundaries in classification problems, i.e. to discriminate all the points on one side of the plane from all the points on the other side.
Besides this analytical definition, a plane can also be uniquely identified in a geometrical way, starting from a point on the plane and its normal:
- Normal of a plane: any n-dimensional vector perpendicular to the plane.
- Offset of the plane from the origin: the distance between the plane and the origin, i.e. the length of the segment along the normal direction connecting the origin to the plane.
Given a point $p$ of the plane (as a position vector) and the normal $\theta$, any other point $x$ of the plane is such that the vector $x - p$ lies on the plane and is hence perpendicular to the normal, i.e. $\theta \cdot (x - p) = 0$.
To sum up, we can define the plane as the set of all points $x$ such that $\theta \cdot (x - p) = 0$.
$\theta$ plays the role of the coefficients A, B, C in the algebraic equation above, while the offset term derived below plays the role of D.
For example, let's define a plane in two dimensions passing by the point
Let's consider instead the point b = (1,4). As
Starting from the information on the point and the normal, we can retrieve the algebraic equation of the plane by rewriting the equation $\theta \cdot (x - p) = 0$ as $\theta \cdot x - \theta \cdot p = 0$, i.e. $\theta \cdot x + \theta_0 = 0$ with $\theta_0 := -\theta \cdot p$.
Note that when $\theta_0 = 0$ the plane passes through the origin.
Using the above example (
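As a concrete sketch of this representation (the normal, the point on the plane and the test point below are arbitrary values chosen just for illustration), we can check on which side of a plane a given point lies.
In Python:
import numpy as np
theta = np.array([1.0, 2.0])   # normal to the plane (arbitrary)
p = np.array([3.0, 1.0])       # a point on the plane (arbitrary)
theta0 = -np.dot(theta, p)     # offset parameter
x = np.array([6.0, 1.0])       # an arbitrary point to test
side = np.dot(theta, x) + theta0
print(side)   # 0: on the plane; > 0: on the side the normal points to; < 0: the other side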
Distance of a point to a plane
Given a plane defined by its normal $\theta$ and a point $p$ on it, and given a generic point $a$, the (signed) distance of $a$ from the plane is the length of the projection of the vector $a - p$ on the normal: $d = \frac{\theta \cdot (a - p)}{|\theta|}$.
But we also know that $\theta \cdot p = -\theta_0$.
By substitution, we find that $d = \frac{\theta \cdot a + \theta_0}{|\theta|}$.
By expanding the dot product, in 3 dimensions this becomes the familiar formula $d = \frac{A a_1 + B a_2 + C a_3 + D}{\sqrt{A^2 + B^2 + C^2}}$.
The distance is positive when $a$ lies on the side of the plane toward which the normal points, and negative when it lies on the opposite side.
For example the distance between the point
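A minimal numerical sketch of the distance formula (the plane and the point below are arbitrary values chosen for illustration).
In Python:
import numpy as np
theta = np.array([1.0, 2.0])    # normal to the plane (arbitrary)
theta0 = -5.0                   # offset parameter (arbitrary)
a = np.array([6.0, 1.0])        # the point whose distance we want (arbitrary)
d = (np.dot(theta, a) + theta0) / np.linalg.norm(theta)   # signed distance
print(d)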
Projection of a point on a plane
We can easily find the projection of a point on a plane by adding, to the position vector of the point, the vector going from the point to the plane; the latter is obtained by multiplying the signed distance found above by the negative of the unit normal of the plane.
Algebraically: $a_{\text{proj}} = a - \frac{\theta \cdot a + \theta_0}{|\theta|} \cdot \frac{\theta}{|\theta|} = a - \frac{\theta \cdot a + \theta_0}{|\theta|^2} \theta$
For the example before, the point
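A short sketch of this projection, using the same arbitrary plane and point as in the distance sketch above.
In Python:
import numpy as np
theta = np.array([1.0, 2.0])    # normal to the plane (arbitrary)
theta0 = -5.0                   # offset parameter (arbitrary)
a = np.array([6.0, 1.0])        # the point to project (arbitrary)
a_proj = a - ((np.dot(theta, a) + theta0) / np.linalg.norm(theta)**2) * theta
print(a_proj)                          # [5.4, -0.2]
print(np.dot(theta, a_proj) + theta0)  # ~0: the projected point lies on the plane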
On this subject see also:
- https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/dot-cross-products/v/defining-a-plane-in-r3-with-a-point-and-normal-vector
- https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/dot-cross-products/v/point-distance-to-plane
Loss Function
This topic in detail: segments 3.4 (binary linear classification) and 5.4 (linear regression).
The loss function, aka the cost function or risk function, is a way to quantify how far our model is from the data that we have.
We first define an "error", or "loss". For example, in linear regression the "error" is the squared Euclidean distance between the predicted and the observed value, averaged over the n data points: $L(\theta) = \frac{1}{n}\sum_{t=1}^n \frac{\left(y^{(t)} - \theta \cdot x^{(t)}\right)^2}{2}$
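A minimal numeric sketch of this loss for a linear model (the data and the parameter vector are made-up values, just for illustration).
In Python:
import numpy as np
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])   # feature matrix (made-up data)
y = np.array([5.1, 3.8, 9.2])                        # observed values (made-up data)
theta = np.array([1.0, 2.0])                         # candidate parameters
errors = y - X @ theta               # residuals y - theta . x
loss = np.mean(errors**2 / 2)        # average squared-error loss
print(loss)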
The objective is to minimise the loss function by changing the parameter theta. How?
Gradient Descent
This topic in detail: segments 4.4 (gradient descent in binary linear classification), 4.5 (stochastic gradient descent) and 5.5 (SGD in linear regression).
The most common iterative algorithm to find the minimum of a function is the gradient descent.
We compute the loss function with a set of initial parameter(s), we compute the gradient of the function (its derivative with respect to the various parameters), and at each step we move our parameter(s) by a small amount against the direction of the gradient: $\theta^{(k+1)} = \theta^{(k)} - \eta \, \nabla_\theta L(\theta^{(k)})$
The step size $\eta$ in the update rule above is known as the learning rate, and its choice matters (a numeric sketch follows this list):
- too small learning rate: we may converge very slowly or end up trapped in small local minima;
- too high learning rate: we may diverge instead of converging to the minimum.
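A minimal gradient descent sketch, minimising the average squared-error loss of a linear model on made-up data (the data, the starting point, the learning rate and the number of steps are all arbitrary choices for illustration).
In Python:
import numpy as np
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])   # made-up features
y = np.array([5.1, 3.8, 9.2])                        # made-up observations
theta = np.zeros(2)     # initial parameters
eta = 0.1               # learning rate
for step in range(500):
    errors = X @ theta - y               # prediction errors
    grad = X.T @ errors / len(y)         # gradient of the average squared-error loss
    theta = theta - eta * grad           # small step against the gradient direction
print(theta)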
Chain rule
How to compute the gradient of composite functions.
e.g. $y = \frac{1}{1+e^{-(p_1 x + p_2)}}$, of which we want the derivative with respect to the parameter $p_1$.
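For instance, writing $u = p_1 x + p_2$ and $v = 1 + e^{-u}$ (so that $y = 1/v$), the chain rule gives:
$$\frac{\partial y}{\partial p_1} = \frac{\partial y}{\partial v} \frac{\partial v}{\partial u} \frac{\partial u}{\partial p_1} = \left(-\frac{1}{v^2}\right)\left(-e^{-u}\right)(x) = \frac{x \, e^{-(p_1 x + p_2)}}{\left(1+e^{-(p_1 x + p_2)}\right)^2}$$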
For computing derivatives one can use SymPy, a library for symbolic computation. In this case the derivative can be computed with the following script:
from sympy import symbols, exp, diff
x, p1, p2 = symbols('x p1 p2')       # declare the symbolic variables
y = 1/(1+exp( - (p1*x + p2)))        # the function to differentiate
dy_dp1 = diff(y,p1)                  # symbolic derivative with respect to p1
print(dy_dp1)
It may be useful to recognise the chain rule as the application of a chain of maps, that is, as tracking all the effects from one variable to the next.
Geometric progressions and series
Geometric progressions are sequences of numbers where each term after the first is found by multiplying the previous one by a fixed, non-zero number called the common ratio.
It follows that the terms of a geometric progression can be written as $a, ar, ar^2, ar^3, \ldots$, i.e. the generic term is $a r^k$, where $a$ is the first term, $r$ the common ratio and $k$ the position in the progression (counting from zero).
Geometric series are the sum of the values in a geometric progression.
Closed formulas exist both for finite geometric series and, provided $|r| < 1$, for infinite ones:
- $\sum_{k=m}^n ar^k = \frac{a(r^m-r^{n+1})}{1-r}$ with $r \neq 1$ and $m \le n$
- $\sum_{k=m}^\infty ar^k = \frac{ar^m}{1-r}$ with $|r| < 1$
where m is the index of the first term of the progression that you want to include in the summation. Typically $m = 0$, i.e. the summation starts from the first term.
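A quick numerical sanity check of the two closed formulas (a, r, m and n below are arbitrary values chosen for illustration).
In Python:
a, r = 3.0, 0.5      # first term and common ratio (arbitrary)
m, n = 2, 10         # summation bounds (arbitrary)
explicit = sum(a * r**k for k in range(m, n + 1))   # term-by-term finite sum
closed = a * (r**m - r**(n + 1)) / (1 - r)          # closed formula, finite case
closed_inf = a * r**m / (1 - r)                     # closed formula, infinite case (|r| < 1)
print(explicit, closed, closed_inf)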
For many more details on geometric progressions and series, consult the excellent Wikipedia entry on the subject.