Filtering Multiple Random Variables
#format the book
%matplotlib inline
from __future__ import division, print_function
from book_format import load_style
load_style()
We are now ready to study and implement the full, multivariate form of the Kalman filter. In the last chapter we learned how multivariate Gaussians express the correlation between multiple random variables, such as the position and velocity of an aircraft. We also learned how correlation between variables drastically improves the posterior. If we only roughly know position and velocity, but they are correlated, then our new estimate can be very accurate.
I prefer that you develop an intuition for how these filters work through several worked examples. I'm going to gloss over many issues. Some things I show you will only work for special cases, others will be 'magical' - it will not be clear how I derived a certain result. If I started with rigorous, generalized equations you would be left scratching your head about what all these terms mean and how you might apply them to your problem. In later chapters I will provide a more rigorous mathematical foundation, and at that time I will have to either correct approximations that I made in this chapter or provide additional information that I did not cover here.
To make this possible we will restrict ourselves to a subset of problems which we can describe with Newton's equations of motion. These filters are called discretized continuous-time kinematic filters. In the Kalman Filter Math chapter we will develop the math for non-Newtonian systems.
Newton's equations of motion tell us that given a constant velocity $v$ we can compute the position of a system after time $\Delta t$ with

$$x = v\Delta t + x_0$$
For example, if we start at position 13, our velocity is 10 m/s, and we travel for 12 seconds our final position is 133 ($10\times 12 + 13 = 133$).
We can incorporate constant acceleration with this equation

$$x = \frac{1}{2}at^2 + v_0t + x_0$$

And if we assume constant jerk we get

$$x = \frac{1}{6}jt^3 + \frac{1}{2}a_0t^2 + v_0t + x_0$$

These equations were generated by integrating a differential equation. Given a constant velocity $v$ we can compute the distance traveled over time by integrating

$$v = \frac{dx}{dt}$$

which yields $x = vt + x_0$.
When you design a Kalman filter you start with a system of differential equations that describe the dynamics of the system. Most systems of differential equations do not easily integrate in this way. We start with Newton's equation because we can integrate and get a closed form solution, which makes the Kalman filter easier to design. An added benefit is that Newton's equations are the right equations to use to track moving objects, one of the main uses of Kalman filters.
The algorithm is the same Bayesian filter algorithm that we have used in every chapter. The update step is slightly more complicated, but I will explain why when we get to it.
Initialization

1. Initialize the state of the filter
2. Initialize our belief in the state

Predict

1. Use process model to predict state at the next time step
2. Adjust belief to account for the uncertainty in prediction

Update

1. Get a measurement and associated belief about its accuracy
2. Compute residual between estimated state and measurement
3. Compute scaling factor based on whether the measurement or prediction is more accurate
4. Set state between the prediction and measurement based on scaling factor
5. Update belief in the state based on how certain we are in the measurement
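Expressed in code with FilterPy's procedural functions (which this chapter explains in detail), the loop has this shape. This is only a sketch: the matrices below are placeholder values for a one-dimensional state, chosen purely to make the example runnable, and the three measurements are made up.

import numpy as np
from filterpy.kalman import predict, update

x, P = np.array([0.]), np.array([[500.]])      # initial state and belief (placeholder values)
F, Q = np.array([[1.]]), np.array([[0.1]])     # process model and its noise
H, R = np.array([[1.]]), np.array([[5.]])      # measurement function and its noise

for z in [1.1, 2.0, 2.9]:                      # stand-in measurements
    x, P = predict(x=x, P=P, F=F, Q=Q)         # predict the next state
    x, P = update(x=x, P=P, z=z, R=R, H=H)     # blend the prediction with the measurement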
As a reminder, here is a graphical depiction of the algorithm:
import code.book_plots as book_plots
book_plots.show_residual_chart()
The univariate Kalman filter represented the state with a univariate Gaussian. Naturally the multivariate Kalman filter will use a multivariate Gaussian for the state. We learned in the last chapter that multivariate Gaussians use a vector for the mean and a matrix for the covariances. That means that the Kalman filter needs to use linear algebra to perform the estimations.
I don't want you to memorize these equations, but I have listed the univariate and multivariate equations below. They are quite similar.
Predict
$$\begin{array}{|l|l|l|} \hline \text{Univariate} & \text{Univariate} & \text{Multivariate}\\ & \text{(Kalman form)} & \\ \hline \bar \mu = \mu + \mu_{f_x} & \bar x = x + dx & \bar{\mathbf x} = \mathbf{Fx} + \mathbf{Bu}\\ \bar\sigma^2 = \sigma_x^2 + \sigma_{f_x}^2 & \bar P = P + Q & \bar{\mathbf P} = \mathbf{FPF}^\mathsf T + \mathbf Q \\ \hline \end{array}$$
Without worrying about the specifics of the linear algebra, we can see that:

$\mathbf x,\, \mathbf P$ are the state mean and covariance. They correspond to $x$ and $\sigma^2$.

$\mathbf F$ is the state transition function. When multiplied by $\mathbf x$ it computes the prior.

$\mathbf Q$ is the process covariance. It corresponds to $\sigma^2_{f_x}$.

$\mathbf B$ and $\mathbf u$ are new to us. They let us model control inputs to the system.
Update
$$\begin{array}{|l|l|l|} \hline \text{Univariate} & \text{Univariate} & \text{Multivariate}\\ & \text{(Kalman form)} & \\ \hline & y = z - \bar x & \mathbf y = \mathbf z - \mathbf{H\bar x} \\ & K = \frac{\bar P}{\bar P+R} & \mathbf K = \mathbf{\bar{P}H}^\mathsf T (\mathbf{H\bar{P}H}^\mathsf T + \mathbf R)^{-1} \\ \mu=\frac{\bar\sigma^2\, \mu_z + \sigma_z^2\, \bar\mu} {\bar\sigma^2 + \sigma_z^2} & x = \bar x + Ky & \mathbf x = \bar{\mathbf x} + \mathbf{Ky} \\ \sigma^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2} & P = (1-K)\bar P & \mathbf P = (\mathbf I - \mathbf{KH})\mathbf{\bar{P}} \\ \hline \end{array}$$
The details will be different than the univariate filter because these are vectors and matrices, but the concepts are exactly the same:
- Use a Gaussian to represent our estimate of the state and error
- Use a Gaussian to represent the measurement and its error
- Use a Gaussian to represent the process model
- Use the process model to predict the next state (the prior)
- Form an estimate part way between the measurement and the prior
Your job as a designer will be to design the state $\left(\mathbf x,\, \mathbf P\right)$, the process $\left(\mathbf F,\, \mathbf Q\right)$, the measurement $\left(\mathbf z,\, \mathbf R\right)$, and the measurement function $\mathbf H$. If the system has control inputs, such as a robot, you will also design $\mathbf B$ and $\mathbf u$.
Let's go back to our tried and true problem of tracking a dog. This time we will include the fundamental insight of the previous chapter and use hidden variables to improve our estimates. I could start with the math, but instead let's implement a filter, learning as we go. On the surface the math is different and perhaps more complicated than the previous chapters, but the ideas are all the same - we are just multiplying and adding Gaussians.
We start by writing a simulation for the dog. The simulation will run for count steps, moving the dog forward approximately 1 meter for each step. At each step the velocity will vary according to the process variance process_var. After updating the position we compute a measurement with an assumed sensor variance of z_var. The function returns a NumPy array of the positions and another of the measurements.
import math
import numpy as np
from numpy.random import randn
def compute_dog_data(z_var, process_var, count=1, dt=1.):
"returns track, measurements 1D ndarrays"
x, vel = 0., 1.
z_std = math.sqrt(z_var)
p_std = math.sqrt(process_var)
xs, zs = [], []
for _ in range(count):
v = vel + (randn() * p_std * dt)
x += v*dt
xs.append(x)
zs.append(x + randn() * z_std)
return np.array(xs), np.array(zs)
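A quick check of the simulator. The seed and the arguments passed here are arbitrary choices, used only for illustration:

import numpy as np
np.random.seed(1)
track, zs = compute_dog_data(z_var=10., process_var=0.01, count=5)
print(track)    # the dog's true positions, roughly 1 meter apart
print(zs)       # noisy measurements of those positions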
I have programmed the equations of the Kalman filter into the predict and update functions in FilterPy. You will import them with:
from filterpy.kalman import predict, update
For the prediction we need to design the state and covariance, the process model and the process noise, and optionally the control input. We'll take them in order.
We previously tracked a dog in one dimension by using a Gaussian. The mean $\mu$ represented the most likely position, and the variance $\sigma^2$ expressed our uncertainty about that position.

In this problem we will be tracking both the position and velocity of the dog. This requires us to use a multivariate Gaussian, represented with the state vector $\mathbf x$.
State variables can either be observed variables - directly measured by a sensor, or hidden variables - inferred from the observed variables. For our dog tracking problem the sensor only reads position, so position is observed and velocity is hidden. We will learn how to track hidden variables soon.
It is important to understand that tracking position and velocity is a design choice with implications and assumptions that we are not yet prepared to explore. For example, we could also track acceleration, or even jerk. For now, recall that in the last chapter we showed that including velocity in the covariance matrix resulted in much smaller variances in position. We will learn how the Kalman filter computes estimates for hidden variables later in this chapter.
In the univariate chapter we represented the dog's position with a scalar value (e.g. $\mu = 3.27$).

The Kalman filter is implemented using linear algebra. We use an $n\times 1$ column vector to hold the $n$ state variables. For the dog tracking problem $n = 2$, so we write

$$\mathbf x = \begin{bmatrix}x \\ \dot x\end{bmatrix}$$

We use $\dot x$, Newton's notation for the derivative, to denote the velocity.
Another way to write this is $\mathbf x =\begin{bmatrix}x & \dot x\end{bmatrix}^\mathsf T$ because the transpose of a row vector is a column vector. This notation is easier to use in text because it takes less vertical space.
Let's code this. Initialization of x is as simple as
x = np.array([[10.0],
[4.5]])
x
array([[ 10.0],
[ 4.5]])
I often use the transpose in my code to turn a row matrix into a column vector, as I find it easier to type and read:
x = np.array([[10., 4.5]]).T
x
array([[ 10.0],
[ 4.5]])
However, NumPy recognizes 1D arrays as vectors, so I can simplify this line to use a 1D array.
x = np.array([10.0, 4.5])
x
array([ 10.0, 4.5])
All of the array elements have the same type, typically either float or int. If the list contains all ints then the created array will also have a data type of int, otherwise it will be float. I will often take advantage of this by only specifying one number as a floating point:
np.array([1., 0, 0, 0, 0, 0])
array([ 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
Here are some examples.
A = np.array([[1, 2], [3, 4]])
x = np.array([[10.0], [4.5]])
# matrix multiply
print(np.dot(A, x))
print()
x = np.array([[10.0, 4.5]]).T
print(np.dot(A, x))
print()
x = np.array([10.0, 4.5])
print(np.dot(A, x))
[[ 19.0]
[ 48.0]]
[[ 19.0]
[ 48.0]]
[ 19.0 48.0]
The last returns a 1D array, but I have written the Kalman filter class to be able to handle this. In retrospect that might lead to confusion, but it does work.
The other half of the state Gaussian is the covariance matrix $\mathbf P$. In the univariate Kalman filter we specified an initial value for $\sigma^2$ and the filter took care of updating it as measurements were incorporated. The same thing happens in the multivariate Kalman filter.

We need to set the variances to reasonable values. For example, we may choose $\sigma_\mathtt{pos}^2=500\ m^2$ if we are quite uncertain about the initial position. Top speed for a dog is around 21 m/s, so in the absence of any other information about the velocity we can set $3\sigma_\mathtt{vel}=21$, or $\sigma_\mathtt{vel}^2=49$.
In the last chapter we showed that the position and velocities are correlated. But how correlated are they for a dog? I have no idea. As we will see the filter computes this for us, so I initialize the covariances to zero. Of course, if you know the covariances you should use them.
Recall that the diagonal of the covariance matrix contains the variance of each variable, and the off-diagonal elements contain the covariances. Thus we have:

$$\mathbf P = \begin{bmatrix}500 & 0 \\ 0 & 49\end{bmatrix}$$
We can use numpy.diag, which creates a diagonal matrix from the values for the diagonal. Recall from linear algebra that a diagonal matrix is one with zeros in the off-diagonal elements.
P = np.diag([500, 49])
%precision 3
P
array([[500, 0],
[ 0, 49]])
I could have written:
P = np.array([[500., 0.],
[0., 49.]])
P
array([[ 500., 0.],
[ 0., 49.]])
We are done. We've expressed the state of the filter as a multivariate Gaussian and implemented it in code.
The next step is designing the process model. It is a mathematical model which describes the behavior of the system. The filter uses it to predict the state after a discrete time step. We do this with a set of equations that describe the dynamics of the system.
In the univariate chapter we modeled the dog's motion with

$$\bar x = x + v\Delta t$$

We implemented this as follows, where pos and movement are Gaussian tuples:
def predict(pos, movement):
return (pos[0] + movement[0], pos[1] + movement[1])
We will do the same thing in this chapter, using multivariate Gaussians instead of univariate Gaussians. You might imagine this sort of implementation:
But we need to generalize this. The Kalman filter equations work with any linear system, not just Newtonian ones. Maybe the system you are filtering is the plumbing system in a chemical plant, and the flow in a given pipe is determined by a linear combination of the settings of different valves.
Linear algebra has a powerful way to express systems of equations. Take this system

$$\begin{cases}2x + 3y = 8\\4x - y = 2\end{cases}$$

We can put this in matrix form by writing:

$$\begin{bmatrix}2 & 3 \\ 4 & -1\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix} = \begin{bmatrix}8 \\ 2\end{bmatrix}$$

If you perform the matrix multiplication in this equation the result will be the two equations above. In linear algebra we would write this as $\mathbf{Ax} = \mathbf b$, and then we can use SciPy's linalg package to solve for $\mathbf x$:
from scipy.linalg import solve
A = np.array([[2, 3],[4, -1]])
b = np.array([[8], [2]])
x = solve(A, b)
x
array([[ 1.],
[ 2.]])
We use the process model to perform the innovation, because the equations tell us what the next state will be given the current state. Kalman filters implement this using the linear equation

$$\bar{\mathbf x} = \mathbf{Fx}$$

where $\bar{\mathbf x}$ is the prior and $\mathbf F$ is the state transition function (state transition matrix).

Our job as Kalman filter designers is to specify $\mathbf F$ such that $\bar{\mathbf x} = \mathbf{Fx}$ performs the prediction for our system. To do this we need one equation for each state variable. In our problem $\mathbf x = \begin{bmatrix}x & \dot x\end{bmatrix}^\mathsf T$, so we need one equation to compute the position and another to compute the velocity. We already know the equation for the position:

$$\bar x = x + \dot x\Delta t$$

What is our equation for velocity? We have no predictive model for how our dog's velocity will change over time. In this case we assume that it remains constant between innovations. Of course this is not exactly true, but so long as the velocity doesn't change too much over each innovation you will see that the filter performs very well. So we say

$$\bar{\dot x} = \dot x$$

This gives us the process model for our system

$$\begin{aligned}\bar x &= x + \dot x\Delta t \\ \bar{\dot x} &= \dot x\end{aligned}$$

This correctly has one equation for each variable in the state, isolated on the left hand side. We need to express this set of equations in the form $\bar{\mathbf x} = \mathbf{Fx}$. We can rewrite this in matrix form as

$$\begin{bmatrix}\bar x \\ \bar{\dot x}\end{bmatrix} = \begin{bmatrix}1 & \Delta t \\ 0 & 1\end{bmatrix}\begin{bmatrix}x \\ \dot x\end{bmatrix}$$

so the state transition matrix is

$$\mathbf F = \begin{bmatrix}1 & \Delta t \\ 0 & 1\end{bmatrix}$$
dt = 0.1
F = np.array([[1, dt],
[0, 1]])
F
array([[ 1. , 0.1],
[ 0. , 1. ]])
Let's test this! FilterPy has a predict method that performs the prediction by computing $\bar{\mathbf x} = \mathbf{Fx}$. We set the position to 10.0 m and the velocity to 4.5 m/s, with dt = 0.1, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged.
from filterpy.kalman import predict
x = np.array([10.0, 4.5])
P = np.diag([500, 49])
F = np.array([[1, dt], [0, 1]])
# Q is the process noise
x, P = predict(x=x, P=P, F=F, Q=0)
print('x =', x)
x = [ 10.45 4.5 ]
This worked. Note that the code does not distinguish between the prior and the posterior in the variable names, so after calling predict the variable x holds the prior. If we call predict() several times in a row the value will be updated each time.
for _ in range(4):
x, P = predict(x=x, P=P, F=F, Q=0)
print('x =', x)
x = [ 10.9 4.5]
x = [ 11.35 4.5 ]
x = [ 11.8 4.5]
x = [ 12.25 4.5 ]
predict() computes both the mean and the covariance of the innovation. This is the value of $\mathbf{\bar P}$ after the five predictions:
print(P)
[[ 512.25 24.5 ]
[ 24.5 49. ]]
Inspecting the diagonals shows us that the position variance got larger. We've performed five prediction steps with no measurements, and our uncertainty grew. The off-diagonal elements became non-zero - the Kalman filter detected a correlation between position and velocity! The variance of the velocity did not change.
Here I plot the covariance before and after the prediction. The initial value is in solid red, and the prior (prediction) is in dashed black. I've altered the covariance and time step to better illustrate the change.
from filterpy.stats import plot_covariance_ellipse
dt = 0.3
F = np.array([[1, dt], [0, 1]])
x = np.array([10.0, 4.5])
P = np.diag([500, 500])
plot_covariance_ellipse(x, P, edgecolor='r')
x, P = predict(x, P, F, Q=0)
plot_covariance_ellipse(x, P, edgecolor='k', ls='dashed')
You can see that the center of the ellipse shifted by a small amount (from 10 to 11.35) because the position changed. The ellipse also elongated, showing the correlation between position and velocity. How does the filter compute new values for the covariance? I set Q to zero each time, so it is not due to me adding noise. It's a little too early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground.
A quick review on process noise. A car is driving along the road with the cruise control on; it should travel at a constant speed. We model this with $\bar{\dot x} = \dot x$. However, the car is affected by influences we do not model: wind, hills, potholes, and so on, so the velocity will vary slightly from prediction to prediction.

We can model this system with the differential equation

$$\dot{\mathbf x} = f(\mathbf x) + w$$

where $f(\mathbf x)$ models the deterministic behavior of the system and $w$ is white process noise.

We will learn how to go from a set of differential equations to the Kalman filter matrices in the Kalman Filter Math chapter. In this chapter we take advantage of the fact that Newton already derived the equations of motion for us. For now you just need to know that we account for the noise in the system by adding a process noise covariance matrix $\mathbf Q$ to the covariance $\mathbf P$. We do not add anything to $\mathbf x$ because the noise is white - its mean is zero.
The univariate Kalman filter used variance = variance + process_noise to compute the variance of the prediction step. The multivariate Kalman filter does essentially the same thing, P = P + Q. I say 'essentially' because there are other terms unrelated to noise in the covariance equation that we will see later.
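To see what those extra terms do, here is a small comparison. The covariance and process noise values below are arbitrary, chosen only for illustration:

import numpy as np
dt = 1.
F = np.array([[1., dt],
              [0., 1.]])
P = np.diag([500., 49.])
Q = np.diag([0.1, 0.1])                        # hypothetical process noise

print(np.dot(F, P).dot(F.T) + Q)               # full multivariate covariance prediction
print(P + Q)                                   # the 'essentially P + Q' intuition, missing the F projection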
Deriving the process noise matrix can be quite demanding, and we will put it off until the Kalman math chapter. For now know that $\mathbf Q$ is the covariance of the process noise, and that it is added to $\mathbf{\bar P}$ during the prediction step.

FilterPy provides functions which compute $\mathbf Q$ for the kinematic problems of this chapter. Q_discrete_white_noise takes 3 parameters: dim, which specifies the dimension of the matrix, dt, which is the time step in seconds, and var, the variance in the noise. Briefly, it discretizes the noise over the given time period under assumptions that we will discuss later. This code computes $\mathbf Q$ for a noise variance of 2.35 and a time step of 1 second:
from filterpy.common import Q_discrete_white_noise
Q = Q_discrete_white_noise(dim=2, dt=1., var=2.35)
print(Q)
[[ 0.588 1.175]
[ 1.175 2.35 ]]
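For the two dimensional case the discretization has a simple closed form, so we can sanity check the output by hand. This uses the standard result for this model, which we derive in the Kalman Filter Math chapter:

import numpy as np
dt, var = 1., 2.35
Q_by_hand = var * np.array([[.25*dt**4, .5*dt**3],
                            [ .5*dt**3,    dt**2]])
print(Q_by_hand)     # matches the output of Q_discrete_white_noise above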
The Kalman filter does not just filter data, it allows us to incorporate the control inputs of systems like robots and airplanes. Suppose we are controlling a robot. At each time step we would send steering and velocity signals to the robot based on its current position vs desired position. Kalman filter equations incorporate that knowledge into the filter equations, creating a predicted position based both on current velocity and control inputs to the drive motors. Remember, we never throw information away.
For a linear system the effect of control inputs can be described as a set of linear equations, which we can express with linear algebra as

$$\Delta\mathbf x = \mathbf{Bu}$$

Here $\mathbf u$ is the control input, and $\mathbf B$ is the control input model or control function. For example, $\mathbf u$ might be a voltage controlling how fast a wheel's motor turns, and multiplying by $\mathbf B$ yields $\begin{bmatrix}\Delta x & \Delta\dot x\end{bmatrix}^\mathsf T$.

Therefore the complete Kalman filter equation for the prior mean is

$$\bar{\mathbf x} = \mathbf{Fx} + \mathbf{Bu}$$
and this is the equation that is computed when you call KalmanFilter.predict()
.
Your dog may be trained to respond to voice commands. All available evidence suggests that my dog has no control inputs whatsoever, so I set B and u to zero:
B = 0. # my dog doesn't listen to me!
u = 0
x, P = predict(x, P, F, Q, B, u)
print('x =', x)
print('P =', P)
x = [ 12.7 4.5]
P = [[ 680.587 301.175]
[ 301.175 502.35 ]]
Setting B and u to zero is not actually necessary; predict uses 0 for their default value.
Your job as a designer is to specify the matrices for
- $\mathbf x$, $\mathbf P$: the state and covariance
- $\mathbf F$, $\mathbf Q$: the process model and noise covariance
- $\mathbf{B}$, $\mathbf{u}$: optionally, the control input and function
Now we can implement the update step of the filter. You only have to supply two more matrices, and they are easy to understand.
The Kalman filter computes the update step in what we call measurement space. We mostly ignored this issue in the univariate chapter because of the complication it adds. We tracked our dog's position using a sensor that reported his position. Computing the residual was easy - subtract the filter's predicted position from the measurement:

$$\text{residual} = \text{measurement} - \text{predicted position}$$
We need to compute the residual because we scale it by the Kalman gain to get the new estimate.
What would happen if we were trying to track temperature using a thermometer that outputs a voltage corresponding to the temperature reading? The equation for the residual computation would be meaningless; you can't subtract a temperature from a voltage.
We need to convert the temperature into a voltage so we can perform the subtraction. For the thermometer we might write:
CELSIUS_TO_VOLTS = 0.21475
residual = measurement - (CELSIUS_TO_VOLTS * predicted_state)
The Kalman filter generalizes this problem by having you supply a measurement function that converts a state into a measurement.
Why are we working in measurement space? Why not work in state space by converting the voltage into a temperature, allowing the residual to be a difference in temperature?
We cannot do that because most measurements are not invertible. The state for the tracking problem contains the hidden variable $\dot x$. There is no way to convert a measurement of position into a state containing velocity. On the other hand, it is trivial to convert a state containing position and velocity into an equivalent "measurement" containing only position. We have to work in measurement space.

Both the measurement $\mathbf z$ and the prior $\bar{\mathbf x}$ enter the residual computation, which the Kalman filter performs with

$$\mathbf y = \mathbf z - \mathbf{H\bar x}$$

where $\mathbf y$ is the residual and $\mathbf H$ is the measurement function.

We need to design $\mathbf H$ so that $\mathbf{H\bar x}$ converts the state $\begin{bmatrix}x & \dot x\end{bmatrix}^\mathsf T$ into an equivalent measurement, which for this problem is a position. The residual equation will have the form

$$\mathbf y = \mathbf z - \mathbf H\begin{bmatrix}x \\ \dot x\end{bmatrix}$$

We will want to multiply the position $x$ by 1 to get the corresponding measurement of the position, and multiply the velocity $\dot x$ by 0 because there is no measurement of velocity. And so, for our Kalman filter we set

$$\mathbf H = \begin{bmatrix}1 & 0\end{bmatrix}$$
H = np.array([[1., 0.]])
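We can verify that this $\mathbf H$ does what we want: multiplying it by a state vector discards the velocity and returns the position, i.e. the state expressed as a measurement. The state values below are just an example:

import numpy as np
x_example = np.array([[10.45], [4.5]])     # hypothetical prior: position 10.45 m, velocity 4.5 m/s
print(np.dot(H, x_example))                # [[10.45]] -- the position only, in measurement space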
We have designed the majority of our Kalman filter. All that is left is to model the noise in the sensors.
The measurement is implemented with $\mathbf z$. For our problem it is a one-element vector holding the position measurement. If we have two sensors or measurements we'd have:

$$\mathbf z = \begin{bmatrix}z_1 \\ z_2\end{bmatrix}$$
The measurement noise matrix models the noise in our sensors as a covariance matrix. In practice this can be difficult. A complicated system may have many sensors, the correlation between them might not be clear, and usually their noise is not a pure Gaussian. For example, a sensor might be biased to read high if the temperature is high, and so the noise is not distributed equally on both sides of the mean. We will learn to deal with these problems later.
The Kalman filter equations use a covariance matrix $\mathbf R$ for the measurement noise. The matrix will have dimension $m\times m$, where $m$ is the number of sensors.

If we have one position sensor with a variance of 5 m$^2$ the matrix is simply $\mathbf R = \begin{bmatrix}5\end{bmatrix}$. If we had two position sensors, the first with a variance of 5 m$^2$, the second with a variance of 3 m$^2$, we would write

$$\mathbf R = \begin{bmatrix}5 & 0 \\ 0 & 3\end{bmatrix}$$
We put the variances on the diagonal because this is a covariance matrix, where the variances lie on the diagonal, and the covariances, if any, lie in the off-diagonal elements. Here we assume there is no correlation in the noise between the two sensors, so the covariances are 0.
For our problem we only have one sensor, so we can implement this as
R = np.array([[5.]])
We perform the update by calling update.
from filterpy.kalman import update
z = 1.
x, P = update(x, P, z, R, H)
print('x =', x)
x = [ 1.085 -0.64 ]
Keeping track of all of these variables is burdensome, so FilterPy also implements the filter with the class KalmanFilter. I will use the class in the rest of this book, but I wanted you to see the procedural form of these functions since I know some of you are not fans of object oriented programming.
I've given you all of the code for the filter, but now let's collect it in one place. First we construct a KalmanFilter object. We have to specify the number of variables in the state with the dim_x parameter, and the number of measurements with dim_z. We have two random variables in the state and one measurement, so we write:
from filterpy.kalman import KalmanFilter
dog_filter = KalmanFilter(dim_x=2, dim_z=1)
This creates an object with default values for all the Kalman filter matrices:
from filterpy.kalman import KalmanFilter
dog_filter = KalmanFilter(dim_x=2, dim_z=1)
print('x = ', dog_filter.x.T)
print('R = ', dog_filter.R)
print('Q = \n', dog_filter.Q)
# etc...
x = [[ 0. 0.]]
R = [[ 1.]]
Q =
[[ 1. 0.]
[ 0. 1.]]
Now we initialize the filter's matrices and vectors with values valid for our problem. I've put this in a helper function so that you can specify different initial values for R, P, and Q. We will be creating and running many of these filters, and this saves us a lot of headaches.
from filterpy.kalman import KalmanFilter
from filterpy.common import Q_discrete_white_noise
def pos_vel_filter(x, P, R, Q=0., dt=1.0):
""" Returns a KalmanFilter which implements a
constant velocity model for a state [x dx].T
"""
kf = KalmanFilter(dim_x=2, dim_z=1)
kf.x = np.array([x[0], x[1]]) # location and velocity
kf.F = np.array([[1., dt],
[0., 1.]]) # state transition matrix
kf.H = np.array([[1., 0]]) # Measurement function
kf.R *= R # measurement uncertainty
if np.isscalar(P):
kf.P *= P # covariance matrix
else:
kf.P[:] = P # [:] makes deep copy
if np.isscalar(Q):
kf.Q = Q_discrete_white_noise(dim=2, dt=dt, var=Q)
else:
kf.Q[:] = Q
return kf
KalmanFilter initializes R, P, and Q to the identity matrix, so kf.P *= P is one way to quickly assign all of the diagonal elements to the same scalar value. Now we create the filter:
dt = .1
x = np.array([0., 0.])
kf = pos_vel_filter(x, P=500, R=5, Q=0.1, dt=dt)
All that is left is to write the code to run the Kalman filter.
from code.mkf_internal import plot_track
def run(x0=(0.,0.), P=500, R=0, Q=0, dt=1.0,
track=None, zs=None,
count=0, do_plot=True, **kwargs):
"""
track is the actual position of the dog, zs are the
corresponding measurements.
"""
# Simulate dog if no data provided.
if zs is None:
track, zs = compute_dog_data(R, Q, count)
# create the Kalman filter
kf = pos_vel_filter(x0, R=R, P=P, Q=Q, dt=dt)
# run the kalman filter and store the results
xs, cov = [], []
for z in zs:
kf.predict()
kf.update(z)
xs.append(kf.x)
cov.append(kf.P)
xs, cov = np.array(xs), np.array(cov)
if do_plot:
plot_track(xs[:, 0], track, zs, cov,
dt=dt, **kwargs)
return xs, cov
This is the complete code for the filter, and most of it is boilerplate. I've made it flexible enough to support several uses in this chapter, so it is a bit verbose. Let's work through it line by line.
The first lines check to see if you provided measurement data in zs. If not, they create the data using the compute_dog_data function we wrote earlier.

The next line uses our helper function to create a Kalman filter.
# create the Kalman filter
kf = pos_vel_filter(x0, R=R, P=P, Q=Q, dt=dt)
All we need to do is perform the update and predict steps of the Kalman filter for each measurement. The KalmanFilter class provides the two methods update() and predict() for this purpose. update() performs the measurement update step of the Kalman filter, and so it takes a variable containing the sensor measurement.
Absent the work of storing the results, the loop reads:
for z in zs:
kf.predict()
kf.update(z)
Each call to predict and update modifies the state variables x and P. Therefore, after the call to predict, kf.x contains the prior. After the call to update, kf.x contains the posterior. The simulation returned the actual positions in track and the measurements in zs, so we loop directly over zs.
It really cannot get much simpler than that. As we tackle more complicated problems this code will remain largely the same; all of the work goes into setting up the KalmanFilter matrices. Executing the filter is trivial.
The rest of the code optionally plots the results and then returns the saved states and covariances.
Let's run it. We have 50 measurements with a noise variance of 10 and a process variance of 0.01.
P = np.diag([500., 49.])
Ms, Ps = run(count=50, R=10, Q=0.01, P=P)
There is still a lot to learn, but we have implemented a Kalman filter using the same theory and equations as published by Rudolf Kalman! Code very much like this runs inside of your GPS, airliners, robots, and so on.
The first plot shows the output of the Kalman filter against the measurements and the actual position of our dog (labelled Track). After the initial settling-in period the filter should track the dog's position very closely. The yellow shaded portion between the black dotted lines shows one standard deviation of the filter's variance, which I explain in the next paragraph.
The next two plots show the variance of $x$ and of $\dot x$ - the diagonal elements of the covariance matrix $\mathbf P$.

The covariance matrix $\mathbf P$ tells us the filter's computed uncertainty in each state variable. You can see both variances shrink rapidly as measurements are incorporated.
In the univariate chapter we filtered very noisy signals with much simpler code than the code above. However, realize that right now we are working with a very simple example - an object moving through 1-D space and one sensor. That is about the limit of what we can compute with the code in the last chapter. In contrast, we can implement very complicated, multidimensional filters with this code merely by altering our assignments to the filter's variables. Perhaps we want to track 100 dimensions in financial models. Or we have an aircraft with a GPS, INS, TACAN, radar altimeter, baro altimeter, and airspeed indicator, and we want to integrate all those sensors into a model that predicts position, velocity, and accelerations in 3D space. We can do that with the code in this chapter.
I want you to get a better feel for how the Gaussians change over time, so here is a 3D plot showing the Gaussians every 7th epoch (time step). Every 7th separates them enough so you can see each one independently. The first Gaussian, at $t=0$, is on the left.
from book_format import set_figsize, figsize
from code.nonlinear_plots import plot_gaussians
P = np.diag([3., 1.])
np.random.seed(3)
Ms, Ps = run(count=25, R=10, Q=0.01, P=P, do_plot=False)
with figsize(x=9, y=5):
plot_gaussians(Ms[::7], Ps[::7], (-5,25), (-5, 5), 75)
We are now ready to learn how predict() and update() perform their computations.
A word about notation. I'm a programmer, and I am used to code that reads
x = x + 1
That is not an equation as the sides are not equal, but an assignment. If we wanted to write this in mathematical notation we'd write

$$x_{k+1} = x_k + 1$$
Kalman filter equations are littered with subscripts and superscripts to keep the equations mathematically consistent. I find this makes them extremely hard to read. In most of the book I opt for subscriptless assignments. As a programmer you should understand that I am showing you assignments which implement an algorithm that is to be executed step by step. I'll elaborate on this once we have a concrete example.
The Kalman filter uses these equations to compute the prior - the predicted next state of the system. They compute the prior mean ($\bar{\mathbf x}$) and covariance ($\bar{\mathbf P}$) of the system:

$$\begin{aligned}\bar{\mathbf x} &= \mathbf{Fx} + \mathbf{Bu} \\ \bar{\mathbf P} &= \mathbf{FPF}^\mathsf T + \mathbf Q\end{aligned}$$
As a reminder, the linear equation $\mathbf{Ax} = \mathbf b$ represents a system of equations, where $\mathbf A$ holds the coefficients of the variables in $\mathbf x$. Performing the multiplication $\mathbf{Ax}$ computes the right hand side values for that set of equations, represented by $\mathbf b$.

If $\mathbf F$ contains the state transition for a given time step, then the product $\mathbf{Fx}$ computes the state after that transition. Likewise, $\mathbf B$ is the control function and $\mathbf u$ is the control input, so $\mathbf{Bu}$ computes the contribution of the controls to the state after the transition. The equivalent univariate equation is

$$\bar\mu = \mu + \mu_{f_x}$$

If you perform the matrix multiplication $\mathbf{Fx}$ for our dog tracking problem you get the familiar equations $\bar x = x + \dot x\Delta t$ and $\bar{\dot x} = \dot x$.
The covariance equation

$$\bar{\mathbf P} = \mathbf{FPF}^\mathsf T + \mathbf Q$$

is not as easy to understand, so we will spend more time on it.

The univariate version of this equation is:

$$\bar\sigma^2 = \sigma_x^2 + \sigma_{f_x}^2$$
We add the variance of the movement to the variance of our estimate to reflect the loss of knowledge. We need to do the same thing here, except it isn't quite that easy with multivariate Gaussians.

We can't simply write $\bar{\mathbf P} = \mathbf P + \mathbf Q$. In a multivariate Gaussian the state variables are correlated, and the prediction mixes them together. Since we do not have perfect knowledge of the value of $\dot x$, the sum $\bar x = x + \dot x\Delta t$ gains uncertainty from the velocity as well as from the position.

The correct equation is

$$\bar{\mathbf P} = \mathbf{FPF}^\mathsf T + \mathbf Q$$
Expressions in the form $\mathbf{ABA}^\mathsf T$ are common in linear algebra. You can think of them as using $\mathbf A$ to project $\mathbf B$ into a different space; here $\mathbf F$ projects the covariance $\mathbf P$ forward through the process model.

When we initialize $\mathbf P$ with

$$\mathbf P = \begin{bmatrix}\sigma^2_x & 0 \\ 0 & \sigma^2_{\dot x}\end{bmatrix}$$

the value for $\mathbf{FPF}^\mathsf T$ works out to

$$\mathbf{FPF}^\mathsf T = \begin{bmatrix}1 & \Delta t \\ 0 & 1\end{bmatrix}\begin{bmatrix}\sigma^2_x & 0 \\ 0 & \sigma^2_{\dot x}\end{bmatrix}\begin{bmatrix}1 & 0 \\ \Delta t & 1\end{bmatrix} = \begin{bmatrix}\sigma^2_x + \Delta t^2\sigma^2_{\dot x} & \Delta t\,\sigma^2_{\dot x} \\ \Delta t\,\sigma^2_{\dot x} & \sigma^2_{\dot x}\end{bmatrix}$$

The initial value for $\mathbf P$ had no covariance between position and velocity, but the prediction $\bar x = x + \dot x\Delta t$ makes the position depend on the velocity, and the off-diagonal term $\Delta t\,\sigma^2_{\dot x}$ of $\mathbf{FPF}^\mathsf T$ captures that correlation. The position variance also grows by $\Delta t^2\sigma^2_{\dot x}$ because the uncertain velocity adds uncertainty to the position.

Another way to think of this is to reflect on the $\mathbf{FPF}^\mathsf T$ term: it propagates the uncertainty of the current state through the process model, just as $\mathbf{Fx}$ propagates the state itself.
If you have some experience with linear algebra and statistics, this may help. The covariance due to the prediction can be modeled as the expected value of the error in the prediction step, given by this equation:

$$\bar{\mathbf P} = \mathbb E[(\mathbf{Fx})(\mathbf{Fx})^\mathsf T] = \mathbb E[\mathbf{Fxx}^\mathsf T\mathbf F^\mathsf T] = \mathbf F\,\mathbb E[\mathbf{xx}^\mathsf T]\,\mathbf F^\mathsf T$$

Of course, $\mathbb E[\mathbf{xx}^\mathsf T]$ is just $\mathbf P$, giving us $\bar{\mathbf P} = \mathbf{FPF}^\mathsf T$.

Let's look at its effect. Here I use the $\mathbf F$ from our filter and apply $\mathbf{FPF}^\mathsf T$ to the covariance several times in a row, with no process noise added.
import matplotlib.pyplot as plt
import numpy as np
dt = 0.6
x = np.array([0., 5.])
F = np.array([[1., dt], [0, 1.]])
P = np.array([[1.5, 0], [0, 3.]])
plot_covariance_ellipse(x, P, edgecolor='r')
for _ in range(5):
x = np.dot(F, x)
P = np.dot(F, P).dot(F.T)
plot_covariance_ellipse(x, P, edgecolor='k', ls='dashed')
book_plots.set_labels(x='position', y='velocity')
You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information about the position due to adding $\dot x\Delta t$ to it. The height has not changed - our process model says the velocity does not change, so our belief in the velocity cannot change either.
Here is an interactive version of this equation that allows you to change the design of $\mathbf F$ to see its effect on the shape of $\mathbf P$. The F00 slider affects the value of F[0, 0], and similarly for the other sliders. covar sets the initial covariance between the position and velocity. Use the sliders to answer these questions:
- what if $x$ is not correlated to $\dot x$? (set F01 to 0, the rest at defaults)
- what if $x = 2\dot x\Delta t + x_0$? (set F01 to 2, the rest at defaults)
- what if $x = \dot x\Delta t + 2x_0$? (set F00 to 2, the rest at defaults)
- what if $x = \dot x\Delta t$? (set F00 to 0, the rest at defaults)
from ipywidgets import interact, interactive, fixed
from ipywidgets import IntSlider, FloatSlider
set_figsize(9, 4)
def plot_FPFT(F00, F01, F10, F11, covar):
plt.figure()
dt = 1.
x = np.array((0, 0.))
P = np.array(((1, covar), (covar, 2)))
F = np.array(((F00, F01), (F10, F11)))
plot_covariance_ellipse(x, P)
plot_covariance_ellipse(x, np.dot(F, P).dot(F.T), ec='r')
plt.gca().set_aspect('equal')
plt.xlim(-4, 4)
plt.ylim(-4, 4)
#plt.title(str(F))
plt.xlabel('position')
plt.ylabel('velocity')
interact(plot_FPFT,
F00=IntSlider(value=1, min=0, max=2., continuous_update=False),
F01=FloatSlider(value=1, min=0., max=2.,
description='F01(dt)', continuous_update=False),
F10=FloatSlider(value=0, min=0., max=2., continuous_update=False),
F11=FloatSlider(value=1, min=0., max=2., continuous_update=False),
covar=FloatSlider(value=0, min=0, max=1., continuous_update=False));
(If you are reading this in a static form: instructions to run this online are here: https://git.io/vza7b). Or, go to binder using the link below, and open this notebook from there.
http://mybinder.org/repo/rlabbe/Kalman-and-Bayesian-Filters-in-Python
The update equations look messier than the predict equations, but that is mostly due to the Kalman filter computing the update in measurement space. This is because measurements are not invertible. For example, consider a sensor that gives the range to a target. It is impossible to convert a range into a position - an infinite number of positions in a circle will yield the same range. On the other hand, we can always compute the range (measurement) given a position (state).
Before I continue, recall that we are trying to do something very simple: choose a new estimate chosen somewhere between a measurement and a prediction, as in this chart:
The equations will be complicated because the state has multiple dimensions, but this operations is what we are doing. Don't let the equations distract you from the simplicity of this idea.
To work in measurement space the Kalman filter has to project the covariance matrix into measurement space. The math for this is $\mathbf{H\bar PH}^\mathsf T$, where $\mathbf{\bar P}$ is the prior covariance and $\mathbf H$ is the measurement function.

You should recognize this $\mathbf{ABA}^\mathsf T$ form - the prior covariance is being projected by the measurement function.

Once the covariance is in measurement space we need to account for the sensor noise. This is very easy - we just add matrices. The result is variously called the system uncertainty or innovation covariance:

$$\mathbf S = \mathbf{H\bar PH}^\mathsf T + \mathbf R$$

If you ignore the $\mathbf H$ terms this is the same as the denominator of the univariate Kalman gain, $\bar\sigma^2 + \sigma_z^2$.

Compare the equations for the system uncertainty and the covariance prediction:

$$\begin{aligned}\mathbf S &= \mathbf{H\bar PH}^\mathsf T + \mathbf R \\ \bar{\mathbf P} &= \mathbf{FPF}^\mathsf T + \mathbf Q\end{aligned}$$

In each equation $\mathbf P$ is projected into a different space by either the function $\mathbf H$ or $\mathbf F$, and then the noise covariance ($\mathbf R$ or $\mathbf Q$) is added.
Look back at the residual diagram. Once we have a prediction and a measurement we need to select an estimate somewhere between the two. If we have more certainty about the measurement the estimate will be closer to it. If instead we have more certainty about the prediction then the estimate will be closer to it.
In the univariate chapter we scaled the mean using this equation

$$\mu = \frac{\bar\sigma^2\,\mu_z + \sigma_z^2\,\bar\mu}{\bar\sigma^2 + \sigma_z^2}$$

which we simplified to

$$\mu = (1-K)\bar\mu + K\mu_z$$

which gave us

$$K = \frac{\bar\sigma^2}{\bar\sigma^2 + \sigma_z^2}$$

$K$ is the Kalman gain, a real number between 0 and 1. For the multivariate Kalman filter the gain is

$$\mathbf K = \mathbf{\bar PH}^\mathsf T\mathbf S^{-1}$$

which we can loosely read as

$$\begin{aligned} \mathbf K &\approx \frac{\mathbf{\bar P}\mathbf H^\mathsf T}{\mathbf{S}} \\ \mathbf K &\approx \frac{\text{uncertainty}_\text{prediction}}{\text{uncertainty}_\text{measurement}}\mathbf H^\mathsf T \end{aligned}$$

The Kalman gain equation computes a ratio based on how much we trust the prediction vs the measurement. We did the same thing in every prior chapter. The equation is complicated because we are doing this in multiple dimensions via matrices, but the concept is simple. The $\mathbf H^\mathsf T$ term maps that ratio from measurement space back into state space.
This is an easy one as we've covered this equation while designing the measurement function $\mathbf H$. The residual is the difference between the measurement and the prior converted into measurement space:

$$\mathbf y = \mathbf z - \mathbf{H\bar x}$$

The univariate equation is

$$y = z - \bar x$$
and clearly computes the same thing, but only in one dimension.
We select our new state to be along the residual, scaled by the Kalman gain. The scaling is performed by $\mathbf{Ky}$, which both scales the residual and converts it back into state space via the $\mathbf H^\mathsf T$ term contained in $\mathbf K$:

$$\begin{aligned} \mathbf x &= \mathbf{\bar x} + \mathbf{Ky} \\ &= \mathbf{\bar x} + \mathbf{\bar PH}^\mathsf T \mathbf{S}^{-1}\mathbf y \\ &\approx \mathbf{\bar x} + \frac{\text{uncertainty}_\text{prediction}}{\text{uncertainty}_\text{measurement}}\mathbf H^\mathsf T\mathbf y \end{aligned}$$

Finally, we update the covariance with

$$\mathbf P = (\mathbf I - \mathbf{KH})\mathbf{\bar P}$$

This equation can be numerically unstable and I don't use it in FilterPy. Later I'll share more complicated but numerically stable forms of this equation.
FilterPy hides the details of the implementation from us. Normally you will appreciate this, but let's implement the last filter without FilterPy. To do so we need to define our matrices as variables, and then implement the Kalman filter equations explicitly.
Here we initialize our matrices:
dt = 1.
R_var = 10
Q_var = 0.01
x = np.array([[10.0, 4.5]]).T
P = np.diag([500, 49])
F = np.array([[1, dt],
[0, 1]])
H = np.array([[1., 0.]])
R = np.array([[R_var]])
Q = Q_discrete_white_noise(dim=2, dt=dt, var=Q_var)
from numpy import dot
from scipy.linalg import inv
count = 50
track, zs = compute_dog_data(R_var, Q_var, count)
xs, cov = [], []
for z in zs:
# predict
x = dot(F, x)
P = dot(F, P).dot(F.T) + Q
#update
S = dot(H, P).dot(H.T) + R
K = dot(P, H.T).dot(inv(S))
y = z - dot(H, x)
x += dot(K, y)
P = P - dot(K, H).dot(P)
xs.append(x)
cov.append(P)
xs, cov = np.array(xs), np.array(cov)
plot_track(xs[:, 0], track, zs, cov, plot_P=False, dt=dt)
The results are identical to the FilterPy version. Which you prefer is up to you. I prefer not polluting my namespace with variables such as x, P, and so on; dog_filter.x is, to me, more readable.
More importantly, this example requires you to remember and program the equations for the Kalman filter. Sooner or later you will make a mistake. FilterPy's version ensures that your code will be correct. On the other hand, if you make a mistake in your definitions, such as making $\mathbf H$ a column matrix instead of a row matrix, no library can fully protect you; you must still verify your design.
FilterPy's KalmanFilter class provides additional functionality such as smoothing, batch processing, faded memory filtering, computation of the maximum likelihood function, and more. You get all of this functionality without having to explicitly program it.
We have learned the Kalman filter equations. Here they are all together for your review:

$$\begin{aligned} \text{Predict:}\quad \bar{\mathbf x} &= \mathbf{Fx} + \mathbf{Bu} \\ \bar{\mathbf P} &= \mathbf{FPF}^\mathsf T + \mathbf Q \\[1em] \text{Update:}\quad \mathbf y &= \mathbf z - \mathbf{H\bar x} \\ \mathbf S &= \mathbf{H\bar PH}^\mathsf T + \mathbf R \\ \mathbf K &= \mathbf{\bar PH}^\mathsf T\mathbf S^{-1} \\ \mathbf x &= \bar{\mathbf x} + \mathbf{Ky} \\ \mathbf P &= (\mathbf I - \mathbf{KH})\mathbf{\bar P} \end{aligned}$$

There was a lot to learn, but I hope that as you went through each equation you recognized its kinship with the equations in the univariate filter. In the Kalman Math chapter I will show you that if we set the dimension of $\mathbf x$ to one, all of these equations reduce to the univariate equations of the previous chapters.
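That claim is easy to check numerically. Here is a minimal sketch using $1\times 1$ matrices; the numbers are arbitrary:

import numpy as np
from scipy.linalg import inv

x, P = np.array([[2.]]), np.array([[4.]])      # mu and sigma^2
F, Q = np.array([[1.]]), np.array([[1.]])      # predict reduces to x + dx and P + Q
H, R = np.array([[1.]]), np.array([[3.]])
z = np.array([[2.5]])

x, P = np.dot(F, x), np.dot(F, P).dot(F.T) + Q
S = np.dot(H, P).dot(H.T) + R
K = np.dot(P, H.T).dot(inv(S))                 # reduces to P / (P + R)
x = x + np.dot(K, z - np.dot(H, x))
P = np.dot(np.eye(1) - np.dot(K, H), P)
print(x, P)    # identical to multiplying/adding univariate Gaussians as in earlier chapters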
I want to share a form of the equations that you will see in the literature. There are many different notation systems used, but this gives you an idea of what to expect.
$$\begin{aligned} \hat{\mathbf x}_{k\mid k-1} &= \mathbf F_k\hat{\mathbf x}_{k-1\mid k-1} + \mathbf B_k \mathbf u_k \\ \mathbf P_{k\mid k-1} &= \mathbf F_k \mathbf P_{k-1\mid k-1} \mathbf F_k^\mathsf T + \mathbf Q_k \\ \tilde{\mathbf y}_k &= \mathbf z_k - \mathbf H_k\hat{\mathbf x}_{k\mid k-1} \\ \mathbf{S}_k &= \mathbf H_k \mathbf P_{k\mid k-1} \mathbf H_k^\mathsf T + \mathbf R_k \\ \mathbf K_k &= \mathbf P_{k\mid k-1}\mathbf H_k^\mathsf T \mathbf{S}_k^{-1} \\ \hat{\mathbf x}_{k\mid k} &= \hat{\mathbf x}_{k\mid k-1} + \mathbf K_k\tilde{\mathbf y}_k \\ \mathbf P_{k\mid k} &= (\mathbf I - \mathbf K_k \mathbf H_k) \mathbf P_{k\mid k-1} \end{aligned}$$
This notation uses the Bayesian $a\mid b$ notation, which means $a$ given the evidence $b$. The hat means the value is an estimate. So $\hat{\mathbf x}_{k\mid k}$ is the estimate of the state at step $k$ given the evidence from step $k$ (the posterior), and $\hat{\mathbf x}_{k\mid k-1}$ is the estimate at step $k$ given the evidence from step $k-1$ (the prior).
This notation, copied from Wikipedia [1], allows a mathematician to express himself exactly. In formal publications presenting new results this precision is necessary. As a programmer I find it fairly unreadable. I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format in the literature, so each author makes different choices. I find it challenging to switch quickly between books and papers, and so have adopted my admittedly less precise notation. Mathematicians may write scathing emails to me, but I hope programmers and students will rejoice at my simplified notation.
The Symbology Appendix lists the notation used by various authors. This brings up another difficulty. Different authors use different variable names.
If you are a programmer trying to understand a paper's equations, I suggest starting by removing all of the superscripts, subscripts, and diacriticals, replacing them with a single letter. If you work with equations like this every day this is superfluous advice, but when I read I am usually trying to understand the flow of computation. To me it is far more understandable to remember that the $\mathbf P$ used in the predict step is a different value from the $\mathbf P$ computed in the update step than it is to track which subscripted symbol is which.
Exercise: Show Effect of Hidden Variables
In our filter velocity is a hidden variable. How would a filter perform if we did not use velocity in the state?
Write a Kalman filter that uses the state $\mathbf x=\begin{bmatrix}x\end{bmatrix}$ and compare it against a filter that uses $\mathbf x=\begin{bmatrix}x & \dot x\end{bmatrix}^\mathsf T$.
# your code here
We've already implemented a Kalman filter for position and velocity, so I will provide the code without much comment, and then plot the result.
from math import sqrt
from numpy.random import randn
def univariate_filter(x0, P, R, Q):
f = KalmanFilter(dim_x=1, dim_z=1, dim_u=1)
f.x = np.array([[x0]])
f.P *= P
f.H = np.array([[1.]])
f.F = np.array([[1.]])
f.B = np.array([[1.]])
f.Q *= Q
f.R *= R
return f
def plot_1d_2d(xs, xs1d, xs2d):
plt.plot(xs1d, label='1D Filter')
plt.scatter(range(len(xs2d)), xs2d, c='r', label='2D Filter')
plt.plot(xs, ls='--', color='k', lw=1, label='track')
plt.title('State')
plt.legend(loc=4)
plt.show()
def compare_1D_2D(x0, P, R, Q, vel, u=0):
# storage for filter output
xs, xs1, xs2 = [], [], []
# 1d KalmanFilter
f1D = univariate_filter(x0, P, R, Q)
#2D Kalman filter
f2D = pos_vel_filter(x=(x0, vel), P=P, R=R, Q=0)
pos = 0 # true position
for i in range(100):
pos += vel
xs.append(pos)
# control input u - discussed below
f1D.predict(u=u)
f2D.predict()
z = pos + randn()*sqrt(R) # measurement
f1D.update(z)
f2D.update(z)
xs1.append(f1D.x[0])
xs2.append(f2D.x[0])
plt.figure()
plot_1d_2d(xs, xs1, xs2)
compare_1D_2D(x0=0, P=50., R=5., Q=.02, vel=1.)
The filter that incorporates velocity into the state produces much better estimates than the filter that only tracks position. The univariate filter has no way to estimate the velocity or change in position, so it lags the tracked object.
In the univariate Kalman filter chapter we had a control input u in the predict equation:
def predict(self, u=0.0):
self.x += u
self.P += self.Q
Let's try using it:
compare_1D_2D(x0=0, P=50., R=5., Q=.02, vel=1., u=1.)
Here the performance of the two filters is similar, and perhaps the univariate filter is tracking more closely. But let's see what happens when the actual velocity vel is different from the control input u:
compare_1D_2D(x0=0, P=50., R=5., Q=.02, vel=-2., u=1.)
If we are tracking a robot which we are also controlling, the univariate filter can do a very good job because the control input allows the filter to make an accurate prediction. But if we are tracking passively the control input is not much help unless we can make an accurate a priori guess as to the velocity. This is rarely possible.
I haven't explained how the filter computes the velocity, or any hidden variable. If we plug in the values we calculated for each of the filter's matrices we can see what happens.
First we need to compute the system uncertainty. With $\mathbf H = \begin{bmatrix}1 & 0\end{bmatrix}$ this is

$$\mathbf S = \mathbf{H\bar PH}^\mathsf T + \mathbf R = \begin{bmatrix}1 & 0\end{bmatrix}\begin{bmatrix}\bar\sigma^2_x & \bar\sigma_{x\dot x} \\ \bar\sigma_{x\dot x} & \bar\sigma^2_{\dot x}\end{bmatrix}\begin{bmatrix}1 \\ 0\end{bmatrix} + \begin{bmatrix}\sigma_z^2\end{bmatrix} = \begin{bmatrix}\bar\sigma^2_x + \sigma_z^2\end{bmatrix}$$

Now that we have $\mathbf S$ we can compute the Kalman gain:

$$\mathbf K = \mathbf{\bar PH}^\mathsf T\mathbf S^{-1} = \begin{bmatrix}\bar\sigma^2_x & \bar\sigma_{x\dot x} \\ \bar\sigma_{x\dot x} & \bar\sigma^2_{\dot x}\end{bmatrix}\begin{bmatrix}1 \\ 0\end{bmatrix}\begin{bmatrix}\frac{1}{\bar\sigma^2_x + \sigma_z^2}\end{bmatrix} = \begin{bmatrix}\bar\sigma^2_x/(\bar\sigma^2_x + \sigma_z^2) \\ \bar\sigma_{x\dot x}/(\bar\sigma^2_x + \sigma_z^2)\end{bmatrix}$$

In other words, the Kalman gain for the position is

$$K_x = \frac{\bar\sigma^2_x}{\bar\sigma^2_x + \sigma_z^2}$$
This should be very familiar to you from the univariate case.
The Kalman gain for the velocity is

$$K_{\dot x} = \frac{\bar\sigma_{x\dot x}}{\bar\sigma^2_x + \sigma_z^2}$$

What is the effect of this? Recall that we compute the state as

$$\mathbf x = \bar{\mathbf x} + \mathbf{Ky}$$

Here the residual $\mathbf y$ is a scalar - the difference between the measured and the predicted position. Which gives this system of equations:

$$\begin{aligned}x &= \bar x + K_x\, y \\ \dot x &= \bar{\dot x} + K_{\dot x}\, y\end{aligned}$$

The prediction $\bar x = x + \dot x\Delta t$ used the estimated velocity. If that velocity estimate was wrong then the predicted position will also be wrong, and the residual will reflect it. Because the prediction correlated position with velocity, the covariance $\bar\sigma_{x\dot x}$ is non-zero, so $K_{\dot x}$ is non-zero and part of the residual is used to correct the velocity estimate.

To bring this full circle, the covariance between position and velocity is what allows the filter to estimate a variable it never measures: errors in the position prediction are partially attributed to errors in the velocity, in proportion to $\bar\sigma_{x\dot x}$.
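Here is the same computation with numbers, using the prior covariance we computed after the five prediction steps earlier in the chapter and an assumed measurement variance of 5 m$^2$:

import numpy as np
from scipy.linalg import inv

P_bar = np.array([[512.25, 24.5],
                  [ 24.5 , 49. ]])     # prior covariance from the prediction example above
H = np.array([[1., 0.]])
R = np.array([[5.]])                   # assumed measurement variance

S = np.dot(H, P_bar).dot(H.T) + R
K = np.dot(P_bar, H.T).dot(inv(S))
print(K)    # first element updates the position, second updates the (unmeasured) velocity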
In summary, these linear algebra equations may be unfamiliar to you, but computation is actually very simple. It is essentially the same computation that we performed in the g-h filter. Our constants are different in this chapter because we take the noise in the process model and sensors into account, but the math is the same.
Let's start varying our parameters to see the effect of various changes. This is a very normal thing to be doing with Kalman filters. It is difficult, and often impossible to exactly model our sensors. An imperfect model means imperfect output from our filter. Engineers spend a lot of time tuning Kalman filters so that they perform well with real world sensors. We will spend time now to learn the effect of these changes. As you learn the effect of each change you will develop an intuition for how to design a Kalman filter. Designing a Kalman filter is as much art as science. We are modeling a physical system using math, and models are imperfect.
Let's look at the effects of the measurement noise $\mathbf R$ and the process noise $\mathbf Q$.
from numpy.random import seed
seed(2)
trk, zs = compute_dog_data(z_var=225, process_var=.02, count=50)
run(track=trk, zs=zs, R=225, Q=200, P=P, plot_P=False,
title='R_var = 225 $m^2$, Q_var = 20 $m^2$')
run(track=trk, zs=zs, R=225, Q=.02, P=P, plot_P=False,
title='R_var = 225 $m^2$, Q_var = 0.02 $m^2$');
The filter in the first plot should follow the noisy measurement closely. In the second plot the filter should vary from the measurement quite a bit, and be much closer to a straight line than in the first graph. Why does $\mathbf Q$ affect the plots this way?
Let's remind ourselves of what the term process uncertainty means. Consider the problem of tracking a ball. We can accurately model its behavior in a vacuum with math, but with wind, varying air density, temperature, and a spinning ball with an imperfect surface our model will diverge from reality.
In the first case we set Q_var=20 m^2, which is quite large. In physical terms this is telling the filter "I don't trust my motion prediction step", as we are saying that the variance in the velocity is 20. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\mathbf F$.
In the second case we set Q_var=0.02 m^2, which is quite small. In physical terms we are telling the filter "trust the prediction, it is really good!". More strictly, we are saying there is only a very small amount of process noise (variance 0.02 m$^2$), so the filter weights its prediction much more heavily than the noisy measurements.
Now let's set Q_var to 0.2 m$^2$ and bump R_var up to 10,000 m$^2$:
run(track=trk, zs=zs, R=10000, Q=.2, P=P, plot_P=False,
title='R=$10,000\, m^2$, Q=$.2\, m^2$');
The effect of this can be subtle. We have created a suboptimal filter because the actual measurement noise variance is 225 m$^2$, not 10,000 m$^2$. By telling the filter the measurements are far noisier than they really are we cause it to lean heavily on its prediction. Now let's see what happens when we also give this filter a bad initial estimate of the position:
run(track=trk, zs=zs, R=10000, Q=.2, P=P, plot_P=False,
x0=np.array([50., 1.]),
title='R=$10,000\, m^2$, Q=$.2\, m^2$');
Here we can see that the filter cannot acquire the track. This happens because even though the filter is getting reasonably good measurements it assumes that the measurements are bad, and eventually predicts forward from a bad position at each step. If you think that perhaps that bad initial position would give similar results for a smaller measurement noise, let's set it back to the correct value of 225 m$^2$:
run(track=trk, zs=zs, R=225, Q=.2, P=P, plot_P=False,
x0=np.array([20., 1.]),
title='R=$225\, m^2$, Q=$.2\, m^2$');
Here we see that the filter initially struggles for several iterations to acquire the track, but then it accurately tracks our dog. In fact, this is nearly optimum - we have not designed $\mathbf Q$ optimally, but $\mathbf R$ is optimal, so the filter performs well once it settles.
To some extent you can get similar looking output by varying either $\mathbf R$ or $\mathbf Q$, but I urge you not to 'magically' alter these until you get output that you like. Always think about the physical implications of these assignments, and vary $\mathbf R$ and/or $\mathbf Q$ based on your knowledge of the system you are filtering.
Let's start by revisiting plotting a track. I will hard code the data and noise to avoid being at the mercy of the random number generator, which might generate data that does not illustrate what I want to talk about. I will start with P=500.
import code.mkf_internal as mkf_internal
var = 27.5
data = mkf_internal.zs_var_275()
run(track=trk, zs=zs, R=var, Q=.02, P=500., plot_P=True,
title='$P=500\, m^2$');
Looking at the output we see a very large spike in the filter output at the beginning. We set x to $\begin{bmatrix}0 & 0\end{bmatrix}^\mathsf T$ but gave it a very large covariance, so the filter does not trust its initial state at all; the early estimates swing toward the noisy measurements and take a few steps to settle.
Let's look at the math behind this. The equation for the Kalman gain is
$$\mathbf K = \mathbf{\bar P} \mathbf H^\mathsf T\mathbf{S}^{-1} \approx \frac{\mathbf{\bar P}\mathbf H^\mathsf T}{\mathbf{S}} \approx \frac{\text{uncertainty}_\text{prediction}}{\text{uncertainty}_\text{measurement}}\mathbf H^\mathsf T$$
It is a ratio of the uncertainty of the prediction vs the measurement. Here the uncertainty in the prediction is large, so $\mathbf K$ is large (near 1 for the position) and the filter weights the measurements heavily.
Now let us see the effect of a much smaller initial value, $\mathbf P = 1\ m^2$.
run(track=trk, zs=zs, R=var, Q=.02, P=1., plot_P=True,
title='$P=1\, m^2$');
This looks good at first blush. The plot does not have the spike that the former plot did; the filter starts tracking the measurements and doesn't take any time to settle to the signal. However, if we look at the plots for P you can see that there is an initial spike for the variance in position, and that it never really converges. Poor design leads to a long convergence time, and suboptimal results.
So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\mathbf P$. Yes, this avoids the large initial spike, but if we are truly uncertain about the initial state then a small $\mathbf P$ tells the filter to trust a guess that may be badly wrong.
Let's see the result of a bad initial estimate coupled with a very small P=1 m$^2$. This is clearly an incorrect value for $\mathbf P$: we are telling the filter we are nearly certain the dog starts at 100 m, which is far from the truth.
x = np.array([100., 0.])
run(track=trk, zs=zs, R=var, Q=.02, P=1., x0=x,
plot_P=False, title='$P=1\, m^2$');
We can see that the initial estimates are terrible and that it takes the filter a long time to start converging onto the signal. This is because we told the Kalman filter that we strongly believe in our initial estimate of 100 m and were incorrect in that belief.
Now, let's provide a more reasonable value for P and see the difference.
x = np.array([100., 0.])
run(track=trk, zs=zs, R=var, Q=.02, P=500., x0=x,
plot_P=False, title='$P=500\, m^2$');
In this case the Kalman filter is very uncertain about the initial state, so it converges onto the signal much faster. It is producing good output after only 5 to 6 epochs. With the theory we have developed so far this is about as good as we can do. However, this scenario is a bit artificial; if we do not know where the object is when we start tracking we do not initialize the filter to some arbitrary value, such as 0 m or 100 m. I address this in the Filter Initialization section below.
Let's do another Kalman filter for our dog, and this time plot the covariance ellipses on the same plot as the position.
from code.mkf_internal import plot_track_ellipses
def plot_covariances(count, R, Q=0, P=20., title=''):
track, zs = compute_dog_data(R, Q, count)
f = pos_vel_filter(x=(0., 0.), R=R, Q=Q, P=P)
xs, cov = [], []
for z in zs:
f.predict()
f.update(z)
xs.append(f.x[0])
cov.append(f.P)
plot_track_ellipses(count, zs, xs, cov, title)
plt.subplot(121)
plot_covariances(R=5, Q=.02, count=20, title='$R = 5\, m^2$')
plt.subplot(122)
plot_covariances(R=.1, Q=.02, count=20, title='$R = 0.1\, m^2$')
If you are viewing this in Jupyter Notebook or on the web, here is an animation of the filter filtering the data. I've tuned the filter parameters so that it is easy to see the covariance $\mathbf P$ change from step to step.
The output on these is a bit messy, but you should be able to see what is happening. In both plots we are drawing the covariance matrix for each point. We start with the covariance $\mathbf P=(\begin{smallmatrix}20&0\\0&20\end{smallmatrix})$, which signifies a lot of uncertainty about our initial belief. After we receive the first measurement the Kalman filter updates this belief, and so the variance is no longer as large. In the first plot the first ellipse (the one on the far left) should be a slightly squashed ellipse. As the filter continues processing the measurements the covariance ellipse quickly shifts shape until it settles down to being a long, narrow ellipse tilted in the direction of movement.
Think about what this means physically. The x-axis of the ellipse denotes our uncertainty in position, and the y-axis our uncertainty in velocity. So, an ellipse that is taller than it is wide signifies that we are more uncertain about the velocity than the position. Conversely, a wide, narrow ellipse shows high uncertainty in position and low uncertainty in velocity. Finally, the amount of tilt shows the amount of correlation between the two variables.
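The tilt is simply the correlation encoded in $\mathbf P$ made visible, and you can compute it directly. The matrix below is a hypothetical example:

import numpy as np
P_example = np.array([[ 2. , 1.2],
                      [ 1.2, 1. ]])
rho = P_example[0, 1] / np.sqrt(P_example[0, 0] * P_example[1, 1])
print(rho)    # near 1 means a strongly tilted (highly correlated) ellipse; near 0 means nearly vertical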
The first plot, with $R=5\, m^2$, models a much noisier sensor than the second plot, with $R=0.1\, m^2$, and you can see how differently the ellipses evolve in the two cases.

Why are the ellipses for the two plots shaped so differently? It cannot be the physical movement of the dog - it moves the same way in both runs.

The x-axis is for position, and the y-axis is for velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagonal says that there is a lot of correlation. Phrased that way, the results sound unlikely. The tilt of the ellipse changes, but the correlation shouldn't be changing over time. But this is a measure of the output of the filter, not a description of the actual, physical world. When the measurements are trusted less the posterior stays close to the prediction, which couples position and velocity, so the filter reports more correlation; when the measurements are trusted more, each update collapses the position uncertainty and the reported correlation shrinks with it.
This is a critical point to understand!. The Kalman filter is a mathematical model for a real world system. A report of little correlation does not mean there is no correlation in the physical system, just that there was no linear correlation in the mathematical model. It's a report of how much measurement vs prediction was incorporated into the model.
Let's bring that point home with a truly large measurement error. We will set $\mathbf R$ to 200 m$^2$ and run the filter for only five steps.
plot_covariances(R=200., Q=.2, count=5, title='$R = 200\, m^2$')
I hope the result was what you were expecting. The ellipse quickly became very wide and not very tall. It did this because the Kalman filter mostly used the prediction vs the measurement to produce the filtered result. We can also see how the filter output is slow to acquire the track. The Kalman filter assumes that the measurements are extremely noisy, and so it is very slow to update its estimate of the position.
Keep looking at these plots until you grasp how to interpret the covariance matrix $\mathbf P$. When you work with larger state vectors the matrix gets bigger, but the interpretation stays the same: the diagonal holds the variance of each state variable, and each off-diagonal element the covariance between a pair of variables.
When plotting covariance ellipses, make sure to always use ax.set_aspect('equal') or plt.axis('equal') in your code (the former lets you set the xlim and ylim values). If the axes use different scales the ellipses will be drawn distorted. For example, an ellipse may be drawn as being taller than it is wide, when it is actually wider than tall.
There are many schemes for initializing the filter. The following approach performs well in most situations. In this scheme you do not initialize the filter until you get the first measurement, $\mathbf z_0$. From this you can compute the initial value of the state.

We know $\mathbf z = \mathbf{Hx}$. Hence, if $\mathbf H$ were invertible we could write $\mathbf x = \mathbf H^{-1}\mathbf z$.

Matrix inversion requires a square matrix, but $\mathbf H$ is rarely square. scipy.linalg.pinv computes the Moore-Penrose pseudo-inverse, which is defined for non-square matrices, so your code might look like
import scipy.linalg as la
H = np.array([[1, 0.]])
z0 = 3.2
x = np.dot(la.pinv(H), z0)
print(x)
[[ 3.2]
[ 0. ]]
Specialized knowledge of your problem domain may lead you to a different computation, but this is one way to do it. For example, if the state includes velocity you might take the first two measurements of position, compute the difference, and use that as the initial velocity.
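For example, a minimal sketch of that idea, assuming two position measurements z0 and z1 taken dt seconds apart (the numbers are made up):

import numpy as np
z0, z1, dt = 1.0, 2.1, 1.0
x = np.array([z1, (z1 - z0) / dt])    # position from the latest measurement, velocity from the difference
print(x)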
Now we need to compute a value for $\mathbf P$. The diagonal of $\mathbf P$ contains the variance of each state variable, so a reasonable scheme is to use the measurement noise variance from $\mathbf R$ for the observed variables, and a suitably large value for the hidden variables that the first measurement tells us nothing about.
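A sketch of that initialization for our position/velocity filter. R_var is the known sensor variance; max_speed is a guessed bound on the dog's speed, so both the guess and the 3-sigma reasoning are assumptions you must justify for your own problem:

import numpy as np
R_var = 5.                                   # sensor variance, taken from R
max_speed = 21.                              # assumed upper bound on speed, m/s
P = np.diag([R_var, (max_speed / 3.)**2])    # observed variable from the sensor, hidden variable from a 3-sigma bound
print(P)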
You really need to understand the domain in which you are working and initialize your filter on the best available information. For example, suppose we were trying to track horses in a horse race. The initial measurements might be very bad, and provide you with a position far from the starting gate. We know that the horse must start at the starting gate; initializing the filter to the initial measurement would lead to suboptimal results. In this scenario we would want to always initialize the Kalman filter with the starting gate position of the horse.
The Kalman filter is designed as a recursive algorithm - as new measurements come in we immediately create a new estimate. But it is very common to have a set of data that has already been collected which we want to filter. Kalman filters can be run in a batch mode, where all of the measurements are filtered at once. We have implemented this in KalmanFilter.batch_filter(). Internally, all the function does is loop over the measurements and collect the resulting state and covariance estimates in arrays. It simplifies your logic and conveniently gathers all of the outputs into arrays.
First collect your measurements into an array or list. Maybe it is in a CSV file:
zs = read_altitude_from_csv('altitude_data.csv')
Or maybe you will generate it using a generator:
zs = [some_func(i) for i in range(1000)]
Then call the batch_filter() method.
Xs, Ps, Xs_pred, Ps_pred = kfilter.batch_filter(zs)
The function takes the list/array of measurements, filters it, and returns NumPy arrays of the state estimates (Xs), covariances (Ps), and the corresponding predictions (Xs_pred, Ps_pred).
Here is a complete example.
count = 50
track, zs = compute_dog_data(10, .2, count)
P = np.diag([500., 49.])
f = pos_vel_filter(x=(0., 0.), R=3., Q=.02, P=P)
xs, _, _, _ = f.batch_filter(zs)
book_plots.plot_measurements(range(1, count + 1), zs)
book_plots.plot_filter(range(1, count + 1), xs[:, 0])
plt.legend(loc='best');
This book includes a chapter on smoothing the results; I will not repeat the information here. However, it is so easy to use, and offers such a profoundly improved output that I will tease you with a few examples. The smoothing chapter is not especially difficult; you are sufficiently prepared to read it now.
Let's assume that we are tracking a car that has been traveling in a straight line. We get a measurement that implies that the car is starting to turn to the left. The Kalman filter moves the state estimate somewhat towards the measurement, but it cannot judge whether this is a particularly noisy measurement or the true start of a turn.
However, if we have future measurements we can decide if a turn was made. Suppose the subsequent measurements all continue turning left. We can then be sure that a turn was initiated. On the other hand, if the subsequent measurements continued on in a straight line we would know that the measurement was noisy and should be mostly ignored. Instead of making an estimate part way between the measurement and prediction the estimate will either fully incorporate the measurement or ignore it, depending on what the future measurements imply about the object's movement.
KalmanFilter implements a form of this algorithm which is called an RTS smoother, named after the inventors of the algorithm: Rauch, Tung, and Striebel. The method is rts_smoother(). To use it pass in the means and covariances computed from the batch_filter step, and receive back the smoothed means, covariances, and Kalman gain.
from numpy.random import seed
count = 50
seed(8923)
P = np.diag([500., 49.])
f = pos_vel_filter(x=(0., 0.), R=3., Q=.02, P=P)
track, zs = compute_dog_data(3., .02, count)
Xs, Covs, _, _ = f.batch_filter(zs)
Ms, Ps, _ = f.rts_smoother(Xs, Covs)
book_plots.plot_measurements(zs)
plt.plot(Xs[:, 0], ls='--', label='Kalman Position')
plt.plot(Ms[:, 0], label='RTS Position')
plt.legend(loc=4);
This output is fantastic! Two things are very apparent to me in this chart. First, the RTS smoother's output is much smoother than the KF output. Second, it is almost always more accurate than the KF output (we will examine this claim in detail in the Smoothing chapter). The improvement in the velocity, which is a hidden variable, is even more dramatic:
plt.plot(Xs[:, 1], ls='--', label='Kalman Velocity')
plt.plot(Ms[:, 1], label='RTS Velocity')
plt.legend(loc=4)
plt.gca().axhline(1, lw=1, c='k');
We will explore why this is so in the next exercise.
Since we are plotting velocities let's look at what the 'raw' velocity is, which we can compute by subtracting subsequent measurements, i.e. the velocity at time 1 can be approximated by xs[1] - xs[0]. Plot the raw value against the values estimated by the Kalman filter and by the RTS smoother. Discuss what you see.
# your code here
dx = np.diff(Xs[:, 0], axis=0)
plt.scatter(range(1,count), dx, facecolor='none', edgecolor='k',
lw=2, label='Raw velocity')
plt.plot(Xs[:, 1], ls='--', label='Filter')
plt.plot(Ms[:, 1], label='RTS')
plt.xlim(0, 60)
plt.legend(loc=4);
We see that the noise swamps the signal, causing the raw values to be essentially worthless. The filter is maintaining a separate estimate for velocity. The Kalman gain $\mathbf K$ is a vector, and its second element determines how much of each position residual is applied to the velocity estimate, so the velocity is updated smoothly rather than computed from noisy differences.
I show this to reiterate the importance of using Kalman filters to compute velocities, accelerations, and even higher order values. I use a Kalman filter even when my measurements are so accurate that I am willing to use them unfiltered because it allows me accurate estimates for velocities and accelerations.
Multivariate Gaussians allow us to simultaneously handle multiple dimensions, both spatial and others (velocity, etc.). We made a key insight: hidden variables have the ability to significantly increase the accuracy of the filter. This is possible because the hidden variables are correlated with the observed variables.
I gave an intuitive definition of observability. Observability was invented by Dr. Kalman for linear systems, and there is a fair amount of theory behind it. It answers the question of whether a system state can be determined by observing the system's output. For our problems this has been easy to determine, but more complicated systems may require rigorous analysis. Wikipedia's Observability article has an overview; if you need to learn the topic [Grewal2008] is a good source.
There is one important caveat about hidden variables. It is easy to construct a filter that produces estimates for hidden variables. I could write a filter that estimates the color of a tracked car. But there is no way to compute car color from positions, so the estimate for the color will be nonsense. The designer must verify that these variables are being estimated correctly. If you do not have a velocity sensor and yet are estimating velocity, you will need to test that the velocity estimates are correct; do not trust that they are. For example, suppose the velocity has a periodic component to it - it looks like a sine wave. If you sample slower than twice that frequency you will not be able to accurately estimate the velocity (the Nyquist criterion). Imagine that the sample period is equal to the period of the velocity signal. The filter will report that the velocity is constant because it samples the system at the same point on the sine wave each time.
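The aliasing problem described above is easy to demonstrate: sample a sinusoidal velocity exactly once per period and every sample lands on the same point of the wave. The one-second period used here is arbitrary:

import numpy as np
t = np.arange(0., 10., 1.)            # sample period equal to the signal period of 1 second
velocity = np.sin(2 * np.pi * t)      # the true velocity oscillates, but ...
print(velocity.round(6))              # ... every sample is (numerically) the same value, so it looks constant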
Initialization poses a particularly difficult problem for hidden variables. If you start with a bad initialization the filter can usually recover the observed variables, but may struggle and fail with the hidden one. Estimating hidden variables is a powerful tool, but a dangerous one.
I established a series of steps for designing a Kalman filter. These are not a usual part of the Kalman filter literature, and are only meant as a guide, not a prescription. Designing for a hard problem is an iterative process. You make a guess at the state vector, work out what your measurement and state models are, run some tests, and then alter the design as necessary.
The design of
The case with
These problems led some researchers and engineers to derogatorily call the Kalman filter a 'ball of mud'. In other words, it doesn't always hold together so well. Another term to know - Kalman filters can become smug. Their estimates are based solely on what you tell it the noises are. Those values can lead to overly confident estimates.
The Kalman filter is a mathematical model of the world. The output is only as accurate as that model. To make the math tractable we had to make some assumptions. We assume that the sensors and motion model have Gaussian noise. We assume that everything is linear. If that is true, the Kalman filter is optimal in a least squares sense. This means that there is no way to make a better estimate than what the filter gives us. However, these assumptions are almost never true, and hence the model is necessarily limited, and a working filter is rarely optimal.
In later chapters we will deal with the problem of nonlinearity. For now I want you to understand that designing the matrices of a linear filter is an experimental procedure more than a mathematical one. Use math to establish the initial values, but then you need to experiment. If there is a lot of unaccounted noise in the world (wind, etc.) you may have to make $\mathbf Q$ larger than the math alone suggests, and iterate until the filter performs well on representative data.
- [1] 'Kalman Filters'. Wikipedia https://en.wikipedia.org/wiki/Kalman_filter#Details
- [Grewal2008] Grewal, Mohinder S., Andrews, Angus P. Kalman Filtering: Theory and Practice Using MATLAB. Third Edition. John Wiley & Sons. 2008.