Kalman Folding 5: Non-Linear Models and the EKF (Review Draft)

1 Abstract

We exhibit a foldable Extended Kalman Filter that internally integrates non-linear equations of motion with a nested fold of generic integrators over lazy streams in constant memory. Functional form allows us to switch integrators easily and to diagnose filter divergence accurately, achieving orders of magnitude better speed than the source example from the literature. As with all Kalman folds, we can move the vetted code verbatim, without even recompilation, from the lab to the field.

2 Background and Synopsis

In Kalman Folding, Part 1,^klfl we present basic, static Kalman filtering as a functional fold, highlighting the unique advantages of this form for deploying test-hardened code verbatim in harsh, mission-critical environments.

In Kalman Folding 2: Tracking,^klf2 we reproduce a tracking example from the literature, showing that these advantages extend to linear models with time-evolving states.

Here, we extend that example to include aerodynamic drag, making the model nonlinear. We must change the Kalman filters themselves to handle such problems. The resulting filters are called Extended Kalman Filters or EKFs.

The particular EKF designed here features internal integration of the non-linear equations of motion. We integrate these equations by folding over a lazy stream that generates, on demand, differential updates to the solution. Folds over lazy streams were introduced in Kalman Folding 4: Streams and Observables,^klf4 where we used them to step a static Kalman filter over observations. Here, lazy streams afford two advantages:

constant-memory solutions required for EKFs in embedded systems in the field
easy switching of integrators, even at run time, allowing accurate diagnosis and tuning of the filter

We show these advantages at work by improving Zarchan and Musoff’s^zarc example of tracking the height of a falling object with drag. These authors exhibit a sequence of filter tunings to converge the filter, ending with a second-order Runge-Kutta stepped at $100$ times the observation frequency. Integration dominates the run-time performance of their example. Because their integration code is enmeshed with the filter and with data delivery, they were not easily able to experiment with alternative integrators and missed the opportunity to try fourth-order Runge-Kutta, which we find converges the filter $100$ times faster.

Other papers in this series feature applications of EKFs to a variety of problems including navigation and pursuit.

In this series of papers, we use the Wolfram language^wolf because it supports functional programming and it excels at concise expression of mathematical code. All examples in these papers can be directly transcribed to any modern mainstream language that supports closures. For example, it is easy to write them in C++11 and beyond, Python, any modern Lisp, not to mention Haskell, Scala, Erlang, and OCaml. Many can be written without full closures; function pointers will suffice, so they are easy to write in C. It’s also not difficult to add extra arguments to simulate just enough closure-like support in C to write the rest of the examples in that language.

3 Linear Kalman Accumulator with Time Evolution

In Kalman Folding 2: Tracking,^klf2 we found the following accumulator function for a fold that implements the linear dynamic Kalman filter, that is, a filter that can track states that evolve with time^time according to a linear transformation of the state:

\noindent where

\noindent and all quantities are matrices:

$\mathbold{z}$ is a ${b}×{1}$ column vector containing one multidimensional observation
$\mathbold{x}$ and $\mathbold{x}₂$ are ${n}×{1}$ column vectors of model states
$\mathbold{Z}$ is a ${b}×{b}$ matrix, the covariance of observation noise
$\mathbold{P}$ and $\mathbold{P}_2$ are ${n}×{n}$ matrices, the theoretical covariances of $\mathbold{x}$ and $\mathbold{x}_2$, respectively
$\mathbold{A}$ is a ${b}×{n}$ matrix, the observation partials
$\mathbold{D}$ is a ${b}×{b}$ matrix, the Kalman denominator
$\mathbold{K}$ is an ${n}×{b}$ matrix, the Kalman gain
$\mathbold{Ξ}$ is an $n×{n}$ integral of process noise $\mathbold{ξ}$, namely \newline $∫₀^{δ t}\mathbold{Φ}(τ)⋅{\left(\begin{array}{c|c}\mathbold{0}&\mathbold{0}\\hline\mathbold{0}&E\left[\mathbold{ξ}\mathbold{ξ}^\intercal\right]\end{array}\right)⋅\mathbold{Φ}(τ)^\intercal\,\textrm{d}τ}$
$\mathbold{Φ}$ is the non-dimensional $n×{n}$ propagator for $\mathbold{F}$, namely $e^\mathbold{F\,{δ t}}$
$\mathbold{Γ}$ is an $n×{dim(\mathbold{u})}$ integral of system response, namely $∫₀^{δ t}{\mathbold{Φ}(τ) ⋅ \mathbold{G}\,\textrm{d}τ}$
$\mathbold{u}$ is a vector of external disturbances or control inputs
$δ{t}$ is an increment of time (or, more generally, the independent variable of the problem)

\noindent and the time-evolving states satisfy the following differential equation in state-space form:

\noindent $\mathbold{F}$, $\mathbold{G}$, and $\mathbold{u}$ may depend on time, but not on $\mathbold{x}$; that is the meaning of “linear dynamic” in this context.

3.1 Dimensional Arguments

In physical or engineering applications, these quantities carry physical dimensions of units of measure in addition to their matrix dimensions as numbers of rows and columns. Both kinds of dimensions are aspects of the type of a quantity. Dimensional arguments, like type-arguments more generally, are invaluable for checking equations and code.

If the physical and matrix dimensions of $\mathbold{x}$ are $\left[\left[\mathbold{x}\right]\right] \stackrel{\text{\tiny def}}{=} (\mathcal{X}, n×{1})$, of $\mathbold{z}$ are $\left[\left[\mathbold{z}\right]\right] \stackrel{\text{\tiny def}}{=} (\mathcal{Z}, b×{1})$, and of $δ{t}$ are $\left[\left[δ{t}\right]\right] \stackrel{\text{\tiny def}}{=} (\mathcal{T}, \textrm{scalar})$, then

The matrices $\mathbold{F}$, $\mathbold{Φ}$, $\mathbold{G}$, and $\mathbold{Γ}$ do not have single dimensions on their own, but their dimensions in various combinations with other matrices make sense. Elements of matrix expressions for $\mathbold{Ξ}$ include sufficient implicit physical dimensions to make its overall physical dimensions work out to $\mathcal{X}^2$. Detailed dimensional analysis of these matrices is the subject of another paper in this series.

\noindent In all examples in this paper, the observations $\mathbold{z}$ are $1×{1}$ matrices, equivalent to scalars, so $b=1$, but the theory and code carry over to multi-dimensional vector observations.

4 Tracking with Drag

To accommodate nonlinear equations of state evolution, we replace equation \ref{eqn:state-propagation-equation} with explicit numerical integration. The rest of the EKF uses equations \ref{eqn:covariance-propagation-equation}, \ref{eqn:kalman-gain-definition}, and \ref{eqn:kalman-denominator-definition}: with a propagator $\mathbold{Φ}$ derived from first-order linearization.

4.1 Equations of Motion

Let $h(t)$ be the height of the falling object, and let the state vector $\mathbold{x}(t)$ contain $h(t)$ and its first derivative, $\dot{h}(t)$, the speed of descent.

Omitting, for clarity’s sake, explicit dependence of $h$ and $\dot{h}$ on time, the equations of motion are elementary:

\noindent where

$g$ is the acceleration of Earth’s gravitation, about $32.2\,\textrm{ft}/{\textrm{s}}^2$
$ρ(h)$ is the density of air in $\textrm{slug}/{\textrm{ft}}^3$; $ρ\,{{\dot{h}}^2}$ has units of pressure, that is, $\textrm{slug}/(\textrm{ft}⋅{\textrm{sec}^2})$
$β = 500\,\textrm{slug}/(\textrm{ft}⋅{\textrm{sec}^2})$ is a constant ballistic coefficient of the object in units of pressure (it is possible to estimate this coefficient in the filter; here, we treat it as a known constant).

The positive direction is up and we are only concerned with negative velocities where the object is approaching the ground. We may provisionally replace the factor $\textrm{sign}({\dot{h}})$ with -1 and keep our eye out for improper positive speeds.

In scalar form, the equations are

\noindent or

\noindent with $k=22,000\,\left[\textrm{ft}\right]$, the e-folding height of the atmosphere, and $A=0.0034\,[\textrm{slug}/{{\textrm{ft}}^3}]$ for the density of air^zerr at $h=0$.

4.2 Stream Solver

We can write the same differential equation as a lazy stream, which uses only constant memory. Thus, it is suitable for the internals of a Kalman filter. We implement the integrator as an accumulator function for foldStream from paper 4 of this series,^klf4 which produces all intermediate results as a new stream.

The simplest integrator is the Euler integrator, which updates a state with its derivative times a small interval of time:

Like all accumulator functions, this is a binary function that takes two arguments. The first is an instance of accumulation type, in this case, a pair of a scalar time t and a vector state x. The second is an element of the input stream, a triple of a time differential dt, the same time t that appears in the first argument, and a function Dx that computes the derivative of the state given the state and the time as Dx[x,t].

Folding this integrator over the streamed differential equation produces a streamed solution. The input stream must produce values of the form {dt, t, Dx} and, like all streams, also contain a thunk that produces the rest of the stream.^thnk

This dragDStream contains nothing specific to our example, but just increments time and passes through the integration inputs. It could be much more rich, manipulating dt and Dx for speed or numerics (adaptive integration).

The kernel of our differential equation is the derivative function Dx, which, for our example, is the following:

\noindent in which x stands for $h$ and v for $\dot{h}$. It is generalized to handle differential equations that have explicit dependence on a time variable t, but that parameter is harmlessly ignored in this example. Integrating the differential equation for thirty seconds looks like this:

The type of the result, here, is a lazy stream produced by takeUntil from the lazy stream produced by foldStream. Because these streams are lazy, nothing happens until we demand values for, say, plotting, as in figure fig:ndsolve-falling-with-drag-results. These results are qualitatively indistinguishable from those in the reference and those produced by Wolfram’s built-in numerical integrator NDSolve, giving us high confidence that we’ve got it right.

The arguments of takeUntil are a stream and a predicate, in our case, the literal function First[#] > t1 &. The result is a new stream that pulls values from the original stream, applying the predicate until it produces True.

The implementations of foldStream, takeUntil and other stream operators is the subject of another paper in this series.

Given a stream containing a value v and a tail thunk, return the empty stream if the predicate evaluates to True:

Otherwise, recurse by invoking the thunk in the stream:

4.3 What’s the Point?

The point of this style of integration is that we can change three aspects of the integration independently of one another, leaving the others verbatim, without even recompilation, because we have un-nested and /decomplected/^hick these aspects:

the integrator
potential manipulation of the time increment dt and derivative function Dx
the internals of the derivative function Dx

For example, should Euler integration prove inadequate, and it does, we can easily substitute second- or fourth-order Runge-Kutta integrators. This turns out to be crucial for a high-performance EKF in this example. The only requirement on an integrator is that it must match the function signature or type:

Decomplecting these bits also makes them easier to review and verify by hand because dependencies are lexically obvious, easier to memorize and to find on a page.

4.4 Gain and Covariance Updates

For gains and covariances, first-order linear approximations suffice. If we write the non-linear equations in state-space form as $\mathbold{\dot{x}}=f(\mathbold{x})$, then a Taylor series, to first order, yields

\noindent where $\mathbold{F}$ is the Jacobian matrix,

\noindent and clearly fills the role played by $\mathbold{F}$ in the linear state-space form, equation \ref{eqn:state-space-form}. Our linearized system-dynamics matrix is

\noindent The matrix of partial derivatives is the Jacobian, the best linear approximation to at any time, and fills the ro

\noindent and reason that the matrix of partial derivatives will advance the state

We need $\mathbold{Φ}=e^\mathbold{Ft}$ to propagate solutions forward, because, if $\mathbold{\dot{x}}=\mathbold{F}\,\mathbold{x}$, then $e^\mathbold{Ft}\,\mathbold{x}$(t) effects a Taylor series. Again, to first order,

\noindent We take $\mathbold{Φ}(δ{t})=\mathbold{1}+\mathbold{F}\,δ{t}$ for our propagator matrix and compute the gains and covariances as in equations \ref{eqn:covariance-propagation-equation}, \ref{eqn:kalman-gain-definition}, and \ref{eqn:kalman-denominator-definition}:

\noindent where $Ξ$, integral of the process noise, is

5 The EKF

Though the following code block is bigger than we have seen in earlier papers in this series, it is a straight implementation of the notions explained above, highly modularized. The block of code establishes one global symbol, EKFDrag, which we tune and analyze in the last section of this paper.

With establishes numerical constants for the equations of motion. Module establishes local variables to hold the differential-equation kernel and stream, and for the propagator matrix $\mathbold{Φ}$ and process noise $\mathbold{Ξ}$.

The following lines furnish implementations of these functions:

The EKF itself is in the scope of these variables, and lambda lifts over

$σ_ξ$, the constant standard deviation of the process noise
$\mathbold{Z}$, the constant observation-noise matrix
the integrator function, for instance eulerAccumulator or either of the Runge-Kutta integrators
the filter period fdt
the internal integration period idt

allowing independent tuning of all these parameters. Its accumulation type is $\{\mathbold{x},\mathbold{P}\}$, as usual. Its observation type includes time $t$ because the integrators are all generalized to include it, even though, in our current example, the differential equations do not depend explicitly on the time variable. It could be optimized out. The other members of the observation packet are the usual partials matrix $\mathbold{A}$ and the observation itself $\mathbold{z}$. This is standard Kalman folding as explained in the first paper in this series.^klf1

The stream operator last forces the lazy integration stream to execute, albeit in constant memory, until it picks up and returns the final value produced by takeUntil. This composition of last, takeUntil, and foldStream performs the EKF’s high-precision replacement for the standard Kalman filter’s update equation \ref{eqn:state-propagation-equation}, operating at the integration period idt. The rest of the code implements equations \ref{eqn:covariance-propagation-equation}, \ref{eqn:kalman-gain-definition}, and \ref{eqn:kalman-denominator-definition} with the linearized propagator Phi operating at the filter period fdt.

6 Tuning and Performance

Because we can tune five parameters of the filter independently, we can efficiently explore the tuning space. The first task is to reproduce the author’s results, then to look for opportunities to improve them.

Zarchan and Musoff report filter convergence and excellent speed for the Euler integrator operating at a period of $0.1$ seconds, exactly the same as the filter period, and a standard deviation of $1,000\,\textrm{ft}$ for observation noise. We reproduce their results qualitatively in figure fig:euler-idt-point-1-zeta-1000, by folding EKFDrag over a lazy stream of deterministically pseudorandom observations. The smoother lines represent one-sigma theoretical covariance envelopes and the noisy dots represent five iterations of the filter over faked data.

Figure fig:euler-idt-point-1-zeta-1000 exhibits overall convergence, but there are signs of trouble for times beyond $20$ sec. These are visually obvious, but would be difficult to detect statistically.

The authors report, and we concur, complete filter failure when the observation standard deviation is reduced to $25$ feet, which forces the filter to rely much more on the integrator than on the observations at higher times because it has been told that the observations are reliable (low sigma). This interpretation is confirmed by the squeezing of the covariance envelopes in figure fig:euler-idt-point-1-zeta-25. The filter slavishly follows the integrator and seems to accumulate its floating-point errors into bad estimates. A detailed numerical analysis of this phenomenon is beyond the scope of this paper, but the authors gain evidence that this is the case, and we concur, by moving to a second-order Runge-Kutta integrator. They find, and we concur, that they must move to an integration period of $0.001$ seconds, $100$ times slower, to regain convergence. See figure fig:rk2-idt-point-001-zeta-25.

We were able to restore the speed of the filter and produce results visually indistinguishable from figure fig:rk2-idt-point-001-zeta-25 with the fourth-order Runge-Kutta integrator simply by feeding those parameters into EKFDrag. Now suitably tuned, the filter can be deployed verbatim, without even recompilation, in the field. We emphasize the importance of verbatim deployment, as in the first paper in this series, because floating-point issues are extremely sensitive to the slightest change. We have seen many cases where even changing the order of operations by compiler optimizer flags or by targeting different versions of the same processor family produce qualitatively different results due to non-associativity of floating point math and accumulation phenomena.

We note in passing that Zarchan and Musoff also find, and we concur, that increasing the order of the Taylor series for computing $\mathbold{Φ}$ and $\mathbold{Ξ}$ does not qualitatively improve the filter. That option might become relevant and important at longer filter periods fdt or in other applications.

7 Concluding Remarks

The Extended Kalman Filter (EKF) typically handles non-linear problems by propagating states with high-precision integration and propagating Kalman gain and state covariance by linear approximations. The benefits of writing EKFs as folds over lazy streams include high modularity, allowing efficient and accurate tuning and diagnosis, and the flexibility to deploy fragile floating-point code verbatim, without even recompilation, from the lab to the field.

^affn https://en.wikipedia.org/wiki/Affine_transformation ^bars Bar-Shalom, Yaakov, et al. Estimation with applications to tracking and navigation. New York: Wiley, 2001. ^bier http://tinyurl.com/h3jh4kt ^bssl https://en.wikipedia.org/wiki/Bessel’s_correction ^busi https://en.wikipedia.org/wiki/Business_logic ^cdot We sometimes use the center dot or the $×$ symbols to clarify matrix multiplication. They have no other significance and we can always write matrix multiplication just by juxtaposing the matrices. ^clos https://en.wikipedia.org/wiki/Closure_(computer_programming) ^cold This convention only models so-called cold observables, but it’s enough to demonstrate Kalman’s working over them. ^cons This is quite similar to the standard — not Wolfram’s — definition of a list as a pair of a value and of another list. ^cova We use the terms covariance for matrices and variance for scalars. ^csoc https://en.wikipedia.org/wiki/Separation_of_concerns ^ctsc https://en.wikipedia.org/wiki/Catastrophic_cancellation ^dstr http://tinyurl.com/ze6qfb3 ^elib Brookner, Eli. Tracking and Kalman Filtering Made Easy, New York: Wiley, 1998. http://tinyurl.com/h8see8k ^fldl http://tinyurl.com/jmxsevr ^fwik https://en.wikipedia.org/wiki/Fold_%28higher-order_function%29 ^gama https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem ^hick “Decomplecting” is a term coined by Rich Hickey for un-braiding and un-nesting bits of software. ^intr http://introtorx.com/ ^jplg JPL Geodynamics Program http://www.jpl.nasa.gov/report/1981.pdf ^just justified by the fact that $\mathbold{D}$ is a diagonal matrix that commutes with all other products, therefore its left and right inverses are equal and can be written as a reciprocal; in fact, $\mathbold{D}$ is a $1×{1}$ matrix — effectively a scalar — in all examples in this paper ^klde B. Beckman, Kalman Folding 3: Derivations, http://vixra.org/abs/1607.0059. ^klf1 B. Beckman, Kalman Folding, Part 1, http://vixra.org/abs/1606.0328. ^klf2 B. Beckman, Kalman Folding 2: Tracking and System Dynamics, http://vixra.org/abs/1606.0348. ^klf3 B. Beckman, Kalman Folding 3: Derivations, http://vixra.org/abs/1607.0059. ^klf4 B. Beckman, Kalman Folding 4: Streams and Observables, to appear. ^klfl B. Beckman, Kalman Folding, Part 1, http://vixra.org/abs/1606.0328. ^klfl B. Beckman, Kalman Folding, Part 1, to appear. ^layi https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering ^lmbd Many languages use the keyword lambda for such expressions; Wolfram uses the name Function. ^lmlf https://en.wikipedia.org/wiki/Lambda_lifting ^lols Let Over Lambda ^lssq https://en.wikipedia.org/wiki/Least_squares ^ltis http://tinyurl.com/hhhcgca ^matt https://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf ^mcmc https://en.wikipedia.org/wiki/Particle_filter ^musc http://www1.cs.dartmouth.edu/~doug/music.ps.gz ^ndim https://en.wikipedia.org/wiki/Nondimensionalization ^patt http://tinyurl.com/j5jzy69 ^pseu http://tinyurl.com/j8gvlug ^rasp http://www.wolfram.com/raspberry-pi/ ^rcrn https://en.wikipedia.org/wiki/Recurrence_relation ^rsfr http://rosettacode.org/wiki/Loops/Foreach ^rxbk http://www.introtorx.com/content/v1.0.10621.0/07_Aggregation.html ^scan and of Haskell’s scans and folds, and Rx’s scans and folds, etc. ^scla http://tinyurl.com/hhdot36 ^scnd A state-space form containing a position and derivative is commonplace in second-order dynamics like Newton’s Second Law. We usually employ state-space form to reduce $n$-th-order differential equations to first-order differential equations by stacking the dependent variable on $n-1$ of its derivatives in the state vector. ^scnl http://learnyouahaskell.com/higher-order-functions ^stsp https://en.wikipedia.org/wiki/State-space_representation ^thnk Wolfram’s ampersand postfix operator can covert its operand into a thunk. ^time In most applications, the independent variable is physical time, however, it need not be. For convenience, we use the term time to mean the independent variable of the problem simply because it is shorter to write. ^uncl The initial uncial (lower-case) letter signifies that we wrote this function; it wasn’t supplied by Wolfram. ^wfld http://reference.wolfram.com/language/ref/FoldList.html?q=FoldList ^wlf1 http://tinyurl.com/nfz9fyo ^wlf2 http://rebcabin.github.io/blog/2013/02/04/welfords-better-formula/ ^wolf http://reference.wolfram.com/language/ ^zarc Zarchan and Musoff, Fundamentals of Kalman Filtering, A Practical Approach, Fourth Edition, Ch. 4 ^zerr Zarchan and Musoff, on page 228, report $0.0034$ for the density of air in $\textrm{slug}/\textrm{ft}^3$ at the surface; we believe the correct value is about $0.00242$ but continue with $0.0034$ for comparison’s sake.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kalman-folding-05-008-for-non-linear-problems.org

kalman-folding-05-008-for-non-linear-problems.org

Kalman Folding 5: Non-Linear Models and the EKF (Review Draft)

1 Abstract

2 Background and Synopsis

3 Linear Kalman Accumulator with Time Evolution

3.1 Dimensional Arguments

4 Tracking with Drag

4.1 Equations of Motion

4.2 Stream Solver

4.3 What’s the Point?

4.4 Gain and Covariance Updates

5 The EKF

6 Tuning and Performance

7 Concluding Remarks

Files

kalman-folding-05-008-for-non-linear-problems.org

Latest commit

History

kalman-folding-05-008-for-non-linear-problems.org

File metadata and controls

Kalman Folding 5: Non-Linear Models and the EKF (Review Draft)

1 Abstract

2 Background and Synopsis

3 Linear Kalman Accumulator with Time Evolution

3.1 Dimensional Arguments

4 Tracking with Drag

4.1 Equations of Motion

4.2 Stream Solver

4.3 What’s the Point?

4.4 Gain and Covariance Updates

5 The EKF

6 Tuning and Performance

7 Concluding Remarks