diff --git a/02-Discrete-Bayes.ipynb b/02-Discrete-Bayes.ipynb
index b107d322..ea369d1b 100644
--- a/02-Discrete-Bayes.ipynb
+++ b/02-Discrete-Bayes.ipynb
@@ -1508,7 +1508,7 @@
 " # move the robot and\n",
 " robot.move(distance=move_distance)\n",
 "\n",
- " # peform prediction\n",
+ " # perform prediction\n",
 " prior = predict(posterior, move_distance, kernel) \n",
 "\n",
 " # and update the filter\n",
@@ -1720,7 +1720,7 @@
 "source": [
 "## References\n",
 "\n",
- " * [1] D. Fox, W. Burgard, and S. Thrun. \"Monte carlo localization: Efficient position estimation for mobile robots.\" In *Journal of Artifical Intelligence Research*, 1999.\n",
+ " * [1] D. Fox, W. Burgard, and S. Thrun. \"Monte Carlo localization: Efficient position estimation for mobile robots.\" In *Journal of Artificial Intelligence Research*, 1999.\n",
 " \n",
 " http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume11/fox99a-html/jair-localize.html\n",
 "\n",
@@ -1735,7 +1735,7 @@
 " https://www.udacity.com/course/cs373\n",
 " \n",
 " \n",
- " * [4] Khan Acadamy. \"Introduction to the Convolution\"\n",
+ " * [4] Khan Academy. \"Introduction to the Convolution\"\n",
 " \n",
 " https://www.khanacademy.org/math/differential-equations/laplace-transform/convolution-integral/v/introduction-to-the-convolution\n",
 " \n",
diff --git a/03-Gaussians.ipynb b/03-Gaussians.ipynb
index 47b761ae..a5e1fe55 100644
--- a/03-Gaussians.ipynb
+++ b/03-Gaussians.ipynb
@@ -1288,7 +1288,7 @@
 "\n",
 "The discrete Bayes filter works by multiplying and adding arbitrary probability random variables. The Kalman filter uses Gaussians instead of arbitrary random variables, but the rest of the algorithm remains the same. This means we will need to multiply and add Gaussian random variables (Gaussian random variable is just another way to say normally distributed random variable). \n",
 "\n",
- "A remarkable property of Gaussian random variables is that the sum of two independent Gaussian random variables is also normally distributed! The product is not Gaussian, but proportional to a Gaussian. There we can say that the result of multipying two Gaussian distributions is a Gaussian function (recall function in this context means that the property that the values sum to one is not guaranteed).\n",
+ "A remarkable property of Gaussian random variables is that the sum of two independent Gaussian random variables is also normally distributed! The product is not Gaussian, but proportional to a Gaussian. Therefore we can say that the result of multiplying two Gaussian distributions is a Gaussian function (recall that 'function' in this context means the values are not guaranteed to sum to one).\n",
 "\n",
 "Wikipedia has a good article on this property, and I also prove it at the end of this chapter. \n",
 "https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables\n",
@@ -1951,7 +1951,7 @@
 "\n",
 "$$p(x \\mid z) \\propto p(z|x)p(x)$$\n",
 "\n",
- "Now we subtitute in the equations for the Gaussians, which are\n",
+ "Now we substitute in the equations for the Gaussians, which are\n",
 "\n",
 "$$p(z \\mid x) = \\frac{1}{\\sqrt{2\\pi\\sigma_z^2}}\\exp \\Big[-\\frac{(z-x)^2}{2\\sigma_z^2}\\Big]$$\n",
 "\n",
diff --git a/04-One-Dimensional-Kalman-Filters.ipynb b/04-One-Dimensional-Kalman-Filters.ipynb
index 3acf016e..76f3cc14 100644
--- a/04-One-Dimensional-Kalman-Filters.ipynb
+++ b/04-One-Dimensional-Kalman-Filters.ipynb
@@ -383,7 +383,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Let's test it. What is the prior if the intitial position is the Gaussian $\\mathcal N(10, 0.2^2)$ and the movement is the Gaussian $\\mathcal N (15, 0.7^2)$?"
+ "Let's test it. What is the prior if the initial position is the Gaussian $\\mathcal N(10, 0.2^2)$ and the movement is the Gaussian $\\mathcal N (15, 0.7^2)$?"
 ]
 },
 {
@@ -440,7 +440,7 @@
 "\n",
 "Both the likelihood and prior are modeled with Gaussians. Can we multiply Gaussians? Is the product of two Gaussians another Gaussian?\n",
 "\n",
- "Yes to the former, and almost to the latter! In the last chapter I proved that the product of two Gaussians is proportional to another Gausian. \n",
+ "Yes to the former, and almost to the latter! In the last chapter I proved that the product of two Gaussians is proportional to another Gaussian. \n",
 "\n",
 "$$\\begin{aligned}\n",
 "\\mu &= \\frac{\\sigma_1^2 \\mu_2 + \\sigma_2^2 \\mu_1} {\\sigma_1^2 + \\sigma_2^2}, \\\\\n",
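Those two pairs of equations are easy to check numerically. Here is a minimal sketch of the 1D predict (sum) and update (product) steps — my own throwaway code, not the book's FilterPy implementation — using the example from the cell above:

```python
from collections import namedtuple

Gaussian = namedtuple('Gaussian', ['mean', 'var'])

def predict(pos, movement):
    # sum of independent Gaussians: means add, variances add
    return Gaussian(pos.mean + movement.mean, pos.var + movement.var)

def update(prior, likelihood):
    # product of two Gaussians, renormalized back into a Gaussian
    s = prior.var + likelihood.var
    mean = (prior.var*likelihood.mean + likelihood.var*prior.mean) / s
    return Gaussian(mean, prior.var*likelihood.var / s)

prior = predict(Gaussian(10., 0.2**2), Gaussian(15., 0.7**2))
print(prior)  # should print mean 25.0, var 0.53
```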
diff --git a/05-Multivariate-Gaussians.ipynb b/05-Multivariate-Gaussians.ipynb
index 5377c27c..e01ddbd7 100644
--- a/05-Multivariate-Gaussians.ipynb
+++ b/05-Multivariate-Gaussians.ipynb
@@ -745,7 +745,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "These plots look like circles and ellipses. Indeed, it turns out that any slice through the multivariate Gaussian is an ellipse. Hence, in statistics we do not call these 'contour plots', but either *error ellipses* or *confidence ellipses*; the terms are interchangable.\n",
+ "These plots look like circles and ellipses. Indeed, it turns out that any slice through the multivariate Gaussian is an ellipse. Hence, in statistics we do not call these 'contour plots', but either *error ellipses* or *confidence ellipses*; the terms are interchangeable.\n",
 "\n",
 "This code uses the function `plot_covariance_ellipse()` from `filterpy.stats`. By default the function displays one standard deviation, but you can use either the `variance` or `std` parameter to control what is displayed. For example, `variance=3**2` or `std=3` would display the 3rd standard deviation, and `variance=[1,4,9]` or `std=[1,2,3]` would display the 1st, 2nd, and 3rd standard deviations. "
 ]
 },
 {
@@ -1773,7 +1773,7 @@
 "\n",
 "It is important to understand that we are taking advantage of the fact that velocity and position are correlated. We get a rough estimate of velocity from the distance and time between two measurements, and use Bayes theorem to produce very accurate estimates after only a few observations. Please reread this section if you have any doubts. If you do not understand this you will quickly find it impossible to reason about what you will learn in the following chapters.\n",
 "\n",
- "The effect of including velocity appears to me minor if only care about the position. But this is only after one update. In the next chapter we will see what a dramatic increase in certainty we have after multiple updates. The measurment variance will be large, but the estimated position variance will be small. Each time you intersect the velocity covariance with position it gets narrower on the x-axis, hence the variance is also smaller each time."
+ "The effect of including velocity appears to be minor if we only care about the position. But this is only after one update. In the next chapter we will see what a dramatic increase in certainty we have after multiple updates. The measurement variance will be large, but the estimated position variance will be small. Each time you intersect the velocity covariance with position it gets narrower on the x-axis, hence the variance is also smaller each time."
 ]
 },
 {
diff --git a/06-Multivariate-Kalman-Filters.ipynb b/06-Multivariate-Kalman-Filters.ipynb
index 47a40945..7d01f395 100644
--- a/06-Multivariate-Kalman-Filters.ipynb
+++ b/06-Multivariate-Kalman-Filters.ipynb
@@ -1611,7 +1611,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "There are many attributes there that we haven't discussed yet, but many should be familar.\n",
+ "There are many attributes there that we haven't discussed yet, but many should be familiar.\n",
 "\n",
 "At this point you could write code to plot any of these variables. However, it is often more useful to use `np.array` instead of lists. Calling `Saver.to_array()` will convert the lists into `np.array`. There is one caveat: if the shape of any of the attributes changes during the run, the `to_array` will raise an exception since `np.array` requires all of the elements to be of the same type and size. \n",
 "\n",
@@ -1724,7 +1724,7 @@
 "\n",
 "$$\\bar\\sigma^2 = \\sigma^2 + \\sigma^2_{move}$$\n",
 "\n",
- "We add the variance of the movement to the variance of our estimate to reflect the loss of knowlege. We need to do the same thing here, except it isn't quite that easy with multivariate Gaussians. \n",
+ "We add the variance of the movement to the variance of our estimate to reflect the loss of knowledge. We need to do the same thing here, except it isn't quite that easy with multivariate Gaussians. \n",
 "\n",
 "We can't simply write $\\mathbf{\\bar P} = \\mathbf P + \\mathbf Q$. In a multivariate Gaussian the state variables are *correlated*. What does this imply? Our knowledge of the velocity is imperfect, but we are adding it to the position with\n",
 "\n",
@@ -1809,7 +1809,7 @@
 "You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information about the position due to adding $\\dot x\\Delta t$ to x at each step. The height has not changed - our system model says the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf F$ linearly correlates $x$ with $\\dot x$ with the expression $\\bar x = \\dot x \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf T$ computation correctly incorporates this correlation into the covariance matrix.\n",
 "\n",
- "Here is an animation of this equation that allows you to change the design of $\\mathbf F$ to see how it affects shape of $\\mathbf P$. The `F00` slider affects the value of F[0, 0]. `covar` sets the intial covariance between the position and velocity($\\sigma_x\\sigma_{\\dot x}$). I recommend answering these questions at a minimum\n",
+ "Here is an animation of this equation that allows you to change the design of $\\mathbf F$ to see how it affects the shape of $\\mathbf P$. The `F00` slider affects the value of F[0, 0]. `covar` sets the initial covariance between the position and velocity ($\\sigma_x\\sigma_{\\dot x}$). I recommend answering these questions at a minimum:\n",
 "\n",
 "* what if $x$ is not correlated to $\\dot x$? (set F01 to 0, the rest at defaults)\n",
 "* what if $x = 2\\dot x\\Delta t + x_0$? (set F01 to 2, the rest at defaults)\n",
@@ -2550,7 +2550,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Looking at the output we see a very large spike in the filter output at the beginning. We set $\\text{P}=500\\, \\mathbf{I}_2$ (this is shorthand notation for a 2x2 diagonal matrix with 500 in the diagonal). We now have enough information to understand what this means, and how the Kalman filter treats it. The 500 in the upper left hand corner corresponds to $\\sigma^2_x$; therefore we are saying the standard deviation of `x` is $\\sqrt{500}$, or roughly 22.36 m. Roughly 99% of the samples occur withing $3\\sigma$, therefore $\\sigma^2_x=500$ is telling the Kalman filter that the prediction (the prior) could be up to 67 meters off. That is a large error, so when the measurement spikes the Kalman filter distrusts its own estimate and jumps wildly to try to incorporate the measurement. Then, as the filter evolves $\\mathbf P$ quickly converges to a more realistic value.\n",
+ "Looking at the output we see a very large spike in the filter output at the beginning. We set $\\text{P}=500\\, \\mathbf{I}_2$ (this is shorthand notation for a 2x2 diagonal matrix with 500 on the diagonal). We now have enough information to understand what this means, and how the Kalman filter treats it. The 500 in the upper left hand corner corresponds to $\\sigma^2_x$; therefore we are saying the standard deviation of `x` is $\\sqrt{500}$, or roughly 22.36 m. Roughly 99% of the samples occur within $3\\sigma$, therefore $\\sigma^2_x=500$ is telling the Kalman filter that the prediction (the prior) could be up to 67 meters off. That is a large error, so when the measurement spikes the Kalman filter distrusts its own estimate and jumps wildly to try to incorporate the measurement. Then, as the filter evolves $\\mathbf P$ quickly converges to a more realistic value.\n",
 "\n",
 "Let's look at the math behind this. The equation for the Kalman gain is\n",
 "\n",
@@ -2847,7 +2847,7 @@
 "source": [
 "## Batch Processing\n",
 "\n",
- "The Kalman filter is designed as a recursive algorithm - as new measurements come in we immediately create a new estimate. But it is very common to have a set of data that have been already collected which we want to filter. Kalman filters can be run in a batch mode, where all of the measurements are filtered at once. We have implemented this in `KalmanFilter.batch_filter()`. Internally, all the function does is loop over the measurements and collect the resulting state and covariance estimates in arrays. It simplifies your logic and conveniently gathers all of the outputs into arrays. I often use this function, but waited until the end of the chapter so you would become very familiar with the predict/update cyle that you must run.\n",
+ "The Kalman filter is designed as a recursive algorithm - as new measurements come in we immediately create a new estimate. But it is very common to have a set of data that have been already collected which we want to filter. Kalman filters can be run in a batch mode, where all of the measurements are filtered at once. We have implemented this in `KalmanFilter.batch_filter()`. Internally, all the function does is loop over the measurements and collect the resulting state and covariance estimates in arrays. It simplifies your logic and conveniently gathers all of the outputs into arrays. I often use this function, but waited until the end of the chapter so you would become very familiar with the predict/update cycle that you must run.\n",
 "\n",
 "First collect your measurements into an array or list. Maybe it is in a CSV file:\n",
 "\n",
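However you load the measurements, an end-to-end run is only a few lines. A minimal sketch — the matrices are the constant velocity design used throughout this chapter, and the data is simulated rather than read from a file:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

kf = KalmanFilter(dim_x=2, dim_z=1)
kf.x = np.array([[0.], [0.]])           # state: position and velocity
kf.F = np.array([[1., 1.], [0., 1.]])   # constant velocity model, dt=1
kf.H = np.array([[1., 0.]])             # we measure position only
kf.P *= 500.                            # large initial uncertainty
kf.R *= 5.                              # measurement noise

zs = [i + np.random.randn()*2 for i in range(40)]   # simulated measurements
means, covs, means_p, covs_p = kf.batch_filter(zs)  # filter them all at once
```

`batch_filter()` returns the posterior means and covariances for each epoch, plus the priors, so you can plot or post-process the entire run afterwards.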
diff --git a/07-Kalman-Filter-Math.ipynb b/07-Kalman-Filter-Math.ipynb
index 617f84ad..ea4a5445 100644
--- a/07-Kalman-Filter-Math.ipynb
+++ b/07-Kalman-Filter-Math.ipynb
@@ -1548,7 +1548,7 @@
 "\n",
 "$$\\mathbf z_k \\sim P(\\mathbf z_k \\mid \\mathbf x_k)$$\n",
 "\n",
- "We have a recurrence now, so we need an initial condition to terminate it. Therefore we say that the initial distribution is the probablity of the state $\\mathbf x_0$:\n",
+ "We have a recurrence now, so we need an initial condition to terminate it. Therefore we say that the initial distribution is the probability of the state $\\mathbf x_0$:\n",
 "\n",
 "$$\\mathbf x_0 \\sim P(\\mathbf x_0)$$\n",
 "\n",
diff --git a/08-Designing-Kalman-Filters.ipynb b/08-Designing-Kalman-Filters.ipynb
index ba24de09..275126ec 100644
--- a/08-Designing-Kalman-Filters.ipynb
+++ b/08-Designing-Kalman-Filters.ipynb
@@ -2040,7 +2040,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "As we will see the mahalanobis distance computes the scalar standard deviation *distance* point to a distribution, much like the Euclidian distance computes a scalar distance from a point to another point. \n",
+ "As we will see the mahalanobis distance computes the scalar standard deviation *distance* from a point to a distribution, much like the Euclidean distance computes a scalar distance from a point to another point. \n",
 "\n",
 "The cell above bears that out. The point that was sitting on the 3 std boundary has a mahalanobis distance of 3.0, and the one outside of the ellipse has a value of 3.6 std.\n",
 "\n",
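If you want to compute this yourself, the math is one line. A minimal sketch, with a covariance I chose so that the test point lands exactly at 3 standard deviations:

```python
import numpy as np

def mahalanobis(x, mean, cov):
    # scalar distance, in standard deviations, from point x to N(mean, cov)
    d = np.asarray(x) - np.asarray(mean)
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

cov = np.array([[2., 0.],
                [0., 2.]])
print(mahalanobis([3., 3.], [0., 0.], cov))  # 3.0
```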
@@ -2083,7 +2083,7 @@
 "\n",
 "Given this list of billions of tracks we can then compute a score for each track. I'll provide the math for that in the following section. But, visualize a track that forms a 'Z' shape over 3 epochs. No aircraft can maneuver like that, so we would give it a very low probability of being real. Another track forms a straight line, but imputes a velocity of 10,000 kph. That is also very improbable. Another track curves at 200 kph. That has a high probability.\n",
 "\n",
- "So tracking becomes a matter of gating, data association, and pruning. For example, say the second radar sweep just occured. Do I combine all possible combinations into a tracks? I probably shouldn't. If point 1, sweep 1 imputes a velocity of 200kph with point 3, sweep 2 we should form a track from it. If the velocity is 5000 kph we shouldn't bother; we know that track is so unlikely as to be impossible. Then, as the tracks grow we will have well defined ellipsoidal or maneuver gates for them, and we can be far more selective about the measurements we associate with tracks.\n",
+ "So tracking becomes a matter of gating, data association, and pruning. For example, say the second radar sweep just occurred. Do I combine all possible combinations into tracks? I probably shouldn't. If point 1, sweep 1 imputes a velocity of 200 kph with point 3, sweep 2 we should form a track from it. If the velocity is 5000 kph we shouldn't bother; we know that track is so unlikely as to be impossible. Then, as the tracks grow we will have well defined ellipsoidal or maneuver gates for them, and we can be far more selective about the measurements we associate with tracks.\n",
 "\n",
 "There are schemes for associations. We can choose to associate a measurement to only one track. Or, we can choose to associate a measurement with multiple tracks, reflecting our lack of certainty with which track it belongs to. For example, aircraft tracks can cross from the point of view of the radar. As the aircraft approach associating a single measurement with one of the two aircraft can become uncertain. You could assign the measurement to both tracks for a short time. As you gather more measurements you could then go back and change assignment based on which is more probable given the new information.\n",
 "\n",
@@ -3580,7 +3580,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Our first step is to implement the math for a ball moving through air. There are several treatments available. A robust solution takes into account issues such as ball roughness (which affects drag non-linearly depending on velocity), the Magnus effect (spin causes one side of the ball to have higher velocity relative to the air vs the opposite side, so the coefficient of drag differs on opposite sides), the effect of lift, humidity, air density, and so on. I assume the reader is not interested in the details of ball physics, and so will restrict this treatment to the effect of air drag on a non-spinning baseball. I will use the math developed by Nicholas Giordano and Hisao Nakanishi in *Computational Physics* [1997]. This treatement does not take all the factors into account. The most detailed treatment is by Alan Nathan on his website at http://baseball.physics.illinois.edu/index.html. I use his math in my own work in computer vision, but I do not want to get distracted by a more complicated model.\n",
+ "Our first step is to implement the math for a ball moving through air. There are several treatments available. A robust solution takes into account issues such as ball roughness (which affects drag non-linearly depending on velocity), the Magnus effect (spin causes one side of the ball to have higher velocity relative to the air vs the opposite side, so the coefficient of drag differs on opposite sides), the effect of lift, humidity, air density, and so on. I assume the reader is not interested in the details of ball physics, and so will restrict this treatment to the effect of air drag on a non-spinning baseball. I will use the math developed by Nicholas Giordano and Hisao Nakanishi in *Computational Physics* [1997]. This treatment does not take all the factors into account. The most detailed treatment is by Alan Nathan on his website at http://baseball.physics.illinois.edu/index.html. I use his math in my own work in computer vision, but I do not want to get distracted by a more complicated model.\n",
 "\n",
 "**Important**: Before I continue, let me point out that you will not have to understand this next piece of physics to proceed with the Kalman filter. My goal is to create a reasonably accurate behavior of a baseball in the real world, so that we can test how our Kalman filter performs with real-world behavior. In real world applications it is usually impossible to completely model the physics of a real world system, and we make do with a process model that incorporates the large scale behaviors. We then tune the measurement noise and process noise until the filter works well with our data. There is a real risk to this; it is easy to finely tune a Kalman filter so it works perfectly with your test data, but performs badly when presented with slightly different data. This is perhaps the hardest part of designing a Kalman filter, and why it gets referred to with terms such as 'black art'. \n",
 "\n",
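For reference, the Giordano and Nakanishi drag model boils down to a couple of lines of code. This is a sketch of the idea rather than the chapter's exact implementation, and I am quoting the coefficients for a non-spinning baseball (SI units) as given in that text, so treat them as illustrative:

```python
import numpy as np

def drag_coef(velocity):
    # B2/m for a non-spinning baseball, per Giordano & Nakanishi (SI units)
    return 0.0039 + 0.0058 / (1. + np.exp((velocity - 35.) / 5.))

def drag_accel(vx, vy):
    # drag deceleration opposes the velocity vector; magnitude is (B2/m) * v^2
    v = np.hypot(vx, vy)
    b = drag_coef(v)
    return -b*v*vx, -b*v*vy
```

Gravity is then added to the vertical component when the trajectory is integrated.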
@@ -3716,7 +3716,7 @@
 " x0,y0 initial position\n",
 " launch_angle_deg angle ball is travelling respective to \n",
 " ground plane\n",
- " velocity_ms speeed of ball in meters/second\n",
+ " velocity_ms speed of ball in meters/second\n",
 " noise amount of noise to add to each position\n",
 " in (x, y)\n",
 " \"\"\"\n",
diff --git a/09-Nonlinear-Filtering.ipynb b/09-Nonlinear-Filtering.ipynb
index 37b67ffc..506271bc 100644
--- a/09-Nonlinear-Filtering.ipynb
+++ b/09-Nonlinear-Filtering.ipynb
@@ -663,7 +663,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "The world is nonlinear, but we only really know how to solve linear problems. This introduces significant difficulties for Kalman filters. We've looked at how nonlinearity affects filtering in 3 different but equivalent ways, and I've given you a brief summary of the major appoaches: the linearized Kalman filter, the extended Kalman filter, the Unscented Kalman filter, and the particle filter. \n",
+ "The world is nonlinear, but we only really know how to solve linear problems. This introduces significant difficulties for Kalman filters. We've looked at how nonlinearity affects filtering in 3 different but equivalent ways, and I've given you a brief summary of the major approaches: the linearized Kalman filter, the extended Kalman filter, the Unscented Kalman filter, and the particle filter. \n",
 "\n",
 "Until recently the linearized Kalman filter and EKF have been the standard way to solve these problems. They are very difficult to understand and use, and they are also potentially very unstable. \n",
 "\n",
diff --git a/10-Unscented-Kalman-Filter.ipynb b/10-Unscented-Kalman-Filter.ipynb
index 1ff455d3..73d64658 100644
--- a/10-Unscented-Kalman-Filter.ipynb
+++ b/10-Unscented-Kalman-Filter.ipynb
@@ -314,7 +314,7 @@
 "\n",
 "Okay, now let's implement the filter. We will implement a standard linear filter in 1D; we aren't quite ready to tackle a nonlinear filter yet. The design of the filter is not much different than what we have learned so far, with one difference. The KalmanFilter class uses the matrix $\\mathbf F$ to compute the state transition function. Matrices mean **linear** algebra, which works for linear problems, but not nonlinear ones. So, instead of a matrix we provide a function, just like we did above. The KalmanFilter class uses another matrix $\\mathbf H$ to implement the measurement function, which converts a state into the equivalent measurement. Again, a matrix implies linearity, so instead of a matrix we provide a function. Perhaps it is clear why $\\mathbf H$ is called the 'measurement function'; for the linear Kalman filter it is a matrix, but that is just a fast way to compute a function that happens to be linear. \n",
 "\n",
- "Without further ado, here are the state transistion function and measurement function for a 1D tracking problem, where the state is $\\mathbf x = [x \\, \\, \\dot x]^ \\mathsf T$:"
+ "Without further ado, here are the state transition function and measurement function for a 1D tracking problem, where the state is $\\mathbf x = [x \\, \\, \\dot x]^ \\mathsf T$:"
 ]
 },
 {
@@ -534,7 +534,7 @@
 "\n",
 "These equations should be familiar - they are the constraint equations we developed above. \n",
 "\n",
- "In short, the unscented transform takes points sampled from some arbitary probability distribution, passes them through an arbitrary, nonlinear function and produces a Gaussian for each transformed points. I hope you can envision how we can use this to implement a nonlinear Kalman filter. Once we have Gaussians all of the mathematical apparatus we have already developed comes into play!\n",
+ "In short, the unscented transform takes points sampled from some arbitrary probability distribution, passes them through an arbitrary, nonlinear function and produces a Gaussian for the transformed points. I hope you can envision how we can use this to implement a nonlinear Kalman filter. Once we have Gaussians all of the mathematical apparatus we have already developed comes into play!\n",
 "\n",
 "The name \"unscented\" might be confusing. It doesn't really mean much. It was a joke fostered by the inventor that his algorithm didn't \"stink\", and soon the name stuck. There is no mathematical meaning to the term."
 ]
 },
 {
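Here is that idea in miniature, using FilterPy's sigma point generator and `unscented_transform`. The nonlinear function and the sigma point parameters are arbitrary choices for the demonstration:

```python
import numpy as np
from filterpy.kalman import MerweScaledSigmaPoints, unscented_transform

points = MerweScaledSigmaPoints(n=2, alpha=.3, beta=2., kappa=.1)
sigmas = points.sigma_points(np.array([0., 0.]), np.eye(2))  # 2n+1 = 5 points

def f(x):
    # an arbitrary nonlinear function of the state
    return np.array([x[0] + x[1], 0.1*x[0]**2 + x[1]])

sigmas_f = np.array([f(s) for s in sigmas])       # pass each point through f
mean, cov = unscented_transform(sigmas_f, points.Wm, points.Wc)
print(mean)                                       # Gaussian fit to the output
```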
\n", "\n", - "In short, the unscented transform takes points sampled from some arbitary probability distribution, passes them through an arbitrary, nonlinear function and produces a Gaussian for each transformed points. I hope you can envision how we can use this to implement a nonlinear Kalman filter. Once we have Gaussians all of the mathematical apparatus we have already developed comes into play!\n", + "In short, the unscented transform takes points sampled from some arbitrary probability distribution, passes them through an arbitrary, nonlinear function and produces a Gaussian for each transformed points. I hope you can envision how we can use this to implement a nonlinear Kalman filter. Once we have Gaussians all of the mathematical apparatus we have already developed comes into play!\n", "\n", "The name \"unscented\" might be confusing. It doesn't really mean much. It was a joke fostered by the inventor that his algorithm didn't \"stink\", and soon the name stuck. There is no mathematical meaning to the term." ] @@ -607,7 +607,7 @@ "source": [ "I find this result remarkable. Using only 5 points we were able to compute the mean with amazing accuracy. The error in x is only -0.097, and the error in y is 0.549. In contrast, a linearized approach (used by the EKF, which we will learn in the next chapter) gave an error of over 43 in y. If you look at the code that generates the sigma points you'll see that it has no knowledge of the nonlinear function, only of the mean and covariance of our initial distribution. The same 5 sigma points would be generated if we had a completely different nonlinear function. \n", "\n", - "I will admit to choosing a nonlinear function that makes the performance of the unscented tranform striking compared to the EKF. But the physical world is filled with very nonlinear behavior, and the UKF takes it in stride. I did not 'work' to find a function where the unscented transform happened to work well. You will see in the next chapter how more traditional techniques struggle with strong nonlinearities. This graph is the foundation of why I advise you to use the UKF or similar modern technique whenever possible." + "I will admit to choosing a nonlinear function that makes the performance of the unscented transform striking compared to the EKF. But the physical world is filled with very nonlinear behavior, and the UKF takes it in stride. I did not 'work' to find a function where the unscented transform happened to work well. You will see in the next chapter how more traditional techniques struggle with strong nonlinearities. This graph is the foundation of why I advise you to use the UKF or similar modern technique whenever possible." ] }, { @@ -2948,7 +2948,7 @@ "\n", "With that done we are now ready to implement the UKF. I want to point out that when I designed this filter I did not just design all of functions above in one sitting, from scratch. I put together a basic UKF with predefined landmarks, verified it worked, then started filling in the pieces. \"What if I see different landmarks?\" That lead me to change the measurement function to accept an array of landmarks. \"How do I deal with computing the residual of angles?\" This led me to write the angle normalization code. \"What is the *mean* of a set of angles?\" I searched on the internet, found an article on Wikipedia, and implemented that algorithm. Do not be daunted. 
 "\n",
- "You've seen the UKF implemention already, so I will not describe it in detail. There are two new things here. When we construct the sigma points and filter we have to provide it the functions that we have written to compute the residuals and means.\n",
+ "You've seen the UKF implementation already, so I will not describe it in detail. There are two new things here. When we construct the sigma points and filter we have to provide it the functions that we have written to compute the residuals and means.\n",
 "\n",
 "```python\n",
 "points = SigmaPoints(n=3, alpha=.00001, beta=2, kappa=0, \n",
diff --git a/11-Extended-Kalman-Filters.ipynb b/11-Extended-Kalman-Filters.ipynb
index ac78a5ac..8cb3a553 100644
--- a/11-Extended-Kalman-Filters.ipynb
+++ b/11-Extended-Kalman-Filters.ipynb
@@ -242,7 +242,7 @@
 "0 & 1 & 0 \\\\ \\hline\n",
 "0 & 0 & 1\\end{array}\\right]$$\n",
 "\n",
- "I've partioned the matrix into blocks to show the upper left block is a constant velocity model for $x$, and the lower right block is a constant position model for $y$.\n",
+ "I've partitioned the matrix into blocks to show the upper left block is a constant velocity model for $x$, and the lower right block is a constant position model for $y$.\n",
 "\n",
 "However, let's practice finding these matrices. We model systems with a set of differential equations. We need an equation in the form \n",
 "\n",
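In code, a block partitioned $\mathbf F$ like this can be assembled with `scipy.linalg.block_diag` instead of being typed out by hand. A minimal sketch, with `dt` illustrative:

```python
import numpy as np
from scipy.linalg import block_diag

dt = 1.0
F_cv = np.array([[1., dt],
                 [0., 1.]])   # constant velocity block for x
F_cp = np.eye(1)              # constant position block for y
F = block_diag(F_cv, F_cp)    # the partitioned 3x3 state transition matrix
print(F)
```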
" ] }, { diff --git a/13-Smoothing.ipynb b/13-Smoothing.ipynb index 2b988d97..dfc86a26 100644 --- a/13-Smoothing.ipynb +++ b/13-Smoothing.ipynb @@ -211,7 +211,7 @@ "\n", "* Fixed-Point Smoothing\n", "\n", - "A fixed-point filter operates as a normal Kalman filter, but also produces an estimate for the state at some fixed time $j$. Before the time $k$ reaches $j$ the filter operates as a normal filter. Once $k>j$ the filter estimates $x_k$ and then also updates its estimate for $x_j$ using all of the measurements between $j\\dots k$. This can be useful to estimate initial paramters for a system, or for producing the best estimate for an event that happened at a specific time. For example, you may have a robot that took a photograph at time $j$. You can use a fixed-point smoother to get the best possible pose information for the camera at time $j$ as the robot continues moving.\n", + "A fixed-point filter operates as a normal Kalman filter, but also produces an estimate for the state at some fixed time $j$. Before the time $k$ reaches $j$ the filter operates as a normal filter. Once $k>j$ the filter estimates $x_k$ and then also updates its estimate for $x_j$ using all of the measurements between $j\\dots k$. This can be useful to estimate initial parameters for a system, or for producing the best estimate for an event that happened at a specific time. For example, you may have a robot that took a photograph at time $j$. You can use a fixed-point smoother to get the best possible pose information for the camera at time $j$ as the robot continues moving.\n", "\n", "## Choice of Filters\n", "\n", @@ -484,9 +484,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "At step $k$ we can estimate $x_k$ using the normal Kalman filter equations. However, we can make a better estimate for $x_{k-1}$ by using the measurement received for $x_k$. Likewise, we can make a better estimate for $x_{k-2}$ by using the measurements recevied for $x_{k-1}$ and $x_{k}$. We can extend this computation back for an arbitrary $N$ steps.\n", + "At step $k$ we can estimate $x_k$ using the normal Kalman filter equations. However, we can make a better estimate for $x_{k-1}$ by using the measurement received for $x_k$. Likewise, we can make a better estimate for $x_{k-2}$ by using the measurements received for $x_{k-1}$ and $x_{k}$. We can extend this computation back for an arbitrary $N$ steps.\n", "\n", - "Derivation for this math is beyond the scope of this book; Dan Simon's *Optimal State Estimation* [2] has a very good exposition if you are interested. The essense of the idea is that instead of having a state vector $\\mathbf{x}$ we make an augmented state containing\n", + "Derivation for this math is beyond the scope of this book; Dan Simon's *Optimal State Estimation* [2] has a very good exposition if you are interested. The essence of the idea is that instead of having a state vector $\\mathbf{x}$ we make an augmented state containing\n", "\n", "$$\\mathbf{x} = \\begin{bmatrix}\\mathbf{x}_k \\\\ \\mathbf{x}_{k-1} \\\\ \\vdots\\\\ \\mathbf{x}_{k-N+1}\\end{bmatrix}$$\n", "\n", diff --git a/14-Adaptive-Filtering.ipynb b/14-Adaptive-Filtering.ipynb index c73e8a87..761f342b 100644 --- a/14-Adaptive-Filtering.ipynb +++ b/14-Adaptive-Filtering.ipynb @@ -1446,7 +1446,7 @@ "\n", "This naive approach leads to combinatorial explosion. At step 1 we generate $N$ hypotheses, or 1 per filter. 
 "\n",
- "The *Interacting Multiple Models* (IMM) algorithm was invented by Blom[5] to solve the combinatorial explosion problem of multiple models. A subsequent paper by Blom and Bar-Shalom is the most cited paper [6]. The idea is to have 1 filter for each possible mode of behavior of the system. At each epoch we let the filters *interact* with each other. The more likely filters modify the estimates of the less likely filters so they more nearly represent the current state of the sytem. This blending is done probabilistically, so the unlikely filters also modify the likely filters, but by a much smaller amount. \n",
+ "The *Interacting Multiple Models* (IMM) algorithm was invented by Blom[5] to solve the combinatorial explosion problem of multiple models. A subsequent paper by Blom and Bar-Shalom is the most cited paper [6]. The idea is to have 1 filter for each possible mode of behavior of the system. At each epoch we let the filters *interact* with each other. The more likely filters modify the estimates of the less likely filters so they more nearly represent the current state of the system. This blending is done probabilistically, so the unlikely filters also modify the likely filters, but by a much smaller amount. \n",
 "\n",
 "For example, suppose we have two modes: going straight, or turning. Each mode is represented by a Kalman filter, maybe a first order and second order filter. Now say the target is turning. The second order filter will produce a good estimate, and the first order filter will lag the signal. The likelihood function of each tells us which of the filters is most probable. The first order filter will have low likelihood, so we adjust its estimate greatly with the second order filter. The second order filter is very likely, so its estimate will only be changed slightly by the first order Kalman filter. \n",
 "\n",
@@ -1559,14 +1559,14 @@
 "text": [
 "[[0.97 0.03]\n",
 " [0.05 0.95]]\n",
- "From turn to straight probablility is 0.05 percent\n"
+ "From turn to straight probability is 0.05\n"
 ]
 }
 ],
 "source": [
 "M = np.array([[.97, .03], [.05, .95]])\n",
 "print(M)\n",
- "print('From turn to straight probablility is', M[1, 0], 'percent')"
+ "print('From turn to straight probability is', M[1, 0])"
 ]
 },
 {
@@ -1587,7 +1587,7 @@
 "\n",
 "$$\\bar c_j = \\sum\\limits_{i=1}^{N} \\mu_i M_{ij}$$\n",
 "\n",
- "We use NumPy's `dot` function to compute this for us. We could also use the matix multiply operator `@`, but I find using dot for the summation symbol, which is the dot product, more intuitive:"
+ "We use NumPy's `dot` function to compute this for us. We could also use the matrix multiply operator `@`, but I find using `dot` more intuitive for this summation, which is just a dot product:"
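A minimal sketch of that computation, with illustrative mode probabilities:

```python
import numpy as np

mu = np.array([0.7, 0.3])      # current mode probabilities
M = np.array([[.97, .03],
              [.05, .95]])     # Markov chain transition matrix
cbar = mu.dot(M)               # total probability of being in each mode
print(cbar)                    # [0.694 0.306]
```

Each entry of `cbar` is $\bar c_j = \sum_i \mu_i M_{ij}$, exactly the equation above.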
 ]
 },
 {
diff --git a/Appendix-G-Designing-Nonlinear-Kalman-Filters.ipynb b/Appendix-G-Designing-Nonlinear-Kalman-Filters.ipynb
index e5f7b3df..610a35c3 100644
--- a/Appendix-G-Designing-Nonlinear-Kalman-Filters.ipynb
+++ b/Appendix-G-Designing-Nonlinear-Kalman-Filters.ipynb
@@ -665,7 +665,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "We can artificially force the Kalman filter to track the ball by making $Q$ large. That would cause the filter to mistrust its prediction, and scale the kalman gain $K$ to strongly favor the measurments. However, this is not a valid approach. If the Kalman filter is correctly predicting the process we should not 'lie' to the filter by telling it there are process errors that do not exist. We may get away with that for some problems, in some conditions, but in general the Kalman filter's performance will be substandard.\n",
+ "We can artificially force the Kalman filter to track the ball by making $Q$ large. That would cause the filter to mistrust its prediction, and scale the Kalman gain $K$ to strongly favor the measurements. However, this is not a valid approach. If the Kalman filter is correctly predicting the process we should not 'lie' to the filter by telling it there are process errors that do not exist. We may get away with that for some problems, in some conditions, but in general the Kalman filter's performance will be substandard.\n",
 "\n",
 "Recall from the **Designing Kalman Filters** chapter that the acceleration is\n",
 "\n",
diff --git a/Supporting_Notebooks/Interactions.ipynb b/Supporting_Notebooks/Interactions.ipynb
index 893fd114..d70bc696 100644
--- a/Supporting_Notebooks/Interactions.ipynb
+++ b/Supporting_Notebooks/Interactions.ipynb
@@ -75,7 +75,7 @@
 "# Experimenting with FPF'\n",
 "\n",
 "\n",
- "The Kalman filter uses the equation $P^- = FPF^\\mathsf{T}$ to compute the prior of the covariance matrix during the prediction step, where P is the covariance matrix and F is the system transistion function. For a Newtonian system $x = \\dot{x}\\Delta t + x_0$ F might look like\n",
+ "The Kalman filter uses the equation $P^- = FPF^\\mathsf{T}$ to compute the prior of the covariance matrix during the prediction step, where P is the covariance matrix and F is the system transition function. For a Newtonian system $x = \\dot{x}\\Delta t + x_0$ F might look like\n",
 "\n",
 "$$F = \\begin{bmatrix}1 & \\Delta t\\\\0 & 1\\end{bmatrix}$$\n",
 "\n",
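You can watch this equation inject correlation numerically. A short sketch, with `dt` and the initial `P` chosen only for illustration:

```python
import numpy as np

dt = 1.
F = np.array([[1., dt],
              [0., 1.]])
P = np.diag([1., 1.])      # start with uncorrelated position and velocity
for _ in range(3):
    P = F @ P @ F.T        # predict step without Q, isolating the effect of F
    print(P[0, 1], P[0, 0])
```

The off-diagonal term grows with each step (1, 2, 3) — that is the position/velocity correlation appearing — while the growing $P_{00}$ (2, 5, 10) reflects the loss of knowledge about the position.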