Kalman Filter Sklearn

Asia Jordan

Aug 3, 2024, 4:34:04 PM

[EDIT]The answer by @Claudio gives me a really good tip on how to filter out outliers. I do want to start using a Kalman filter on my data though, so I changed the example data below so that it has subtle noise variations which are not so extreme (which I see a lot as well). If anybody else could give me some direction on how to use PyKalman on my data, that would be great.[/EDIT]

For a robotics project I'm trying to track a kite in the air with a camera. I'm programming in Python and I pasted some noisy location results below (every item also has a datetime object included, but I left them out for clarity).

I first thought of manually calculating outliers and then simply removing them from the data in real time. Then I read about Kalman filters and how they are specifically meant to smooth out noisy data. So after some searching I found the PyKalman library, which seems perfect for this. Since I was kinda lost in the whole Kalman filter terminology, I read through the wiki and some other pages on Kalman filters. I get the general idea of a Kalman filter, but I'm really lost in how I should apply it to my code.

I would strongly recommend reading this webpage. I have no connection to, or knowledge of the author, but I spent about a day trying to get my head round Kalman filters, and this page really made it click for me.

Briefly: for a system which is linear, and which has known dynamics (i.e. if you know the state and inputs, you can predict the future state), it provides an optimal way of combining what you know about a system to estimate its true state. The clever bit (which is taken care of by all the matrix algebra you see on pages describing it) is how it optimally combines the two pieces of information you have:

- the (noisy) measurements you take of the system, and
- the prediction from your model of the system dynamics.

You specify how sure you are of each of these (via the covariance matrices R and Q respectively), and the Kalman gain determines how much you should believe your model (i.e. your current estimate of your state) versus how much you should believe your measurements.
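As a sketch of how that trade-off works numerically (standard textbook notation, not tied to any particular library: H is the measurement matrix, P the state covariance, R the measurement covariance):

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, H, R):
    """One measurement-update step: blend the model prediction with a measurement."""
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)    # corrected state estimate
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new
```

With a small R (trusted measurements) the gain pulls the estimate close to the measurement; with a large R it stays near the model's prediction.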

Let's treat the kite as a particle (obviously a simplification; a real kite is an extended body, so has an orientation in 3 dimensions), which has four states, which for convenience we can write in a state vector: [x, y, x_dot, y_dot]^T, where x and y are the positions and the _dots are the velocities in each of those directions. From your question I'm assuming there are two (potentially noisy) measurements, which we can write in a measurement vector: [x_measured, y_measured]^T.

We then need to describe the system dynamics. Here I will assume that no external forces act, and that there is no damping on the movement of the kite (with more knowledge you may be able to do better; this effectively treats external forces and damping as an unknown/unmodeled disturbance):

x_{k+1} = x_k + dt * x_dot_k
y_{k+1} = y_k + dt * y_dot_k
x_dot_{k+1} = x_dot_k
y_dot_{k+1} = y_dot_k

where "dt" is the time-step. We assume the (x, y) position is updated based on current position and velocity, and velocity remains unchanged. Given that no units are given, we can just say the velocity units are such that we can omit "dt" from the equations above, i.e. in units of position_units/sample_interval (I assume your measured samples are at a constant interval). We can summarise these four equations into a dynamics matrix (F as discussed here, and transition_matrices in the pykalman library).
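Written out as a numpy sketch, this is the same 4x4 array you would pass to pykalman as transition_matrices (with the observation matrix added for completeness):

```python
import numpy as np

# state: [x, y, x_dot, y_dot], with dt folded into the velocity units
F = np.array([[1, 0, 1, 0],    # x_{k+1}     = x_k + x_dot_k
              [0, 1, 0, 1],    # y_{k+1}     = y_k + y_dot_k
              [0, 0, 1, 0],    # x_dot_{k+1} = x_dot_k
              [0, 0, 0, 1]])   # y_dot_{k+1} = y_dot_k

# observation matrix: only the two positions are measured
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])
```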

The dynamics captured in the model above are very simple. Taken literally they say that the positions will be updated by current velocities (in an obvious, physically reasonable way), and that velocities remain constant (this is clearly not physically true, but captures our intuition that velocities should change slowly).

If we think the estimated state should be smoother, one way to achieve this is to say we have less confidence in our measurements than in our dynamics (i.e. we set a higher observation_covariance relative to our transition_covariance).

Starting from the end of the code above, we fix the observation covariance to 10x the value estimated previously; setting em_vars as shown is required to avoid re-estimation of the observation covariance (see here).

Finally, if you want to use this fitted filter on-line, you can do so using the filter_update method. Note that this uses the filter method rather than the smooth method, because the smooth method can only be applied to batch measurements.

The plot below shows the performance of the filter method, including 3 points found using the filter_update method. Dots are measurements, the dashed line shows state estimates for the filter training period, and the solid line shows state estimates for the "on-line" period.

A Kalman filter is an algorithm that is used to estimate the state of a time-varying system in the presence of noise and uncertainty. It was developed by Rudolf Kalman in the 1960s and has since become one of the most widely used and influential algorithms in the fields of control theory and signal processing.

The basic idea behind the Kalman filter is to use a probabilistic model to estimate the true state of a system at each time step, based on measurements that are subject to noise and other forms of uncertainty. The filter works by recursively estimating the state of the system at each time step, using a combination of the previous estimate and the most recent measurement.

The Kalman filter is particularly useful in applications where the measurements are subject to noise and the underlying system dynamics are uncertain or complex. It has been applied in numerous fields, including aerospace, robotics, economics, and communication systems, to name just a few.

The code first initializes a histogram, which summarizes the distribution of pixel values in the image. Specifically, it extracts the region of interest (ROI) from the current frame by selecting the rectangle defined by the track_window coordinates.

The final parameter cv2.NORM_MINMAX specifies that we are normalizing the data between 0 and 255. The output of this step is then used to update the Kalman Filter state at each step of the tracking process by computing the back-projected probability density function (PDF) for each new frame.

Firstly, we create a Kalman filter object with 4 state variables and 2 measurement variables using cv2.KalmanFilter(4, 2). Then we set up the measurement matrix self.kalman.measurementMatrix as a 2x4 matrix that maps x and y coordinates to our 4-dimensional state vector (i.e. only positions are directly observable).

The transition matrix self.kalman.transitionMatrix defines how our state vectors evolve from time step t to t+1, based on a simple linear motion model where objects move at constant velocity. The first two rows map position estimates onto future position estimates (position must advance by the current speed values), while the last two rows maintain unchanged predictions about velocities.

Next, we initialize the predicted state self.kalman.statePre with a 4x1 column vector that represents the initial position estimate in x and y (i.e., the center of the tracked window), and zero velocities. Lastly, we set self.kalman.statePost to have the same values as statePre, since no measurements have been taken yet.


In the last chapter we discussed the difficulties that nonlinear systems pose. This nonlinearity can appear in two places. It can be in our measurements, such as a radar that is measuring the slant range to an object. Slant range requires you to take a square root to compute the x,y coordinates: x = sqrt(slant^2 - altitude^2).

The nonlinearity can also occur in the process model - we may be tracking a ball traveling through the air, where the effects of gravity and air drag lead to highly nonlinear behavior. The standard Kalman filter performs poorly or not at all with these sorts of problems.

I generated this by taking 500,000 samples from the input, passing it through the nonlinear transform, and building a histogram of the result. We call these points sigma points. From the output histogram we can compute a mean and standard deviation which would give us an updated, albeit approximated Gaussian.

It has perhaps occurred to you that this sampling process constitutes a solution to our problem. Suppose for every update we generated 500,000 points, passed them through the function, and then computed the mean and variance of the result. This is called a 'Monte Carlo' approach, and it is used by some Kalman filter designs, such as the ensemble Kalman filter and the particle filter. Sampling requires no specialized knowledge, and does not require a closed form solution. No matter how nonlinear or poorly behaved the function is, as long as we sample with enough sigma points we will build an accurate output distribution.

"Enough points" is the rub. The graph above was created with 500,000 sigma points, and the output is still not smooth. What's worse, this is only for 1 dimension. In general, the number of points required grows exponentially with the number of dimensions. If you only needed 500 points for 1 dimension, you'd need 500 squared, or 250,000, points for two dimensions; 500 cubed, or 125,000,000, points for three dimensions; and so on. So while this approach does work, it is very computationally expensive. The Unscented Kalman filter uses a similar technique but reduces the amount of computation needed by a drastic amount by using a deterministic method of choosing the points.
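The brute-force sampling described above can be sketched in a few lines (the nonlinear function here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    return np.sin(x) + 0.5 * x  # arbitrary nonlinear transform

# Sample the input Gaussian, push every sample through f,
# then summarize the output with a mean and standard deviation.
samples = rng.normal(loc=0.0, scale=1.0, size=500_000)
out = f(samples)
mean, std = out.mean(), out.std()
```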

Let's look at the problem in terms of a 2D covariance ellipse. I choose 2D merely because it is easy to plot; this will extend to any number of dimensions. Assuming some arbitrary nonlinear function, we will take random points from the first covariance ellipse, pass them through the nonlinear function, and plot their new positions. Then we can compute the mean and covariance of the transformed points, and use that as our estimate of the mean and probability distribution.

On the left we show an ellipse depicting the 1σ distribution of two state variables. The arrows show how several randomly sampled points might be transformed by some arbitrary nonlinear function to a new distribution. The ellipse on the right is drawn semi-transparently to indicate that it is an estimate of the mean and variance of this collection of points; if we were to sample, say, a million points, the shape of the point cloud might be very different from an ellipse.
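The same idea in 2D, matching the ellipse picture described above (the nonlinear map and covariance are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(pts):
    # Arbitrary nonlinear map applied to each (x, y) point.
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([x + y, 0.1 * x**2 + y])

# Sample points from the first covariance ellipse (a 2D Gaussian)...
cov0 = np.array([[2.0, 1.2], [1.2, 2.0]])
pts = rng.multivariate_normal(np.zeros(2), cov0, size=100_000)

# ...transform them, then estimate the output mean and covariance.
out = f(pts)
mean1 = out.mean(axis=0)
cov1 = np.cov(out.T)
```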
