The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4. But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.
Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6. And you'd be good.
But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.
Has anybody solved this in Python, either for compass bearings or a
different domain? I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.
Brute force doesn't suck too much when using numpy to do the heavy lifting.
In [158]: import numpy as np
# Worst case. A von Mises distribution "centered" around the "South pole"
# at pi/-pi.
In [159]: theta = np.random.vonmises(np.pi, 1.0, size=1000)
# Complex division makes circular distances easier to compute.
In [160]: z = np.exp(1j * theta)
# The newaxis bit plus numpy's broadcasting yields a 1000x1000 array of
# the quotient of each pair of points in the dataset.
In [161]: zdiv = z / z[:,np.newaxis]
# Convert back to angular distances. Take the absolute value. Sum along one
# dimension. Find the index of the element with the smallest mean absolute
# circular deviation when used as the putative median.
In [162]: i = abs(np.arctan2(zdiv.imag, zdiv.real)).sum(axis=1).argmin()
# As expected, the result is close to the mode of the distribution.
In [163]: theta[i]
Out[163]: 3.0886753423606521
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
def compass_median(points):
points = sorted(points)
if points[-1] - points[0] > 180:
points = [(p - 360 if p > 180 else p) for p in points]
points.sort()
median = points[len(points) // 2]
return median + 360 if median < 0 else median
I like this implementation, and it would probably work 99.9999% of the
time for my particular use case. The only (very contrived) edge case
that I can think of is when you have 10 bearings to SSW, 10 bearings
to SSE, and the two outliers are unfortunately in the NE and NW
quadrants. It seems like the algorithm above would pick one of the
outliers.
Maybe the refinement to the algorithm above would be to find the mean
first, by summing sines and cosines of the bearings, taking the
quotient, and applying the arctangent. Then use the resulting angle
as the equivalent of "due north" and adjust angles to be within (-180,
180) respect to the mean, pretty much as you do in the code above,
with minor modifications.
I realize the problem as I stated it as sort of ill-defined.
The way you stated the solution made me realize more deeply that you
basically have a list that needs to be rotated either clockwise or
counterclockwise in certain situations. Instead of using the 180-
degree check to coarsely (but usually effectively) rotate your frame
of reference, you could instead use the mean to eliminate even more
edge cases.
First, there isn't always a solution; what would you consider to be the
median of [0, 90, 180, 270]?
In the case where the bearings are clustered, one approach is to
convert each bearing from polar to cartesian coordinates, compute the
centroid, then convert back to polar coordinates, i.e.:
from math import degrees, radians, sin, cos, atan2
def mean(bearings):
x = sum(sin(radians(a)) for a in bearings)
y = sum(cos(radians(a)) for a in bearings)
return degrees(atan2(x, y))
Then, subtract the mean from each bearing, coerce all angles into the
range -180..+180, calculate the median, add the mean, coerce back to
0..360.
def median(bearings):
m = mean(bearings)
bearings = [(a - m + 180) % 360 - 180 for a in bearings]
bearings.sort()
median = bearings[len(bearings) / 2]
median += m
median %= 360
return median
You probably posted before seeing my recent post. I agree that the
problem is ill-defined for certain cases.
> In the case where the bearings are clustered, one approach is to
> convert each bearing from polar to cartesian coordinates, compute the
> centroid, then convert back to polar coordinates, i.e.:
>
> from math import degrees, radians, sin, cos, atan2
>
> def mean(bearings):
> x = sum(sin(radians(a)) for a in bearings)
> y = sum(cos(radians(a)) for a in bearings)
> return degrees(atan2(x, y))
>
> Then, subtract the mean from each bearing, coerce all angles into the
> range -180..+180, calculate the median, add the mean, coerce back to
> 0..360.
>
> def median(bearings):
> m = mean(bearings)
> bearings = [(a - m + 180) % 360 - 180 for a in bearings]
> bearings.sort()
> median = bearings[len(bearings) / 2]
> median += m
> median %= 360
> return median
Yep, that's exactly the solution I'm leaning toward now.
> I like this implementation, and it would probably work 99.9999% of the
> time for my particular use case. The only (very contrived) edge case
> that I can think of is when you have 10 bearings to SSW, 10 bearings
> to SSE, and the two outliers are unfortunately in the NE and NW
> quadrants. It seems like the algorithm above would pick one of the
> outliers.
>
> Maybe the refinement to the algorithm above would be to find the mean
> first, by summing sines and cosines of the bearings, taking the
> quotient, and applying the arctangent. Then use the resulting angle
> as the equivalent of "due north" and adjust angles to be within (-180,
> 180) respect to the mean, pretty much as you do in the code above,
> with minor modifications.
I was going to suggest this. Let us know if it seems to work.
> I realize the problem as I stated it as sort of ill-defined.
Yes, I think this one reason stat books typically ignore directional
data. I think it is an unfortunate omission.
Terry Jan Reedy
> On Jan 22, 5:12 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
>> Steve Howell wrote:
>> > I just saw the thread for medians, and it reminded me of a problem
>> > that I need to solve. We are writing some Python software for
>> > sailing, and we need to detect when we've departed from the median
>> > heading on the leg. Calculating arithmetic medians is
>> > straightforward, but compass bearings add a twist.
[...]
> I like this implementation, and it would probably work 99.9999% of the
> time for my particular use case. The only (very contrived) edge case
> that I can think of is when you have 10 bearings to SSW, 10 bearings to
> SSE, and the two outliers are unfortunately in the NE and NW quadrants.
> It seems like the algorithm above would pick one of the outliers.
The trouble is that median of angular measurements is not a meaningful
concept. The median depends on the values being ordered, but angles can't
be sensibly ordered. Which is larger, 1 degree north or 359 degrees? Is
the midpoint between them 0 degree or 180 degree?
The median of the number line 0 <= x <= 360 is 180, but what's the median
of the circle 0 deg <= theta <= 360 deg? With no end points, it's
meaningless to talk about the middle.
For this reason, I expect that taking the median of bearing measurements
isn't well defined. You can probably get away with it so long as the
bearings are "close", where "close" itself is ill-defined.
In any case, this isn't a programming problem, it's a mathematics
problem. I think you're better off taking it to a maths or statistics
forum, and see what they say.
--
Steven
Here's a simple (too simple?) way to do it:
1. sort the bearings in ascending order
2. find the largest gap between two consecutive bearings
3. cut the circle at this point and now find the median the
normal way
In Python:
>>> def median_bearing(bearings):
... bearings = sorted(bearings)
... n = len(bearings)
... i = max(xrange(n), key=lambda i: (bearings[(i+1)%n] - bearings[i])%360)
... return bearings[(i + (n+1)//2)%n]
...
>>> median_bearing([1,2,3,4,5,6,359])
3
>>> median_bearing([-179, 174, 175, 176, 177, 178, 179])
177
I guess there may be cases where the results that it gives are still not
very good, as in general there isn't a good notion of median for cyclic
data. E.g. what would be the median of [0, 180] - it could equally be
090 or 270. Or the median of [0, 120, 240] has the same problem. But I
suppose your data couldn't be like this as then it would be ill-advised
to try to apply a notion of median to it.
But it will work well I believe with quite localized data set
with a few, even wildly inaccurate, outliers. E.g.
>>> median_bearing([123, 124, 125, 125, 126, 10, 240])
125
HTH
--
Arnaud
Then don't define the median that way. Instead, define the median as the point
that minimizes the sum of the absolute deviations of the data from that point
(the L1 norm of the deviations, for those familiar with that terminology). For
1-D data on the real number line, that corresponds to sorting the data and
taking the middle element (or the artithmetic mean of the middle two in the case
of even-numbered data). My definition applies to other spaces, too, that don't
have a total order attached to them including the space of angles.
The "circular median" is a real, well-defined statistic that is used for exactly
what the OP intends to use it for.
> The median of the number line 0<= x<= 360 is 180, but what's the median
> of the circle 0 deg<= theta<= 360 deg? With no end points, it's
> meaningless to talk about the middle.
>
> For this reason, I expect that taking the median of bearing measurements
> isn't well defined. You can probably get away with it so long as the
> bearings are "close", where "close" itself is ill-defined.
>
> In any case, this isn't a programming problem, it's a mathematics
> problem. I think you're better off taking it to a maths or statistics
> forum, and see what they say.
<puts on statistics hat>
Same answer with my statistics hat on or off. :-)
</puts on statistics hat>
Hi,
This problem is trivial to solve if you can assume that you that your
data points are measured consecutively and that your boat does not
turn by more than 180 degrees between two samples, which seems a
reasonable use case. If you cannot make this assumption, the answer
seems pretty arbitrary to me anyhow. The standard trick in this
situation is to 'unwrap' the data (fix > 180 deg jumps by adding or
subtracting 360 to subsequent points), do your thing and then 'rewrap'
to your desired interval ([0-355] or [-180,179] degrees).
In [1]: from numpy import *
In [2]: def median_degree(degrees):
...: return mod(rad2deg(median(unwrap(deg2rad(degrees)))),360)
...:
In [3]: print(median_degree([1, 2, 3, 4, 5, 6, 359]))
3.0
In [4]: print(median_degree([-179, 174, 175, 176, 177, 178, 179]))
177.0
If the deg2rad and rad2deg bothers you, you should write your own
unwrap function that handles data in degrees.
Hope this helps,
Bas
P.S.
Slightly off-topic rant against both numpy and matlab implementation
of unwrap: They always assume data is in radians. There is some option
to specify the maximum jump size in radians, but to me it would be
more useful to specify the interval of a complete cycle, so that you
can do
unwrapped_radians = unwrap(radians)
unwrapped_degrees = unwrap(degrees, 360)
unwrapped_32bit_counter = unwrap(overflowing_counter, 2**32)
Rants accompanied with patches are more effective. :-)
On Jan 25, 5:34 pm, Robert Kern <robert.k...@gmail.com> wrote:>
> Rants accompanied with patches are more effective. :-)
As you wish (untested):
def unwrap(p, cycle=2*pi, axis=-1):
"""docstring to be updated"""
p = asarray(p)
half_cycle = cycle / 2
nd = len(p.shape)
dd = diff(p, axis=axis)
slice1 = [slice(None, None)]*nd # full slices
slice1[axis] = slice(1, None)
ddmod = mod(dd+half_cycle, cycle)-half_cycle
_nx.putmask(ddmod, (ddmod==-half_cycle) & (dd > 0), half_cycle)
ph_correct = ddmod - dd;
_nx.putmask(ph_correct, abs(dd)<half_cycle, 0)
up = array(p, copy=True, dtype='d')
up[slice1] = p[slice1] + ph_correct.cumsum(axis)
return up
I never saw a use case for the discontinuity argument, so in my
preferred version it would be removed. Of course this breaks old code
(by who uses this option anyhow??) and breaks compatibility between
matlab and numpy.
Chears,
Bas
Sometimes legitimate features have phase discontinuities greater than pi. If you
want your feature to be accepted, please submit a patch that does not break
backwards compatibility and which updates the docstring and tests appropriately.
I look forward to seeing the complete patch! Thank you.
http://projects.scipy.org/numpy
[snip]
> > I never saw a use case for the discontinuity argument, so in my
> > preferred version it would be removed. Of course this breaks old code
> > (by who uses this option anyhow??) and breaks compatibility between
> > matlab and numpy.
On Jan 25, 6:39 pm, Robert Kern <robert.k...@gmail.com> wrote:
> Sometimes legitimate features have phase discontinuities greater than pi.
We are dwelling more and more off-topic here, but anyhow: According to
me, the use of unwrap is inherently related to measurement instruments
that wrap around, like rotation encoders, interferometers or up/down
counters. Say you have a real phase step of +1.5 pi, how could you
possibly discern if from a real phase step of -pi/2? This is like an
aliasing problem, so the only real solution would be to increase the
sampling speed of your system. To me, the discontinuity parameter is
serving some hard to explain corner case (see matlab manual), which is
better left to be solved by hand in cases it appears. I regret matlab
ever added the feature.
> If you want your feature to be accepted, please submit a patch that does not break
> backwards compatibility and which updates the docstring and tests appropriately.
> I look forward to seeing the complete patch! Thank you.
I think my 'cycle' argument does have real uses, like the degrees in
this thread and the digital-counter example (which comes from own
experience and required me to write my own unwrap). I'll try to submit
a non-breaking patch if I ever have time.
Bas
I admitted pretty early in the thread that I did not define the
statistic with much rigor, although most people got the gist of the
problem, and as Robert points out, you can more clearly the define the
problem, although I think under any definition, some inputs will have
multiple solutions, such as (0, 90, 180, 270) and (0, 120, 240). If
you've ever done lake sailing, you probably have encountered days
where the wind seems to be coming from those exact angles.
This is the code that I'll be using (posted by "Nobody"). I'll report
back it if it has any issues.
def mean(bearings):
x = sum(sin(radians(a)) for a in bearings)
y = sum(cos(radians(a)) for a in bearings)
return degrees(atan2(x, y))
def median(bearings):