Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
median / quantiles algorithm
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  4 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Bernd Mohr  
View profile  
 More options Apr 12 1991, 8:42 am
Newsgroups: sci.math.stat
From: m...@faui78a.informatik.uni-erlangen.de (Bernd Mohr)
Date: 12 Apr 91 11:34:35 GMT
Local: Fri, Apr 12 1991 7:34 am
Subject: median / quantiles algorithm
I am a computer programmer with only little knowledge of statistics.
I am working on a program which computes some statistical indices (min, max,..)
of very large input data vectors. As you can imagine, there is a little
problem in computing quantiles (or the median), because the normal
algorithms for that problem assume, that you have access to all values.

In

 Raj Jain, Imrich Chlamtac
 "The P2 Algorithm for Dynamic Calculation of Quantiles
  and Histograms Without Storing Observations"
 in: Communcations of the ACM, Oct 1985, Vol. 28, No. 10

a heuristic algorithm is proposed, which calculates estimates for the
quantiles without storing the values.

Now my questions:

* I tested the algorithm with 5000 test values. The estimate of the median
  was only 0.002 % - 0.005 % away from the exact value. Is this good enough
  for an statistican ?

* Are there other (better?) algorithms for that purpose ?

Thank you very much,

Bernd Mohr
m...@immd7.informatik.uni-erlangen.de


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Bill Venables  
View profile  
 More options Apr 13 1991, 9:49 am
Newsgroups: sci.math.stat
From: wvena...@spam.ua.oz.au (Bill Venables)
Date: 13 Apr 91 00:46:12 GMT
Local: Fri, Apr 12 1991 8:46 pm
Subject: Re: median / quantiles algorithm
In article <mohr.671456075@faui78a> m...@faui78a.informatik.uni-erlangen.de

Quantiles are often used for descriptive or informal comparative purposes
only and hence high accuracy is not required, so the method, on the face of
it, looks very promising, and more than adequate for most purposes.  Thanks
for the reference.

However the accuracy standards required of statistical calcualations are as
variable as in any discipline.  "Good enough for the statistician" is a
meaningless phrase, (perhaps even slightly insulting :-).  Whether it is
good enough for your application is a matter for your client, not for
statistics as a discipline in general. The usual convention in general
prupose packages is to offer such an approximation as a cheap alternative.

Just a point on quoting the error they way you have: quantiles are location
rather than scale measures and errors should, in principle, be given in
additive rather than multiplicative terms, relative to some suitable
measure of error variation.  Any method of estimating quantiles could be
made to appear arbitrarily good, in simple percentage terms, simply by
adding a sufficiently large number to each sample value.

Also the median is the easiest quantile to estimate, so what you have
quoted is the best possible case.  It may be a different story at the 10%
or 5% quantile.

>   * Are there other (better?) algorithms for that purpose ?

D. E. Knuth "The art of computer programming" vol 3: "Sorting and
searching" discusses the problem of finding the sample median at some
length, but only for exact calculation and not for general quantiles.

--
  Bill Venables, Dept. of Statistics,  | Email: venab...@spam.adelaide.edu.au
  Univ. of Adelaide,  South Australia. | Phone:                +61 8 228 5412


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Nick Maclaren  
View profile  
 More options Apr 14 1991, 9:18 am
Newsgroups: sci.math.stat
From: n...@cl.cam.ac.uk (Nick Maclaren)
Date: 13 Apr 91 20:00:05 GMT
Local: Sat, Apr 13 1991 4:00 pm
Subject: Re: median / quantiles algorithm

[ A question about how to calculate the quantiles ]

There are essentially two methods of calculating the quantiles accurately,
and the 'approximate' methods are necessarily all dependent on the data
occuring in a suitably 'random' order.  I would use an exact method,
which are:

    1) Sorting the observations.  This is effectively O(N ln N) for any
number of quantiles.

    2) Multiple passes through the data.  This is effectively O(Q N) for
Q quantiles.

This means that sorting is most efficient UNLESS the number of data points
exceeds K**Q, where K is a constant and Q is the number of quantiles needed.
K is somewhere between 50 and 200, depending on the implementation.  Unless
you are looking for only the median, I advise sorting the data.  Sorry.

Nick Maclaren
University of Cambridge Computer Laboratory
n...@cl.cam.ac.uk


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Douglas M. Hawkins  
View profile  
 More options Apr 14 1991, 1:40 pm
Newsgroups: sci.math.stat
From: d...@umnstat.uucp (Douglas M. Hawkins)
Date: 14 Apr 91 17:09:40 GMT
Local: Sun, Apr 14 1991 1:09 pm
Subject: Re: median / quantiles algorithm
In article <1991Apr13.200005.26...@cl.cam.ac.uk> n...@cl.cam.ac.uk (Nick Maclaren) writes:

>[ A question about how to calculate the quantiles ]

>There are essentially two methods of calculating the quantiles accurately,
>and the 'approximate' methods are necessarily all dependent on the data

                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>occuring in a suitably 'random' order.  I would use an exact method,

 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>which are:

Not so; one useful possibility for large data sets is to use a one-pass
algorithm to select a random subset of N (user-selected) of the data,
and to get the exact quantiles of the random subset.  By suitable
choice of N you can make the approximation as accurate as you like.
The use of the one-pass sampling algorithm means that no more than N
items need to be stored, regardless of the size of the whole data set.

Douglas M Hawkins                    Department of Applied Statistics
e-mail d...@umnstat.stat.umn.edu     University of Minnesota
Voice  (612) 625-8159                St Paul
Fax    (612) 624-2719                MN  55108


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »