Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Combinatorial Ranking?

3 views
Skip to first unread message

Steve Hollasch

unread,
Jun 21, 1999, 3:00:00 AM6/21/99
to
---------- Summary

I've got a problem that I'd like to solve, but it doesn't look like any
that I've encountered before. I'll include the motivation for the problem,
but here's the problem in a nutshell:

Given:
A set S of size N of non-negative numbers
A subset size M
A sum P

Find the number of all subsets of S of size M whose element sum is less
than P.
The brute force algorithm (enumerate all possible subsets of size M and
compare) is O(N!). I'm hoping there's a much more efficient method.

---------- Motivation

Here's the motivation for the problem. In my local paper (The Seattle
Times) there's a financial writer who gathers the readers' ten most popular
Northwest stocks (about 250 in all), and compares that against a random
portfolio of ten stocks. Naturally, this is a crap shoot, as a random
portfolio may well beat the readers' portfolio. This doesn't, however, mean
that one should choose random portfolios since many of them should beat even
intelligently-chosen portfolios.
What I'd rather do is to compute the percentile ranking of a given
portfolio to see how far it gets past the median. In other words, I'd expect
a good portfolio to achieve something like an 80% percentile ranking. If you
get a 40% ranking, then you're indeed better off rolling dice and spending
your time doing other things. Further, percentile ranking is an objective
measure of relative performance which can be used to compare amongst
historical portfolios.


Paul E. Black

unread,
Jun 21, 1999, 3:00:00 AM6/21/99
to
"Steve Hollasch" <stev...@microsoft.com> writes:

I would sort the set of numbers first, then enumerate combinations
beginning with the M smallest numbers. Worst case, this is O(n^M),
which is not out of the question since *all* subsets might have sums
less than P. This is a little better that the brute force approach.

Another approach is sorting then using a memoized enumeration. The
function call to memoize is
combos(m, p, n)
This is the number of combinations of m numbers, whose total is less
than p, using no number greater than the nth greatest number.
outer loop is
Combos tries all combinations of one fewer number, like this
combos(m, p, n)
{
if m, p, n already computed, then
return the value
if m = 0, then
return 1
total = 0
for k = m to n do
if Numbers[k] <= p, then
total = total + combos(m-1, p - Numbers[k], k)
endfor
remember the value of m, p, n is total
return total
}
The top level call is combos(M, P, N). This should run in O(MN).

Sincerely,
-paul-
--
Paul E. Black (p.b...@acm.org) 100 Bureau Drive, Stop 8970
paul....@nist.gov Gaithersburg, Maryland 20899-8970
voice: +1 301 975-4794 fax: +1 301 926-3696
Web: http://hissa.ncsl.nist.gov/~black/black.html KC7PKT

Doug Elrod

unread,
Jun 21, 1999, 3:00:00 AM6/21/99
to
In article <7km0tn$9...@news.dns.microsoft.com>, "Steve Hollasch"
<stev...@microsoft.com> wrote:

> ---------- Summary
>
> I've got a problem that I'd like to solve, but it doesn't look like any
> that I've encountered before. I'll include the motivation for the problem,

> but here's the problem in a nutshell:
>
> Given:
> A set S of size N of non-negative numbers
> A subset size M
> A sum P
>
> Find the number of all subsets of S of size M whose element sum is less
> than P.
> The brute force algorithm (enumerate all possible subsets of size M and
> compare) is O(N!). I'm hoping there's a much more efficient method.
>
> ---------- Motivation
>
> Here's the motivation for the problem. In my local paper (The Seattle
> Times) there's a financial writer who gathers the readers' ten most popular
> Northwest stocks (about 250 in all), and compares that against a random
> portfolio of ten stocks. Naturally, this is a crap shoot, as a random
> portfolio may well beat the readers' portfolio. This doesn't, however, mean
> that one should choose random portfolios since many of them should beat even
> intelligently-chosen portfolios.
> What I'd rather do is to compute the percentile ranking of a given
> portfolio to see how far it gets past the median. In other words, I'd expect
> a good portfolio to achieve something like an 80% percentile ranking. If you
> get a 40% ranking, then you're indeed better off rolling dice and spending
> your time doing other things. Further, percentile ranking is an objective
> measure of relative performance which can be used to compare amongst
> historical portfolios.

This is one form of the "permutation test" in statistics. This was
proved ~10 years ago to be NP-hard (when the question to be answered
is "Is P less than a specified constant k?"), although I don't
have the reference ("Permutation" and "NP" are in the title). I asked
about it in comp.theory at that time. (It's harder than the NP-complete
problem PARTITION, which is described in Garey & Johnson.) So, you
may not be able to do too much better than brute force.

-Doug Elrod (dr...@cornell.edu)

David Eppstein

unread,
Jun 21, 1999, 3:00:00 AM6/21/99
to
"Steve Hollasch" <stev...@microsoft.com> writes:
> Find the number of all subsets of S of size M whose element sum is
> less than P. The brute force algorithm (enumerate all possible subsets
> of size M and compare) is O(N!). I'm hoping there's a much more
> efficient method.

Even if you used brute force you would spend time O(N choose M), which
is at worst 2^N, much smaller than N!. But anyway, it's possible to
spend time proportional to only the number of low-weight M-element
subsets, rather than all M-element subsets.

Let S = {x1,x2,...xn}.
Form a graph having as its vertices the pairs (i,j), 0<=i<=n and 0<=j<=m.
Draw a weighted edge from each (i,j) to (i,j+1) with weight zero.
Draw a weighted edge from each (i,j) to (i+1,j+1) with weight x(i+1).

Then this is a directed acyclic graph in which the paths from (0,0) to
(n,m) are in 1-1 correspondence with the m-element subsets of S. More
generally the paths from (0,0) to (i,j) are in 1-1 correspondence with
the j-element subsets of {x1,x2,...xi}. The weight of a path is the
same as the element sum of the corresponding subset.

You can then apply e.g. my k-shortest-paths algorithm (see
http://www.ics.uci.edu/~eppstein/pubs/p-kpath.html for a copy of my
paper and pointers to source code) to find all short paths in constant
time per path, after a preprocessing stage which is linear in the size
of the graph (O(nm)).

If the elements of S are all nonnegative integers and P is small, you
can instead count the paths by dynamic programming in time O(nmP).
--
David Eppstein UC Irvine Dept. of Information & Computer Science
epps...@ics.uci.edu http://www.ics.uci.edu/~eppstein/

Rasmus Pagh

unread,
Jun 22, 1999, 3:00:00 AM6/22/99
to

Steve Hollasch wrote:
> Given:
> A set S of size N of non-negative numbers
> A subset size M
> A sum P
>
> Find the number of all subsets of S of size M whose element sum is less
> than P.

You can *approximate* the number with a simple random experiment:
Pick X sets of size M independently and uniformly at random, count
the number Y of those sets whose element sum is less than P. An
estimate of the fraction of "lesser" sets is then Y / X.
If you just want the estimate to be trustworthy within (say) 0.01
X=10,000 samples should suffice.

/Rasmus

--
_________________________________________________________________

Rasmus Pagh (pa...@daimi.au.dk) http://www.daimi.au.dk/~pagh/
_________________________________________________________________

Glenn

unread,
Jun 22, 1999, 3:00:00 AM6/22/99
to
"Steve Hollasch" <stev...@microsoft.com> writes:

>---------- Summary

> I've got a problem that I'd like to solve, but it doesn't look like any
>that I've encountered before. I'll include the motivation for the problem,
>but here's the problem in a nutshell:

> Given:


> A set S of size N of non-negative numbers
> A subset size M
> A sum P

> Find the number of all subsets of S of size M whose element sum is less
>than P.

> The brute force algorithm (enumerate all possible subsets of size M and
>compare) is O(N!). I'm hoping there's a much more efficient method.

>---------- Motivation

In the worst case all subsets of size M could sum to less than P and
hence you would end up checking all C(N,M) subsets anyway (you could
specifically check for the worst case by finding the maximum elements
and testing but this doesn't help in general).


In general you don't have to try every possible subset.

Algorithm
1) Sort set S from low to high and call the elements S1, S2, ... N.

2) You can take the usual algorithm for generating M-element
subsets such that the element indexes 1..N for the subsets are in
lexicographic order. (usually called generating the m-combinations
of an N-set in lexicographic order). Modify this algorithm so that
you "backup" whenever the sum gets too large. Also, you should be
able to modify the subset sums on the fly so that you don't have to
sum up the entire subset.


Modification of algorithm given in "Combinatorial Algorithms"
by Reingold, Nievergelt, and Deo


int N, M, P; /* the given values of N, M, and P */
int S[N+1]; /* Given set S -- not using array element 0 */


int c[M+1]; /* indicates which indexes are in the current subset */
int SUM = 0; /* current sum */
int tsum; /* temporary for a partial sum of newly added elements */
int count = 0; /* number of subsets whose SUM is < P
int i;
int j=1;

c[0] = -1;
for i=1 to M by 1 /* loop sets initial subset and its SUM */
{
c[i] = i;
SUM = SUM + S[i];
}

while (j > 0 AND SUM < P)
{
count = count + 1;
j = M;

while (c[j] == N - M + j) /* back up throwing elements out */
{
SUM = SUM - S[c[j]];
j = j - 1;
}

next_update: /* goto label */
SUM = SUM - S[c[j]];
c[j] = c[j] + 1;

tsum = S[c[j]];
for i = j+1 to M by 1 /* advance adding elements in */
{
c[i] = c[i-1] + 1;
tsum = tsum + S[c[i]];
}

if (SUM + tsum >= P)
{
j = j - 1; /* backup 1 level */
if (j <= 0) break; /* break out of while loop */
goto next_update;
}

SUM = SUM + tsum;
}

print "the number of subsets is," count;

Translate the above into your favorite programming language and your
problem is solved!


0 new messages