Selection algorithm

Giuliano Bertoletti

unread,

Aug 29, 2009, 11:29:02 AM8/29/09

to

Hello,

is there a better way to find the first k ranked elements of a vector of
size n (with k < n) without sorting it first.

Roughly I have n in the range [200..2000] and k = 3 or 4.

Sorting would cost at least n*log(n) which in my opinion is a waste of time.

Maybe I could hardwire a loop with a few comparisons for a given k (say
k=3). I mean specialize a function for a certain k.

Any ideas ?

Cheers,
Giulio.

Victor Bazarov

unread,

Aug 29, 2009, 12:24:35 PM8/29/09

to

GIYF

Also, see FAQ 5.2, just in case.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Stuart Golodetz

unread,

Aug 29, 2009, 12:43:12 PM8/29/09

to

Giuliano Bertoletti wrote:
>
> Hello,
>
> is there a better way to find the first k ranked elements of a vector of
> size n (with k < n) without sorting it first.

Yes. I recommend you read Effective STL (by Scott Meyers).

Cheers,
Stu

SG

unread,

Aug 29, 2009, 12:44:11 PM8/29/09

to

On 29 Aug., 17:29, Giuliano Bertoletti <gbe32...@libero.it> wrote:
> Hello,
>
> is there a better way to find the first k ranked elements of a vector of
> size n (with k < n) without sorting it first.
>
> Roughly I have n in the range [200..2000] and k = 3 or 4.
>
> Sorting would cost at least n*log(n) which in my opinion is a waste of time.

There's std::partial_sort which does what you want. But it obviously
modifies the vector. If you don't want to alter that vector you can
process the sequence in one pass and remember the k smallest elements.
I think you can do that with a little heap. Checkout std::push_heap,
std::pop_heap. The complexity should be something like O(log(k)*n).

Cheers!
SG

Giuliano Bertoletti

unread,

Aug 29, 2009, 12:49:52 PM8/29/09

to

Victor Bazarov ha scritto:

>
> GIYF
>
> Also, see FAQ 5.2, just in case.
>
> V

A better answer might have been: "use std::nth_element function of the
STL library".

By the way, that's not a homework assignement.

Just trying to speed up a complex search algorithm where pruning is
mandatory and a lazy sort of the tree branches would be too expensive.

Cheers,
Giulio.

Giuliano Bertoletti

unread,

Aug 29, 2009, 12:59:15 PM8/29/09

to

Hello,

I guess that perfectly suits may needs. Thank you.

Not only I do not care if the elements get rearranged, but I also need
to erase the remaining n-k.

So a partial_sort followed by a resize of the vector should be ok.

The algorithm is meant to expand and evaluate the n branches of a tree
node and then keep only k of them where the search continues.

Cheers,
Giulio.

SG ha scritto:

Stuart Golodetz

unread,

Aug 29, 2009, 2:27:42 PM8/29/09

to

For what it's worth, partial_sort also sorts the k smallest elements,
whereas nth_element doesn't: if you don't care about the order of the
elements afterwards, I guess you'd expect nth_element to involve less
work, but I don't know how much difference it would make in practice
without profiling it.

The bit in Effective STL I was referring to was Item 31, in case it's
useful.

Regards,
Stu

Jerry Coffin

unread,

Aug 29, 2009, 6:48:53 PM8/29/09

to

In article <4a99494c$0$43589$4faf...@reader1.news.tin.it>, gbe32241
@libero.it says...

std::partial_sort should do the job reasonably well. Technically it
does a little more than you need (the first N elements are sorted)
but for small k, the difference is unlikely to mean much.

--
Later,
Jerry.

Jerry Coffin

unread,

Aug 29, 2009, 8:38:58 PM8/29/09

to

In article <5c6afd17-467c-4557-b81f-8ffea93e0d21
@p15g2000vbl.googlegroups.com>, s.ges...@gmail.com says...

[ ... ]

> There's std::partial_sort which does what you want. But it
> obviously modifies the vector. If you don't want to alter that
> vector you can process the sequence in one pass and remember the k
> smallest elements.

Or look up std::partial_sort_copy.

--
Later,
Jerry.

Digital Puer

unread,

Aug 30, 2009, 12:22:58 AM8/30/09

to

On Aug 29, 8:29 am, Giuliano Bertoletti <gbe32...@libero.it> wrote:
> Hello,
>
> is there a better way to find the first k ranked elements of a vector of
> size n (with k < n) without sorting it first.
>
> Roughly I have n in the range [200..2000] and k = 3 or 4.
>
> Sorting would cost at least n*log(n) which in my opinion is a waste of time.
>
> Maybe I could hardwire a loop with a few comparisons for a given k (say
> k=3). I mean specialize a function for a certain k.
>

> Any ideas ?
>
> Cheers,
> Giulio.

You can do it in O(n) time. In the "Introduction to Algorithms"
book by Cormen, et al., 2001, there are two algorithms to find
the kth largest (or smallest) element in an unordered list.
One of them is a randomised algorithm with theta(n) expected /
O(n^2) worst case running time, and another is O(n) worst case.
The second one is quite difficult to implement and has a high
hidden constant (it implements a median-of-median search).

Anyways, once you find the kth largest element in O(n) time,
you can take one pass through the list in O(n) time to find
the elements that are smaller (or larger). Total running time
is O(n).