while( n > 0 ){
    n >>= 1;
    i++;
}
return( i );
which works well, but this is an extremely time critical operation and was
wondering if anyone here knew of a faster method (or a better newsgroup to
ask this question on).
thanks!
- daniel
--
comp.lang.c.moderated - moderation address: cl...@plethora.net
> does anyone know of a really fast way to take integer log2 of integers? i
> currently have the code implemented something like:
>
> while( n > 0 ){
> n >>= 1
> i++;
> }
> return( i );
>
> which works well, but this is an extremely time critical operation and was
> wondering if anyone here knew of a faster method (or a better newsgroup to
> ask this question on).
I don't understand what that code is doing. Assuming n and i are unsigned
and that i is zero initialised, I get wrong results for simple cases e.g. 1.
Your code returns 1, yet 1<<1 is 2, not 1. I would expect a result of 0.
Given the same assumptions, your code always seems to return a result such
that the post-condition "(1<<result) > n" is satisfied. Is this what you
want?
The following function will return log2 of the top bit set in n, assuming
that CHAR_BIT is an integer power of two. You can convert it to nearly mimic
your code by initialising result to 1 instead of 0 (except that the result
for 0 doesn't match):
#include <limits.h>
unsigned int quicklog2(unsigned int n) {
    unsigned int bits = sizeof(n) * CHAR_BIT;
    unsigned int result = 0;
    while (bits > 0 && n != 0) {
        bits >>= 1;
        if (n >= (1 << bits)) {
            result += bits;
            n >>= bits;
        }
    }
    return result;
}
This loop exhibits a worst case of (1 + log2(bits in an unsigned int))
iterations, compared to yours, which has a worst case of (bits in an unsigned
int). Therefore, it is not as good if the value of n is small, but it can
perform significantly fewer iterations the more often higher-order bits are
set. It may be worth performing a quick comparison on the number and switching
algorithm depending on the outcome, e.g. if n > 16 use the above, else do the
simple version - see the sketch below. Your choice may also be affected by the
cost of the various mathematical operations used (e.g. is all that shifting
expensive? That's the sort of thing you would have to take to a processor
specific newsgroup though - I've only used them so much because I get the
shifts for free on ARM)
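For illustration, that hybrid might look something like this (a sketch only -
the threshold of 16 is a guess to be tuned by measurement, and it assumes the
quicklog2() above; note both halves return floor(log2(n)), unlike the original
loop, which returns one more than that):

unsigned int hybrid_log2(unsigned int n)
{
    unsigned int i = 0;
    if (n > 16)
        return quicklog2(n);   /* few iterations once high bits are set */
    while (n > 1) {            /* the simple loop wins for small n */
        n >>= 1;
        i++;
    }
    return i;
}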
If the number of bits set in n is known to be small and you have an easy way
of finding log2 of a number known to contain a single set bit only, then the
following may beat quicklog2 - its worst case is (bits in an unsigned int)
iterations; it will take p iterations if there are p bits set in n, but it
does require the magic function to map a single bit to a bit position. Look
through the archives of comp.sys.arm for David Seal's, which is suitable for
32-bit int (and pretty much leads to optimal code being generated for ARM
targets). I imagine that the same table would work for any shorter ints as
well, as long as you promoted the values to at least 32-bit for an at least
32-bit multiply from which you can extract bits 27-31 as the table index.
unsigned int find_top_bit(unsigned int n)
{
    while ((n & -n) != n) {
        n -= (n & -n);
    }
    return n;
}
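For the magic function itself, one well-known possibility is a de Bruijn-style
multiply (I can't check whether this matches David Seal's exact constant and
table, so treat it as a sketch of the idea, assuming a 32-bit unsigned int):
multiplying an isolated bit by the constant below leaves a unique 5-bit
pattern in bits 27-31, which indexes a 32-entry table of positions.

static const int bit_pos[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* n must have exactly one bit set, e.g. the result of find_top_bit() */
int single_bit_log2(unsigned int n)
{
    return bit_pos[(n * 0x077CB531u) >> 27];
}

so that single_bit_log2(find_top_bit(n)) yields the position of the highest
set bit. (If int is wider than 32 bits, mask the product to 32 bits before
shifting.)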
--
Stewart Brodie, Software Engineer
Pace Micro Technologies plc
645 Newmarket Road
Cambridge, CB5 8PB, United Kingdom WWW: http://www.pace.co.uk/
Assuming that you are working on a binary twos complement machine (is there
any other kind today?), then you can more easily get the log2 by doing
bit tests. Assume a 32 bit, signed value. The maximum log2 of a positive
value would then be 30. So you can do something like:
if (n < 1) return(-1);          /* error in input */
if (n & 0x40000000) return(30);
if (n & 0x20000000) return(29);
if (n & 0x10000000) return(28);
if (n & 0x08000000) return(27);
if (n & 0x04000000) return(26);
if (n & 0x02000000) return(25);
if (n & 0x01000000) return(24);
<and so forth until>
if (n & 0x00000002) return(1);
return(0); /* why test? It MUST be 1 */
I think this would be faster. You would need to double check.
John
> if (n < 1) return(-1);          /* error in input */
> if (n & 0x40000000) return(30);
> if (n & 0x20000000) return(29);
> if (n & 0x10000000) return(28);
> if (n & 0x08000000) return(27);
> if (n & 0x04000000) return(26);
> if (n & 0x02000000) return(25);
> if (n & 0x01000000) return(24);
> <and so forth until>
> if (n & 0x00000002) return(1);
> return(0); /* why test? It MUST be 1 */
If you're going to take this approach then you might as well change it
to a binary search:
...
if (n & 0xffff0000) {
    if (n & 0xff000000) {
        if (n & 0xf0000000) {
            if (n & 0xc0000000) {
                if (n & 0x80000000)
                    return 31;
                else
                    return 30;
            } else {
                if (n & 0x20000000)
                    return 29;
                else
                    return 28;
            }
        } else {
            ...etc...
I hope the advantage is obvious: for 32 bits, you get an average and
worst case of 5 tests, as opposed to an average of 16 and worst case
of 32 tests, for data with a single bit set with a uniform
distribution.
Of course, you can do better than this. For instance, if you set up a
table of 256 entries, where each entry holds the position of the highest
bit set in its index, then you can simplify it to something like this:
if (n & 0xffff0000) {
    if (n & 0xff000000)
        return lookup[n >> 24];
    else
        return lookup[n >> 16];
} else {
    if (n & 0x0000ff00)
        return lookup[n >> 8];
    else
        return lookup[n];
}
which reduces it to 2 tests and a table lookup. Of course, this might
not be better if a table lookup is too expensive in memory or time.
In fact, on some hardware where jumps are expensive, the first version
(with up to 32 tests) might be the fastest.
Note that in neither of these cases have I made an attempt to make
sure that there aren't any off-by-one errors or to compare the results
that they'd produce against the definition of log2. They're simply
meant for illustration of some possibilities, and shouldn't be
considered working code.
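That said, the 256-entry table itself is easy to get right, since it can be
built once at start-up rather than written out by hand - a sketch, using
lookup[i] = 1 + lookup[i/2]:

static unsigned char lookup[256];

void init_lookup(void)
{
    int i;
    lookup[0] = 0;                       /* log2(0) is undefined; callers must test n first */
    lookup[1] = 0;
    for (i = 2; i < 256; i++)
        lookup[i] = 1 + lookup[i >> 1];  /* floor(log2(i)) */
}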
> if (n & 0xffff0000) {
>     if (n & 0xff000000)
>         return lookup[n >> 24];
>     else
>         return lookup[n >> 16];
> } else {
>     if (n & 0x0000ff00)
>         return lookup[n >> 8];
>     else
>         return lookup[n];
> }
But it obviously should have been more like:
if (n & 0xffff0000) {
    if (n & 0xff000000)
        return 24 + lookup[n >> 24];
    else
        return 16 + lookup[n >> 16];
} else {
    if (n & 0x0000ff00)
        return 8 + lookup[n >> 8];
    else
        return lookup[n];
}
--
"Unix... is not so much a product
as it is a painstakingly compiled oral history
of the hacker subculture."
--Neal Stephenson
>does anyone know of a really fast way to take integer log2 of integers?
From earlier threads on this subject, I have accumulated 48 different
variants, along with a simple benchmarking driver. Mail me if you're
interested in seeing them. Here's a few of the nastier ones (I'm no
longer quite sure how some of them work :)
int
hi_bit6 (unsigned long n){
int h=0,i=32;
unsigned long m=-1;
if(!n)return -1;
while(i/=2)n&(m^=m>>i)&&(n&=m,h+=i);
return h;
}
const int tab_m37_rf[] = {
-1,31,6,30,9,5,0,29,16,8,2,4,21,-1,19,28,25,15,-1,
7,10,1,17,3,22,20,26,-1,11,18,23,27,12,24,13,14,-1
};
int
hi_bit10(unsigned long n){
int i=32;
unsigned long m=-1;
static const int *t=tab_m37_rf;
if(!n)return -1;
while(i/=2)m^=m<<i,n=n>>i&m|(n&m)<<i;
return t[(n^(n-1))%37];
}
const int tab_m37_ff[] = {
-1,0,25,1,22,26,31,2,15,23,29,27,10,-1,12,3,6,16,
-1,24,21,30,14,28,9,11,5,-1,20,13,8,4,19,7,18,17
};
int
hi_bit23(unsigned long n){
static const int *t=tab_m37_ff;
return t[(n|=n>>1,n|=n>>2,n|=n>>4,n|=n>>8,n|=n>>16)%37];
}
int
hi_bit46(unsigned long n){
return n>>16?n>>24?n>>28?n>>30?n>>31?31:30:n>>29?29:28:n
>>26?n>>27?27:26:n>>25?25:24:n>>20?n>>22?n>>23?23:22:n>>
21?21:20:n>>18?n>>19?19:18:n>>17?17:16:n>>8?n>>12?n>>14?
n>>15?15:14:n>>13?13:12:n>>10?n>>11?11:10:n>>9?9:8:n>>4?
n>>6?n>>7?7:6:n>>5?5:4:n>>2?n>>3?3:2:n>>1?1:n?0:-1;
}
int
hi_bit47(unsigned long n){
if ( n ) {
float f = n;
return ((*(unsigned long*)&f)>>23)-127;
} else {
return -1;
}
}
-- Mat.
Your compiler vendor must have thought about this when they implemented
the assignment of an integer to a float.
How about using the integer part of frexp()?
(Untested)
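Something along these lines, perhaps (equally untested; frexp() returns m
with n == m * 2^e and 0.5 <= m < 1, so floor(log2(n)) is e - 1, and a 32-bit
integer converts to double exactly):

#include <math.h>

int log2_frexp(unsigned long n)
{
    int e;
    if (n == 0)
        return -1;                   /* log2(0) undefined */
    (void) frexp((double) n, &e);    /* n == m * 2^e, 0.5 <= m < 1 */
    return e - 1;
}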
--
--c[]------------------- j...@linux.nu -------------------C---
Daniel R. Lehenbauer (lehe...@rose-hulman.edu) wrote:
: does anyone know of a really fast way to take integer log2 of integers? i
: currently have the code implemented something like:
: while( n > 0 ){
: n >>= 1
: i++;
: }
: return( i );
: which works well, but this is an extremely time critical operation and was
: wondering if anyone here knew of a faster method (or a better newsgroup to
: ask this question on).
Theoretically the quickest way would be to do some sort of binary search
instead of shifting bit by bit (brute force). Of course in practice it
might not be worth it, and whether it is depends on the size of your integer.
Anyhow, as I thought the problem interesting, I wrote the following program;
I don't know if it's faster though.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define RSHIFT( num , bits ) ((num)>>(bits))

typedef unsigned int uint_t ;

int maxshift = sizeof(uint_t) * CHAR_BIT ;  /* bits, not bytes */

uint_t log2( uint_t number )
{
    int i , diff ;
    if ( number == 0 )
    {
        fprintf( stderr , "You cannot take a log of 0!\n" ) ;
        exit(-1) ;
    }
    i = diff = maxshift/2 ;
    while( diff > 0 )
    {
        diff = diff / 2 ;
        if ( RSHIFT( number , i ) == (uint_t)0 )
            i -= diff ;
        else
            i += diff ;
    }
    if ( RSHIFT( number , i ) == (uint_t)0 )  /* the final probe can overshoot by one */
        i-- ;
    return i ;
}

int main( int argc , char *argv[] )
{
    int n ;
    n = atoi( argv[1] ) ;
    printf( "log2(%d)=%d\n" , n , log2(n) ) ;
    return 0 ;
}
I think you could speed it up even more by replacing :
diff = diff / 2 ; with diff>>=2 ;
I hope it helps.
Chris Kolmaga
If that algorithm is too slow perhaps it is time to write some assembler
code.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
>>>Daniel R. Lehenbauer (lehe...@rose-hulman.edu) wrote:
>>>: does anyone know of a really fast way to take integer log2 of integers? i
>>>: currently have the code implemented something like:
>>>
>>>: while( n > 0 ){
>>>: n >>= 1
>>>: i++;
>>>: }
>>>: return( i );
>>>
>>>: which works well, but this is an extremely time critical operation and was
>>>: wondering if anyone here knew of a faster method (or a better newsgroup to
>>>: ask this question on).
On a machine with good bit-field support, for positive N, you can
do it in about 4 instructions by recognizing that IEEE floating point
format has an exponent that is the LOG2 of the number (with a fudge
factor).
So, in pseudo-code, it's
float f = N;
logn = exponent(f) - FLOAT_EXP_BIAS
I implemented this on a VAX-11 in BLISS about 20 years ago, it contributed
to a measurable speed improvement in the register allocation algorithm
for the BLISS optimizer.
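In C, the same trick might look like this - a sketch assuming IEEE single
precision and a 32-bit unsigned long. One caveat: float keeps only 24 bits of
mantissa, so a value just below a large power of two can round up during the
conversion and give an answer one too high; converting to double instead
avoids that for 32-bit inputs.

int log2_ieee(unsigned long n)
{
    union { float f; unsigned long u; } pun;
    if (n == 0)
        return -1;
    pun.f = (float) n;                 /* exponent field now holds log2(n) + bias */
    return (int)(pun.u >> 23) - 127;   /* 127 = IEEE single-precision exponent bias */
}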
Good luck,
-- Al
--
Quality Software Management
http://www.tiac.net/users/lehotsky
Process Improvement | Management Consulting | Compiler Implementation
Of course you mean diff>>=1;
I'd be very surprised if most compilers didn't make this
optimization for you, although there may be some issues
if diff < 0 (which I don't think it can be, but it'd be
very difficult for the compiler to infer that).
/peter
Using ++i rather than i++ will tend to be faster because the second form
introduces a temporary. An aggressive compiler may do that for you, but
you never know.
Your algorithm will tend to be slower for large values. If most of
your values are large, you may want to introduce shortcuts of the
form
if (n > 256)
{
    // ...
}
else
{
    // something else
}
Your mileage will vary with this one because you'd be introducing
the overhead of extra comparisons.
Keep in mind that your solution makes specific assumptions about the
representation of integer types. That representation isn't mandated
by the standard, so portability of this code is an issue (as is
use of assembler, obviously). Whatever works for your domain ....
-<Automagically included trailer>
Robert O'Dowd Ph +61 (8) 8259 6546
MOD/DSTO Fax +61 (8) 8259 5139
P.O. Box 1500 Email:
robert...@dsto.defence.gov.au
Salisbury, South Australia, 5108
Disclaimer: Opinions above are mine and may be worth what you paid for
them
> In article <clcm-1999...@plethora.net>,
> ch...@odin.ugrad.physics.mcgill.ca (Krzysztof Kolmaga) wrote:
> On a machine with good bit-field support, for positive N, you can
> do it in about 4 instructions by recognizing that IEEE floating point
> format has an exponent that is the LOG2 of the number (with a fudge
> factor).
>
> So, in pseudo-code, it's
>
> float f = N;
> logn = exponent(f) - FLOAT_EXP_BIAS
>
> I implemented this on a VAX-11 in BLISS about 20 years ago, it contributed
> to a measurable speed improvement in the register allocation algorithm
> for the BLISS optimizer.
Correct. The function frexp() of the C standard library, prototyped
in math.h, does just that:
#include <stdio.h>
#include <math.h>
...
double x;
int ex;
...
(void) frexp(x, &ex);
printf("Log2 of %f is %d\n", x, ex - 1);
--
Maurizio Loreti http://wwwcdf.pd.infn.it/~loreti/mlo.html
Un. of Padova, Dept. of Physics - Padova, Italy lor...@padova.infn.it
Any non-broken compiler should generate identical (or at least
equivalent) code for "i++;" and "++i;", at least if optimization is
turned on. If yours doesn't, and you're at all concerned about
performance, complain bitterly to your vendor and/or switch compilers.
> Keep in mind that your solution makes specific assumptions about the
> representation of integer types. That representation isn't mandated
> by the standard, so portability of this code is an issue (as is
> use of assembler, obviously). Whatever works for your domain ....
I don't see any assumptions about integer representation beyond what's
guaranteed by the standard. If either n is unsigned or n is signed
and has a non-negative value, "n >>= 1;" is guaranteed to be
equivalent to "n /= 2;". By the way, declaring n as unsigned might
or might not give you better code.
You can do something like a binary search, but remember that a binary
search works best when the values being searched are uniformly
distributed. You can search over either the operand values or the
result values. The best way to do an integer log2 depends on the
expected distribution of the operands. (For log2, a binary search
over the operand values acts like a linear search over the result
values.)
If you're likely to be using the same operands repeatedly, consider
cacheing a few results -- but consider the overhead of checking the
cache on each call.
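A one-entry cache is about the cheapest version of that idea - a sketch,
where real_log2() is a hypothetical stand-in for whichever implementation you
settle on:

extern int real_log2(unsigned int n);   /* hypothetical fallback */

static unsigned int last_n = 1;         /* primed so the cache starts valid: log2(1) == 0 */
static int last_result = 0;

int log2_cached(unsigned int n)
{
    if (n != last_n) {
        last_n = n;
        last_result = real_log2(n);
    }
    return last_result;
}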
Finally, if your CPU has an instruction to count leading 0 bits in a
word, it should be fairly straightforward to use it in a small
assembly routine, if you don't mind getting a few bits under your
fingernails. If you care about portability, you can write both a
reasonably efficient C version and a blazingly fast assembly version,
and use whichever one is appropriate for each target system. Don't
resort to assembly until and unless you've proven that it's faster
than C, *and* that C isn't fast enough. In the absence of special
instructions that the compiler can't figure out how to use, a decent
optimizing compiler should generate code as good as an assembly
routine.
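(For what it's worth, where the compiler provides a count-leading-zeros
builtin - GCC-style __builtin_clz() is one, though that's an assumption about
your toolchain rather than something every compiler offers - the whole
function collapses to a sketch like this, assuming 32-bit unsigned int:)

int log2_clz(unsigned int n)
{
    return n != 0 ? 31 - __builtin_clz(n) : -1;   /* the builtin is undefined for n == 0 */
}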
--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
One of the great tragedies of ancient history is that Helen of Troy
lived before the invention of the champagne bottle.
This is a "quality of implementation" issue. I agree, a decent compiler
will do this, if optimisation is turned on. Coding ++i hurts nothing,
and may give you that bit more if your compiler does not.
>
> > Keep in mind that your solution makes specific assumptions about the
> > representation of integer types. That representation isn't mandated
> > by the standard, so portability of this code is an issue (as is
> > use of assembler, obviously). Whatever works for your domain ....
>
> I don't see any assumptions about integer representation beyond what's
> guaranteed by the standard. If either n is unsigned or n is signed
> and has a non-negative value, "n >>= 1;" is guaranteed to be
> equivalent to "n /= 2;". By the way, declaring n as unsigned might
> or might not give you better code.
Last I read, there is no guarantee that n >>= 1 be equivalent to
n /= 2. The binary representation used by many compiler/OS combinations
happens to give that, but is not formally required by the language.
Any equality between any mathematical operations (division,
multiplication)
and a specific bit shifting operation is specific to each compiler.
Of course, we could argue that n /= 2 could be written instead.
That gives the compiler an opportunity to find a more optimal
way. In practice, that will often be equivalent to n >>= 1.
As with anything where performance is at a premium, the best way to do it
is to experiment and find out. Your mileage varies (e.g. between machines,
compilers, operating systems, etc.). Asking the original question
in a forum that is more directly relevant to your system is the way
to get more concrete answers.
[Other good points snipped]
-<Automagically included trailer>
Robert O'Dowd Ph +61 (8) 8259 6546
MOD/DSTO Fax +61 (8) 8259 5139
P.O. Box 1500 Email:
robert...@dsto.defence.gov.au
Salisbury, South Australia, 5108
Disclaimer: Opinions above are mine and may be worth what you paid for
them
> I don't see any assumptions about integer representation beyond what's
> guaranteed by the standard. If either n is unsigned or n is signed
> and has a non-negative value, "n >>= 1;" is guaranteed to be
> equivalent to "n /= 2;". By the way, declaring n as unsigned might or
> might not give you better code.
I can't think offhand of a situation where declaring it unsigned is not going
to give you at least as good code. As you have noted, provided you keep it
unsigned you don't fall over the signed negative case, in which the effect of
the right shift is not guaranteed to be the same on all platforms (it's
implementation-defined). I know that the compiler I maintain goes to some
lengths to determine if a signed value being shifted is non-negative, in
order to convert the arithmetic shift to a logical shift - presumably because
some targets have an arithmetic shift which is less efficient than a logical
shift, or perhaps do not have an arithmetic shift at all and have to generate
code to emulate one.
--
Stewart Brodie, Software Engineer (Views expressed are my own and
Pace Micro Technologies PLC not those of my employer)
645 Newmarket Road
Cambridge, CB5 8PB, United Kingdom WWW: http://www.pace.co.uk/
I'd be interested in knowing if there's *any* existing C or C++
compiler that generates more efficient code for "++i;" than for "i++;"
(or "i += 1;", or "i = i + 1;"), with or without optimization turned
on. If optimization is turned off, failing to make this optimization
is understandable; if it's turned on, generating extra code for "i++;"
is arguably a bug. Yes, I agree it's a "quality of implementation"
issue, but a misfeature like this would be serious enough to be called
a "quality of implementation" bug. IMHO, of course. (Note the
semicolons in the above expressions; these are all expression
statements whose results are ignored.)
> Last I read, there is no guarantee that n >>= 1 be equivalent to
> n /= 2. The binary representation used by many compiler/OS combinations
> happens to give that, but is not formally required by the language.
> Any equality between any mathematical operations (division,
> multiplication)
> and a specific bit shifting operation is specific to each compiler.
What did you last read? 8-)}
Here's what ANSI/ISO 9899:1990, section 6.3.7, "Bitwise shift
operators", has to say about it:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If
E1 has an unsigned type or if E1 has a signed type and a
nonnegative value, the value of the result is the integral part of
the quotient of E1 divided by the quantity, 2 raised to the power
E2. If E1 has a signed type and a negative value, the resulting
value is implementation-defined.
(An earlier paragraph that I'm too lazy to type in specifies integral
promotions and the result type, and says that the result is undefined
if E2 is negative or greater than or equal to the width in bits of the
left operand.)
I don't *think* the C standard actually requires binary representation
for integer types (C9X tightens this up substantially) but it at least
guarantees that integer types pretty much behave as if they were
represented in binary. If you want to create a conforming C
implementation for a decimal or trinary machine, the shift operators
still have to act like bit shifts -- which means they'll probably be
implemented as multiplies and divides. In real life, I don't think
there are any C implementations on non-binary machines.
Considering how many things C leaves undefined, unspecified, or
implementation-defined, it's sometimes a little surprising to see what
it *does* guarantee.
--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
One of the great tragedies of ancient history is that Helen of Troy
lived before the invention of the champagne bottle.
[ ... ]
> Last I read, there is no guarantee that n >>= 1 be equivalent to
> n /= 2.
That depends -- if n is positive, they're guaranteed to be equivalent.
If n is negative, the result is implementation defined.
> The binary representation used by many compiler/OS combinations
> happens to give that, but is not formally required by the language.
All implementations of C are required to use a pure binary
representation for all integer types.
Have you tried reading the standard?
If E1 has an unsigned type or if E1 has a signed type and a
nonnegative value, the value of the result [of E1 >> E2] is the
integral part of the quotient of E1 divided by the quantity, 2
raised to the power E2.
> Any equality between any mathematical operations (division,
> multiplication)
> and a specific bit shifting operation is specific to each compiler.
Not true -- there is a similar equivalence for left shift.
-Larry Jones
I'll be a hulking, surly teen-ager before you know it!! -- Calvin
A machine might only have arithmetic shift and have to bend over
backwards to handle the unsigned case.
-Larry Jones
What a stupid world. -- Calvin
On Tue, 27 Jul 1999 22:53:55 GMT, "Robert O'Dowd"
<nos...@nonexistant.com> wrote:
>Keith Thompson wrote:
[...]
>> Any non-broken compiler should generate identical (or at least
>> equvalent) code for "i++;" and "++i;", at least if optimization is
>> turned on. If yours doesn't, and you're at all concerned about
>> performance, complain bitterly to your vendor and/or switch compilers.
>
>This is a "quality of implementation" issue. I agree, a decent compiler
>will do this, if optimisation is turned on. Coding ++i hurts nothing,
>and may give you that bit more if your compiler does not.
The first C compiler I used (which shall remain nameless to protect
the guilty, but this was back around 1984) expended several pages in
its otherwise excellent manual about how
for (i=0; i<10; i++)
would loop from 0 to 9, while
for (i=0; i<10; ++i)
would loop from 1 to 9. Or maybe it was 1 to 10. I don't know
because I never tried it, and I avoid the second form to this day (old
habits are hard to break I guess...)
Regards,
-=Dave
Just my (10-010) cents
I can barely speak for myself, so I certainly can't speak for B-Tree.
Change is inevitable. Progress is not.
Krzysztof Kolmaga wrote:
>
> Daniel R. Lehenbauer (lehe...@rose-hulman.edu) wrote:
> : does anyone know of a really fast way to take integer log2 of integers? i
> : currently have the code implemented something like:
>
> : while( n > 0 ){
> : n >>= 1
> : i++;
> : }
> : return( i );
>
> : which works well, but this is an extremely time critical operation and was
> : wondering if anyone here knew of a faster method (or a better newsgroup to
> : ask this question on).
>
> : thanks!
> : - daniel
Others have outlined a solution which involves floating point; this
will be the fastest on systems with an FPU included. If you have a machine
without an FPU in hardware, you could perhaps try something like this:
int log2n(unsigned int n)
{
    int result=0;
    unsigned int t;
    t = n >> 16;
    if( t ) {
        n = t;
        result += 16;
    }
    t = n >> 8;
    if( t ) {
        n = t;
        result += 8;
    }
    t = n >> 4;
    if( t ) {
        n = t;
        result += 4;
    }
    switch(n) {
    case 15:
    case 14:
    case 13:
    case 12:
    case 11:
    case 10:
    case 9:
    case 8: result += 4;
        break;
    case 7:
    case 6:
    case 5:
    case 4: result += 3;
        break;
    case 3:
    case 2: result += 2;
        break;
    case 1: result += 1;
        break;
    }
    return result - 1;
}
While this solution speeds things up for big n, it is slower for
small n (n<4). But there is room to play. The optimal solution for you
depends highly on what n's you are going to have. Try it out and play a
little with it. Perhaps it helps if you replace the switch construct by
your loop, or by another if in the style of those above, with your loop
after that.
Perhaps you could enlarge your switch for cases 0 to 255. Who knows how
your compiler optimizes. If it manages to do the switch dispatch in time
O(1) then make the switch as large as reasonable. I didn't have the time
to try it out now, sorry.
Regards,
Bernd Strieder.
Let me assume 16-bit ints for brevity:
int log = 0 ;
if( n >> 8 ) {          /* test first: only discard the low byte if the high byte is non-zero */
    n >>= 8 ;
    log = 8 ;
}
log += log_lookup[n] ;  /* log_lookup[i] holds floor(log2(i)) for 0..255 */
return log ;
However using ++i is a good habit to get into if you intend to use C++
(user defined iterators for example inevitably work more efficiently in
the pre-increment form)
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
#include <limits.h>
unsigned int log2(unsigned int n)
{
    int result = 0;
    if (n != 0) {
        while (n & ~0x0F) {
            n >>= 4;
            result += 4;
        }
        while (n & ~0x01) {
            n >>= 1;
            ++result;
        }
    }
    return result;
}
The idea is to adjust the shift value in the first loop so that it is
approx. the square root of the number of bits in an int. For 16-bit
integers, this gives a worst case of 6 shifts (instead of 16); for 32-bit
integers, this gives a worst case of 10 shifts (instead of 32).
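A quick sanity check of the above, for what it's worth:

#include <stdio.h>

/* assumes the log2() function above is in scope */
int main(void)
{
    unsigned int tests[] = { 1, 2, 3, 4, 15, 16, 255, 256, 65535 };
    unsigned int i;
    for (i = 0; i < sizeof tests / sizeof tests[0]; i++)
        printf("log2(%u) = %u\n", tests[i], log2(tests[i]));
    return 0;
}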
Daniel Pfeffer
Krzysztof Kolmaga <ch...@odin.ugrad.physics.mcgill.ca> wrote in message
news:clcm-1999...@plethora.net...
> Theoretically the quickest way would be to do some sort of binary search
> instead of shifting bit by bit (brute force).
[snip]
The fastest code (32-bit machines) would look like this, with obvious
changes for other machines:
int log2(unsigned int n)
{
    if (n != 0) {
        int result = 0;
        if ((n & 0xffff0000) != 0) {
            n >>= 16;
            result += 16;
        }
        if ((n & 0xff00) != 0) {
            n >>= 8;
            result += 8;
        }
        if ((n & 0xf0) != 0) {
            n >>= 4;
            result += 4;
        }
        if ((n & 0xc) != 0) {
            n >>= 2;
            result += 2;
        }
        if ((n & 0x2) != 0) {
            n >>= 1;
            result += 1;
        }
        return result;
    }
    else return -1;
}
You could speed things up even further by using a precomputed table for the
last few comparisons (e.g. when n is guaranteed to be less than 256).
Daniel Pfeffer
There are many other factors that have to be taken into account. Code size
affects the probability of cache misses and page faults. Conditionals
can have adverse effects on pipelined ALUs. In addition, is it the mean
performance that is important or the worst case? I do not believe that
there is any single algorithm that will be best on all platform/compiler
combinations. For example, if your hardware has a specific test for
negative integers and mean performance is wanted, something like this
might work well:
int result=32;
while (item>=0){       /* assumes item > 0 to start with */
    item <<= 1;
    --result;
}
But without such a specific test I would not expect particularly good
performance.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
The OP requested the fastest solution possible, implying that the call to
log2() was in an innermost loop. As such, it is likely that a doubling (or
more) of code size for a not very large function will be worth the possible
flushing from the cache of other (less used) code. I did not mean to imply
that the code should not be profiled in the production environment.
Daniel Pfeffer.
"Daniel M. Pfeffer" <p.f.e.f...@internet-zahav.net> wrote:
: The fastest code (32-bit machines) would look like this
That's a rather bold statement. :) The speed of a particular algorithm
depends on many factors. On my PIII-500 system, the following is the
fastest with both Cygwin gcc 2.95 and MSVC 6.0
int
hi_bit39(unsigned long n){
return n>=1<<16?n>=1<<24?n>=1<<28?n>=1<<30?n>=1<<31?31:30:n>=1<<29
?29:28:n>=1<<26?n>=1<<27?27:26:n>=1<<25?25:24:n>=1<<20?n>=1<<22?
n>=1<<23?23:22:n>=1<<21?21:20:n>=1<<18?n>=1<<19?19:18:n>=1<<17?
17:16:n>=1<<8?n>=1<<12?n>=1<<14?n>=1<<15?15:14:n>=1<<13?13:12:n
>=1<<10?n>=1<<11?11:10:n>=1<<9?9:8:n>=1<<4?n>=1<<6?n>=1<<7?7:6:n
>=1<<5?5:4:n>=1<<2?n>=1<<3?3:2:n>=1<<1?1:n>=1<<0?0:-1;
}
which is a fully unrolled binary search. 5 comparisons and not much
else.
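The driver needn't be anything fancy - a sketch (assuming the hi_bit39()
above; the checksum keeps the compiler from optimising the calls away, and
loop overhead will colour the numbers, so compare candidates under the same
harness):

#include <stdio.h>
#include <time.h>

int main(void)
{
    unsigned long i, sink = 0;
    clock_t t0 = clock();
    for (i = 0; i < 10000000UL; i++)
        sink += (unsigned long) hi_bit39(i);
    printf("time %.2fs (checksum %lu)\n",
           (double) (clock() - t0) / CLOCKS_PER_SEC, sink);
    return 0;
}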
--
Mathew Hendry, Programmer, Visual Sciences Ltd.
Work <ma...@vissci.com>, Home <sca...@dial.pipex.com>
IRC/ICQ/Q2/... "Scampi", ICQ 24839718
PGP B3C1 1EBC 2BAE 93F0 54C1 E0E7 5FA3 5988 BBD6 B689
I am deeply suspicious of this claim. It is important not to confuse
O() measurements with absolute performance. Such metrics are only
relevant where the data size is not known initially. For example the
much decried 'ripple sort' is actually an optimal sort for lists that
are almost always already fully sorted. The magnitude of the 'almost'
depends on the proportion that are not sorted and the degree to which
they are not. For example if you knew you were always dealing with lists
that were already sorted except for cases that had adjacent pairs
inverted, a ripple sort could not be worse than any alternative. In
general ripple sorts perform pretty well on lists of less than ten items
even though they have a very bad O() metric.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
You make an interesting point, that in the absence of requirements, even
"fastest" has no real meaning. For instance, I could allocate an array of 4
gigabytes and store the answers in the array, one count for each of the 4
billion+ values. If I had a machine with ultra-fast memory and many
gigabytes of it, then perhaps that would be the 'fastest' implementation.
However, I doubt if it would really be useful to anyone.
Also, what is fastest on machine x may not be fastest on machine y. Or even
changing from one compiler to another on the same machine may change what is
the fastest algorithm. So when looking for something of this nature, it
would be best to state clearly what limitations are in terms of memory and
architecture, as well as what the actual time requirement is.
If it is simply a contest to find the algorithm with the lowest O(f(n))
average case behavior, then I specify my ridiculous algorithm as my entry.
Completely useless but it is hard to beat a single lookup operation on an
abstract machine.
[snip]
--
C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
"The C-FAQ Book" ISBN 0-201-84519-9
C.A.P. Newsgroup http://www.dejanews.com/~c_a_p
C.A.P. FAQ: ftp://38.168.214.175/pub/Chess%20Analysis%20Project%20FAQ.htm
>In article <clcm-1999...@plethora.net>, Daniel M. Pfeffer
><p.f.e.f...@internet-zahav.net> writes
>>The fastest code (32-bit machines) would look like this, with obvious
>>changes for other machines:
>
>I am deeply suspicious of this claim. It is important not to confuse
>O() measurements with absolute performance. Such metrics are only
>relevant where the data size is not known initially. For example the
>much decried 'ripple sort' is actually an optimal sort for lists that
^^^^^^^^^^^
Is this another term for "bubble sort"?
>are almost always already fully sorted. The magnitude of the 'almost'
>depends on the proportion that are not sorted and the degree to which
>they are not. For example if you knew you were always dealing with lists
>that were already sorted except for cases that had adjacent pairs
>inverted, a ripple sort could not be worse than any alternative. In
>general ripple sorts perform pretty well on lists of less than ten items
>even though they have a very bad O() metric.
Uh, what RW sort DOESN'T perform at least pretty well on lists of
less than ten items? ("RW" because bogosort isn't a RW sort.)
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.
Look up a book by Hastings - I think it is called Approximations for
Digital Computers, and it is VERY OLD. Unfortunately someone stole my
copy about 20 years ago.
This covers use of rational polynomials for fast approximations to most
math functions, and leveling the error over a range. For example, log2
will require a function to create the log in the range 1 to 2 only,
anything else is handled easily from that. The rational polynomials can
be something like:
(a0 + a1*x + a2*x^2 + ...) / (b0 + b1*x + b2*x^2 + ...)
taken to the appropriate length for the accuracy needed. The result can
be very fast, and done with integer operations only.
--
Chuck Falconer
(cbfal...@my-deja.com)
Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
yes
>
<snipped>
> Uh, what RW sort DOESN'T perform at least pretty well on lists of
>less than ten items? ("RW" because bogosort isn't a RW sort.)
If you have a million such lists to sort, you care about absolute timing
and not O(). The start-up overheads etc. are killers for short lists.
And the cost of writing kills things such as insertion sorts. Oh, and
if you are processing large scale object based graphics where you need
to determine which object is nearest the front for each pixel (I am not
talking about simple planar objects) you get exactly this kind of
problem. Then as you move from pixel to pixel there is usually no change
in the object ordering, but you still have to check. The first pass of
a ripple/bubble sort meets the minimum requirement for such a check (n-1
comparisons for n objects) with no other overheads.
Nearly. A bubble sort scans through the dataset in one direction only,
swapping out-of order entries. A ripple sort alternates between forwards
and backwards. Any one out-of-order entry can be placed in one (double)
pass.
(My father was one of the five million people who independently invented
this, and dubbed it a `double-bubble sort', which I reckon sounds better.)
--
+- David Given ---------------McQ-+ According to the latest official figures,
| Work: d...@tao-group.com | 43% of all statistics are totally
| Play: dgi...@iname.com | worthless.
+- http://wired.st-and.ac.uk/~dg -+
Impressive, but it may or may not be optimal depending on the
distribution of the input values. If, for example, you happen to know
that most arguments will be < 64, you'll want to do the comparisons in
a different order.
--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
One of the great tragedies of ancient history is that Helen of Troy
lived before the invention of the champagne bottle.