power pf two fast computation? pow2

Max

unread,

Jan 3, 2010, 12:18:03 AM1/3/10

to

Hi!
I need to represent a number, say x, as x = m*b^e. It has to be done in such a way that summing up x1+x2 in this representation would run as fast as possible. The slowest part of this calculation is the exponentiation of b. As I understand, the best choice would be for to have b=2 and m to be an integer. In this case, the whole operation m*b^e is simply an addition of e to the floating-point exponent of m. There a function in Matlab which does precisely that: pow2(m,e), but I'm confused with the speed of the calculation. This function appears to be more about 10 times slower than simple summation or two real numbers, which shouldn't be the case, if effectively we only add e to the exponent of m. Would somebody happen to know what's going on, and if there is a better solution to the problem? The program below illustrates the difference in the speed of execution of commands:

disp('The following execution times should be comparable:')
N = 500e2;
x1 = round(100*rand(1,N));
x2 = round(100*rand(1,N));
m1 = randn(1,N);
tic
for i = 1:1e4
y = x1 + x2;
end
toc
tic
for i = 1:1e4
y = pow2(m1,x1);
end
toc

results:
"
The following execution times should be comparable:
Elapsed time is 0.679610 seconds.
Elapsed time is 4.879041 seconds.
"

James Tursa

unread,

Jan 3, 2010, 3:50:05 PM1/3/10

to

"Max " <nikitch...@gmail.com> wrote in message <hhp9ab$f69$1...@fred.mathworks.com>...

I don't understand your goal at all. But let's discuss your code for a moment. In the x1 + x2 calculation, all of the work is programmed at the microprocessor chip level so it is very fast. For the pow2 operation, at least some of the work is likely compiled C code as opposed to a single chip level available function, so it is not surprising that it runs slower than an addition. But this is not really the correct comparison anyway. The correct comparison is to a multiply by a power of 2, not to an addition. e.g.,

>> m1 = rand(2000);
>> x1 = floor(rand(2000)*50);
>> tic;m1.*(2.^x1);toc
Elapsed time is 2.124031 seconds.
>> tic;pow2(m1,x1);toc
Elapsed time is 0.415166 seconds.
>> isequal(m1.*(2.^x1),pow2(m1,x1))
ans =
1

What is your goal with this?

James Tursa

Matt J

unread,

Jan 3, 2010, 8:20:06 PM1/3/10

to

"James Tursa" <aclassyguy_wi...@hotmail.com> wrote in message <hhqvtt$4r1$1...@fred.mathworks.com>...

In the x1 + x2 calculation, all of the work is programmed at the microprocessor chip level so it is very fast. For the pow2 operation, at least some of the work is likely compiled C code as opposed to a single chip level available function, so it is not surprising that it runs slower than an addition.

================================

I don't really follow that. Isn't pow2() a simple barrel shift of the exponent? Why should this require any more C code than summation.

Also, the operation plus(x1,x2) must involve some C interface. It must run a different routine for scalar-vector addition as opposed to vector-vector addition.

Max

unread,

Jan 4, 2010, 12:46:04 AM1/4/10

to

Thank you for the response. My goal is to separate the modulus and the exponent of a number (y = m*b^e), and to do it in such a way, that the massive computation involving these numbers won't be too slow. Due to the expectation, that the barrel shift should be as fast as the summation, and should be realized at the same processing level, as Matt pointed out in his comment, I tried base=2, and was surprised to find out that it's much slower than the summation operation and only marginally faster than the case, when b=e (y=m*exp(e)). I tried, but couldn't find a prove for it to be otherwise. Also, in the comments to the realization of pow2 function in Matlab help, it is said, that for the same reasons I gave here, the operation should be really fast. But as I said, it's not much faster than taking the exponent.

Concerning your example of comparison m*2^e with pow2(m,e) - I agree, it's faster, but still not fast enough, taking into account, again, the comparable speed of calculating m*exp(e) to pow2(m,e). For the explicit 2^e command, I believe, the general function ^ is used, which must be slow.

So, sorry, but I'm still confused...

James Tursa

unread,

Jan 4, 2010, 1:38:03 AM1/4/10

to

"Matt J " <mattja...@THISieee.spam> wrote in message <hhrfo6$j6o$1...@fred.mathworks.com>...

> "James Tursa" <aclassyguy_wi...@hotmail.com> wrote in message <hhqvtt$4r1$1...@fred.mathworks.com>...
>
> In the x1 + x2 calculation, all of the work is programmed at the microprocessor chip level so it is very fast. For the pow2 operation, at least some of the work is likely compiled C code as opposed to a single chip level available function, so it is not surprising that it runs slower than an addition.
> ================================
>
> I don't really follow that. Isn't pow2() a simple barrel shift of the exponent? Why should this require any more C code than summation.

For a pow2(x,y), where both x and y are double class, first the y has to be turned into an integer of the appropriate size. So I would imagine that there are checks involved to ensure that it really is an integer, is not too large, etc. (It appears MATLAB just truncates the y values with no warning). I would guess that some of this is compiled C code rather than chip level code. After getting the integer, then yes, a simple shift and integer add completes the job. But again there have to be checks for result being too large (or small) to overflow the exponent bit field width. All of this takes extra code (compiled C code I am guessing), as opposed to a simple addition where everything is at the chip level.

>
> Also, the operation plus(x1,x2) must involve some C interface. It must run a different routine for scalar-vector addition as opposed to vector-vector addition.

Well, both of the operations of course have some interface to load up values into registers, invoke op codes, etc. I have no idea how much this adds to each operation.

James Tursa

unread,

Jan 4, 2010, 3:18:03 AM1/4/10

to

"Max " <nikitch...@gmail.com> wrote in message <hhrvas$otg$1...@fred.mathworks.com>...
>
> My goal is to separate the modulus and the exponent of a number (y = m*b^e), and to do it in such a way, that the massive computation involving these numbers won't be too slow. (snip)

But what basic computation are you really after? Are you trying to do x1 + x2 faster than the basic addition would do it by using fancy exponent calculations, or what? You seem to keep coming back to comparison to the addition operation. (Which, again, I would fail to see the relevance of trying to use pow2 for this per my example).

James Tursa

Max

unread,

Jan 4, 2010, 3:30:22 PM1/4/10

to

> But what basic computation are you really after? Are you trying to do x1 + x2 faster than the basic addition would do it by using fancy exponent calculations, or what? You seem to keep coming back to comparison to the addition operation. (Which, again, I would fail to see the relevance of trying to use pow2 for this per my example).

I agree, I didn't really explain it. I need to be able to work with numbers, which can be as small as 10^-1000, which is impossible if you record it with a double variable. For that, I created a class, where the values are recorded as m*exp(e), with 0<=m<e. It's not a problem to multiple the numbers, as you don't have to calculate the exponent for that (=m1*m2*exp(e1+e2)), but it's necessary to convert the to doubles to sum the numbers up (taking the greater exponent out of the brackets to avoid getting 0s as a result). The drawback of using this approach lies in slow calculation of the exponent (even though in a particular problem I developed the approach for, it's inevitable in any way..). I thought taking the base=2 would help the matter, and was surprised that, at least in Matlab, it's not the case... That's the story.

James Tursa

unread,

Jan 4, 2010, 4:35:07 PM1/4/10

to

"Max " <nikitch...@gmail.com> wrote in message <hhtj4u$i42$1...@fred.mathworks.com>...

> > But what basic computation are you really after? Are you trying to do x1 + x2 faster than the basic addition would do it by using fancy exponent calculations, or what? You seem to keep coming back to comparison to the addition operation. (Which, again, I would fail to see the relevance of trying to use pow2 for this per my example).
>
> I agree, I didn't really explain it. I need to be able to work with numbers, which can be as small as 10^-1000, which is impossible if you record it with a double variable. For that, I created a class, where the values are recorded as m*exp(e), with 0<=m<e. It's not a problem to multiple the numbers, as you don't have to calculate the exponent for that (=m1*m2*exp(e1+e2)), but it's necessary to convert the to doubles to sum the numbers up (taking the greater exponent out of the brackets to avoid getting 0s as a result). The drawback of using this approach lies in slow calculation of the exponent (even though in a particular problem I developed the approach for, it's inevitable in any way..). I thought taking the base=2 would help the matter, and was surprised that, at least in Matlab, it's not the case... That's the story.

Ah, OK. At least the motivation makes sense now.

James Tursa

Max

unread,

Jan 12, 2010, 12:20:19 AM1/12/10

to

All right, my friend located the cause for the pow2 apparent slowness. It's due to the loop optimization of Matlab. It doesn't work for the pow2. But if an array is fed to the function, it works even faster that the addition (as expected). Here are the results obtained in Linux (previous print outs were from Windows version):

vectorized float addition ===> Elapsed time is 0.020498 seconds.
vectorized pow2 w/integer ===> Elapsed time is 0.015135 seconds.
vectorized x*2^m ===> Elapsed time is 0.091893 seconds.
vectorized pow2(log2(x)+m) ===> Elapsed time is 0.221620 seconds.
===================================
looped float addition ===> Elapsed time is 0.082081 seconds.
looped pow2 w/integer ===> Elapsed time is 0.565976 seconds.
looped x*2^m ===> Elapsed time is 0.169222 seconds.
looped pow2(log2(x)+m) ===> Elapsed time is 0.335277 seconds.

and the program, generating them is here:
x=1./rand(1,1e6);
a=floor(log2(x));

fprintf('\n\n\n');

m=floor(1./rand(size(x)));
b=zeros(size(x));
fprintf('vectorized float addition ===> ');
tic
b=a+m;
toc

fprintf('vectorized pow2 w/integer ===> ');
tic
b=pow2(x,m);
toc

fprintf('vectorized x*2^m ===> ');
tic
b=x.*2.^m;
toc

fprintf('vectorized pow2(log2(x)+m) ===> ');
tic
b=pow2(log2(x)+m);
toc

fprintf('===================================\n');
m=floor(1./rand(size(x)));
b=zeros(size(x));
fprintf('looped float addition ===> ');
tic
for i=1:length(m) b(i)=a(i)+m(i); end
toc

fprintf('looped pow2 w/integer ===> ');
tic
for i=1:length(m) b(i)=pow2(x(i),m(i)); end
toc

fprintf('looped x*2^m ===> ');
tic
for i=1:length(m) b(i)=x(i).*2.^m(i); end
toc

fprintf('looped pow2(log2(x)+m) ===> ');
tic
for i=1:length(m) b(i)=pow2(log2(x(i))+m(i)); end
toc

Thanks everybody for advices!