disp('The following execution times should be comparable:')
N = 500e2;
x1 = round(100*rand(1,N));
x2 = round(100*rand(1,N));
m1 = randn(1,N);
tic
for i = 1:1e4
y = x1 + x2;
end
toc
tic
for i = 1:1e4
y = pow2(m1,x1);
end
toc
results:
"
The following execution times should be comparable:
Elapsed time is 0.679610 seconds.
Elapsed time is 4.879041 seconds.
"
I don't understand your goal at all. But let's discuss your code for a moment. In the x1 + x2 calculation, all of the work is programmed at the microprocessor chip level so it is very fast. For the pow2 operation, at least some of the work is likely compiled C code as opposed to a single chip level available function, so it is not surprising that it runs slower than an addition. But this is not really the correct comparison anyway. The correct comparison is to a multiply by a power of 2, not to an addition. e.g.,
>> m1 = rand(2000);
>> x1 = floor(rand(2000)*50);
>> tic;m1.*(2.^x1);toc
Elapsed time is 2.124031 seconds.
>> tic;pow2(m1,x1);toc
Elapsed time is 0.415166 seconds.
>> isequal(m1.*(2.^x1),pow2(m1,x1))
ans =
1
What is your goal with this?
James Tursa
In the x1 + x2 calculation, all of the work is programmed at the microprocessor chip level so it is very fast. For the pow2 operation, at least some of the work is likely compiled C code as opposed to a single chip level available function, so it is not surprising that it runs slower than an addition.
================================
I don't really follow that. Isn't pow2() a simple barrel shift of the exponent? Why should this require any more C code than summation.
Also, the operation plus(x1,x2) must involve some C interface. It must run a different routine for scalar-vector addition as opposed to vector-vector addition.
Concerning your example of comparison m*2^e with pow2(m,e) - I agree, it's faster, but still not fast enough, taking into account, again, the comparable speed of calculating m*exp(e) to pow2(m,e). For the explicit 2^e command, I believe, the general function ^ is used, which must be slow.
So, sorry, but I'm still confused...
For a pow2(x,y), where both x and y are double class, first the y has to be turned into an integer of the appropriate size. So I would imagine that there are checks involved to ensure that it really is an integer, is not too large, etc. (It appears MATLAB just truncates the y values with no warning). I would guess that some of this is compiled C code rather than chip level code. After getting the integer, then yes, a simple shift and integer add completes the job. But again there have to be checks for result being too large (or small) to overflow the exponent bit field width. All of this takes extra code (compiled C code I am guessing), as opposed to a simple addition where everything is at the chip level.
>
> Also, the operation plus(x1,x2) must involve some C interface. It must run a different routine for scalar-vector addition as opposed to vector-vector addition.
Well, both of the operations of course have some interface to load up values into registers, invoke op codes, etc. I have no idea how much this adds to each operation.
James Tursa
But what basic computation are you really after? Are you trying to do x1 + x2 faster than the basic addition would do it by using fancy exponent calculations, or what? You seem to keep coming back to comparison to the addition operation. (Which, again, I would fail to see the relevance of trying to use pow2 for this per my example).
James Tursa
I agree, I didn't really explain it. I need to be able to work with numbers, which can be as small as 10^-1000, which is impossible if you record it with a double variable. For that, I created a class, where the values are recorded as m*exp(e), with 0<=m<e. It's not a problem to multiple the numbers, as you don't have to calculate the exponent for that (=m1*m2*exp(e1+e2)), but it's necessary to convert the to doubles to sum the numbers up (taking the greater exponent out of the brackets to avoid getting 0s as a result). The drawback of using this approach lies in slow calculation of the exponent (even though in a particular problem I developed the approach for, it's inevitable in any way..). I thought taking the base=2 would help the matter, and was surprised that, at least in Matlab, it's not the case... That's the story.
Ah, OK. At least the motivation makes sense now.
James Tursa
vectorized float addition ===> Elapsed time is 0.020498 seconds.
vectorized pow2 w/integer ===> Elapsed time is 0.015135 seconds.
vectorized x*2^m ===> Elapsed time is 0.091893 seconds.
vectorized pow2(log2(x)+m) ===> Elapsed time is 0.221620 seconds.
===================================
looped float addition ===> Elapsed time is 0.082081 seconds.
looped pow2 w/integer ===> Elapsed time is 0.565976 seconds.
looped x*2^m ===> Elapsed time is 0.169222 seconds.
looped pow2(log2(x)+m) ===> Elapsed time is 0.335277 seconds.
and the program, generating them is here:
x=1./rand(1,1e6);
a=floor(log2(x));
fprintf('\n\n\n');
m=floor(1./rand(size(x)));
b=zeros(size(x));
fprintf('vectorized float addition ===> ');
tic
b=a+m;
toc
fprintf('vectorized pow2 w/integer ===> ');
tic
b=pow2(x,m);
toc
fprintf('vectorized x*2^m ===> ');
tic
b=x.*2.^m;
toc
fprintf('vectorized pow2(log2(x)+m) ===> ');
tic
b=pow2(log2(x)+m);
toc
fprintf('===================================\n');
m=floor(1./rand(size(x)));
b=zeros(size(x));
fprintf('looped float addition ===> ');
tic
for i=1:length(m) b(i)=a(i)+m(i); end
toc
fprintf('looped pow2 w/integer ===> ');
tic
for i=1:length(m) b(i)=pow2(x(i),m(i)); end
toc
fprintf('looped x*2^m ===> ');
tic
for i=1:length(m) b(i)=x(i).*2.^m(i); end
toc
fprintf('looped pow2(log2(x)+m) ===> ');
tic
for i=1:length(m) b(i)=pow2(log2(x(i))+m(i)); end
toc
Thanks everybody for advices!