While debugging an application made by someone, I've come accross a
confusing integer calculation. The application is a disk defragmenter
with a disk cluster map display shown as rows and columns. The
application displays a fixed 8x8 pixel blocks with one pixel as the
separator between each blocks. The blocks always fills the entire
available drawing area (with some left over space), so only the number
of rows & column is calculated. The objective of the calculation is to
find out how many blocks can be fitted in the drawing area.
To make things simple, I adjusted the window height so that the number
of columns is only one. The relevant variables/values that are used by
the application is like so:
W = Drawing area width
B = Block width (not sure if it includes the separator)
C = Number of blocks
Obviously, I will have this formula:
C = W div B
However, the application uses confusing calculation:
mov eax, 38E38E39h ;edx:eax = 38E38E39h (hard coded)
xor edx, edx
mov ecx, W ;ecx = W
imul ecx
sar edx, 1
mov C, edx ;C = edx
So the question is, what kind of calculation is that? I've never seen
something like this. A division calculation using a multiplication. What
I can understand is that it's a compiler optimization because
multiplication instructions are relatively faster than division
instructions. But the method/trick of accomplishing that is new and
unknown to me. Does anyone know something like this?
Thank you in advance.
Regards,
Jaelani
Sure!
I once helped write a paper about it, together with Agner Fog (do a
search!).
The key here is that the compiler could figure out that B was a
constant, in which case _all_ modern compilers will replace division by
a reciprocal multiplication followed by a shift.
It looks like the divisor could have been around 10?
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
> So the question is, what kind of calculation is that? I've never seen
> something like this. A division calculation using a multiplication. What
> I can understand is that it's a compiler optimization because
> multiplication instructions are relatively faster than division
> instructions. But the method/trick of accomplishing that is new and
> unknown to me. Does anyone know something like this?
>
> Thank you in advance.
>
> Regards,
> Jaelani
Chapter 8 of the AMD Optimization Guide for x86 does a pretty good
job:
http://support.amd.com/us/Processor_TechDocs/22007.pdf
And it will help you better understand, where the constant is coming
from
in the code you provided.
Brian
It includes the separator, so B=9.
> C = Number of blocks
> Obviously, I will have this formula:
> C = W div B
>
That is not obvious. Division is slow. Multiplication by the
reciprocal is faster:
C = W * (1/B)
assuming that 1/B is known at compile-time.
>
> However, the application uses confusing calculation:
>
> mov eax, 38E38E39h ;edx:eax = 38E38E39h (hard coded)
> xor edx, edx
> mov ecx, W ;ecx = W
> imul ecx
> sar edx, 1
> mov C, edx ;C = edx
> So the question is, what kind of calculation is that?
As said, the reciprocal is 1/9. But that would be truncated to zero
using integer arithmetic. Instead use:
C = W * (2^n/B)
---------
2^n
for some convenient value of n. n=32 is very convenient but may not
be accurate enough. This program fragment uses n=33.
2^33/9 = 38E38E38
with a fraction of .38E38E3838E38E3838E38E38....et cetera.
The division by 2^33 is carried out in two steps:
1) discard the low 32 bits of W * (2^n/B). In other words use DX
which contains the upper half of the result, ignoring AX which
contains the lowest 32 bits.
2) Shift right to conclude division by 2^33.
Division is the same as multplying by the reciprocal.
38E38E39h is (2^33)/9. SAR corrects for 2^33 vs 2^32.
Hendrik vdH
It's so-called fixed-point arithmetic. I'm pretty sure you can work it
out if you follow the calculations.
An example in decimal... Suppose your CPU registers aren't binary but
decimal and can hold 4 decimal digits.
Suppose you want to calculate 3333 / 10.
Now, what if you represent the 10 as 10000/1000 and calculate 3333 *
1000 / 10000?
Watch it:
mov ax, 3333
mov bx, 1000
mul bx
After this dx:ax = 3333*1000=3333000. But if ax holds 4 decimal digits
as we agreed earlier (3000) and we simply drop them off, we end up
with 333 in dx.
And that's what you wanted: 3333 / 10 = 333.
Now, of course, those aren't decimal registers, they're binary. Do the
math.
Alex
That should be:
> 2^33/9 = 38E38E38
>
> with a fraction of .E38E3838E38E3838E38E38....et cetera.
Rounding 38E38E38.E3 gives 38E38E39
I'm an idiot, so it was the only formula I can think of.
>> So the question is, what kind of calculation is that?
>
> As said, the reciprocal is 1/9. But that would be truncated to zero
> using integer arithmetic. Instead use:
>
>
> C = W * (2^n/B)
> ---------
> 2^n
>
> for some convenient value of n. n=32 is very convenient but may not
> be accurate enough. This program fragment uses n=33.
>
> 2^33/9 = 38E38E38
>
> with a fraction of .38E38E3838E38E3838E38E38....et cetera.
> The division by 2^33 is carried out in two steps:
>
> 1) discard the low 32 bits of W * (2^n/B). In other words use DX
> which contains the upper half of the result, ignoring AX which
> contains the lowest 32 bits.
>
> 2) Shift right to conclude division by 2^33.
The explanation makes more sense. So it's similar to scaling-up the
dividend prior dividing and scaling-down back the result, correct?
For 32-bit integers, does that mean the additional bits (n above 32)
serves as the fractional part? 32 significant bits and 32 fractional
bits perhaps?
Great. This isn't just an answer to my question, but also helps me
understand how to multiply two 32-bit integers using only 16-bit
registers. A question of "how did they come up with that?" which was
long overdue.
Thanks.
Many thanks for that. I later found a similar method in the AMD
documentations, but Agner Fog's document is much easier to understand.
I only sure that the divisor is 9 (8 pixel block & 1 pixel separator),
but I don't know why the application uses 38E38E39h instead of
E38E38E4h. It probably uses the other method described in the AMD
document although the application seems to use the exact same method.
But nevermind about this. I already have my question answered.
Thanks again.
_______
Jaelani
Thanks.
With all the people here bursting with answers makes me feel like a newbie.
These are really good:
http://www.agner.org/optimize/#manuals
Now, I'm learning C++ because of them.
Thanks,
Brian
Yes, it helps. Thank you.
One things though... It's unfortunate that, for Intel CPUs which has a
relatively slower integer division instructions than AMD, don't have
this particular topic in their CPU optimization manual.
Ah! This is easy to answer: It uses the shortest equivalent reciprocal,
i.e. the one where any trailing zero bits have been shifted out:
....E4 is a number which ends in two zero bits, so the only effect these
bits have is to increase the required shift amount by 2, i.e. you would
need
SHR EDX,3
instead of
SHR EDX,1
> document although the application seems to use the exact same method.
> But nevermind about this. I already have my question answered.
Good luck!