Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Assembler in c#

12 views
Skip to first unread message

Clive Tooth

unread,
Mar 17, 2012, 5:03:37 PM3/17/12
to
Here is some code that I would like to speed up...

=================================
unchecked
{
for (int i = startI; i < stopI; i++)
{
uint m = (a[i] << 8) | carry;
carry = m/myBase;
if ((a[i] = m%myBase) == 0)
{
WorkerFoundZero(myNumber, i);
}
} // i loop
} // unchecked
=================================

a[], carry, myBase are uints. It seems that I need to do *two* divide
instructions "m/myBase" and "m%myBase". Of course, the processor has a
DIV instruction that returns both the quotient and the remainder. How
can I get access to both results? Math.DivRem does not seem
particularly fast, also it only accepts int and long operands.

Ideally, I would like to be able to write a couple of lines of
assembler in my c# program.

--
Clive Tooth

Peter Duniho

unread,
Mar 17, 2012, 5:19:20 PM3/17/12
to
C# is good at a lot of different things, but inline assembly isn't one of
them.

That said, you have already done the division. Division is way more
expensive than addition or multiplication. So why not take advantage of
the result you already have:

carry = m / myBase;
remainder = m - carry * myBase;
if ((a[i] = remainder) == 0)
{
...

?

I didn't write a benchmark to actually compare, but it's possible that
would be significantly faster than the two divisions.

Of course, that assumes that the JIT compiler is already not optimizing
your two operations into a single DIV instruction. Unless you've looked at
the generated machine code, you should not assume it's not.

Pete

Clive Tooth

unread,
Mar 17, 2012, 9:27:04 PM3/17/12
to
Wow! That that change virtually doubled the speed of the loop. Thanks,
Pete.

Background: This is my runs-of-zeros-in-a-power-of-two program.
I am searching for a run of 18 consecutive zeros. I had changed the
program so that it would detect if the current power of two has a run
of at least 13 consecutive zeros. [I changed the internal
represenation from base one billion to base ten million.] If such a
run is present the program simply proceeds to the next power of two by
doubling the current one. If such a run is not present the program
multiplies the current power by 256. The effect of that change was to
make the program about 2.5 times faster. [Reducing the picoseconds per
digit metric to about 36.] For powers of two in the region of 2^(10^9)
it turns out that the program can multiply by 256 about 99.99% of the
time. Your change has further reduced the ps/digit figure to about 19.

--
Clive
0 new messages