The Math::BigInt::FastCalc Prophecies

Tels

unread,

Aug 10, 2002, 6:00:58 PM8/10/02

to perl5-...@perl.org

-----BEGIN PGP SIGNED MESSAGE-----

Moin,

this is not an announcement, more like a prediction. Or a prophecy, if you
will. This project was at the back of my head (and the front, the sides and
in the middle and across) for quite a long while. Now is the time to let it
loose. If you aren't interested in Math::Big*s stuff, please ignore this
scroll. Thank you.

Contents:

0. Revelations
1. What is it?
2. Motivation
3. Advantages/Disadvantages
4. Help Wanted! Baldly or Badly, Dead or Alive!
5. First Trials and Benchmarks
5.1 O(1)
5.2 O(N)
6. Conclusion

So, let's journey onwards!

0: Revelations
==============

|
|. I was standing right by the statue of Ossory, right, when I noticed
Brutha just beside me. Everyone was keeping away from him because of
him beeing a bishop and they do things to you if you jostle bishops.

||
||. I said to him, hello, You Graciousness, and offered him a yoghurt
practically free.

|||
|||. He responded, no.

|\ /
| \/. I said, it's very healthy, it's a *live* yoghurt.

\ /
\/ . He said, yes, he could see.

\ /|
\/ |. He was staring at the doors. This was about the time of the third
gong, right, so we all knew we'd got hours to wait. He was looking
a bit down it's not as if he even et the yoghurt, which I admit was
on the hum a bit, what with the heat. I mean, it was more alive
than usual. I mean, I had to keep hitting it with a spoon to stop
getting it out of the ... all right. I was just explaining about the
yoghurt. All *right*. I mean, you want to put a bit of colour in,
don't you? People like a bit of colour. It was green.

First part of Dhblah's Revelations - Small Gods, T. Pratchett

1: What is it?
==============

FastCalc is a re-implementation of some (probably not all) sub routines in
Calc in XS. It is thus a replacement for Calc, that works 100% and
transparent to the user (except for speed) like Calc. In fact, since you
only use it via Math::BigInt and cohorts, it *will* be transparent to the
user except for the speed difference, which will hopefully not invisible.

The routines will be done by your humble $self in an easiest-first order.

Math::BigInt::GMP, Math::BigInt::Pari and Math::BigInt::BitVect are the
other members of the growing math library replacement family, so FastCalc
will probably fit in nicely.

2: Motivation
=============

* More Speed. More about that in the benchmark section.
* I always wanted to learn XS, but never got round to it.
* I can stop pretending I don't know XS. I'm now sure I don't know it.

3: Advantages/Disadvantages
===========================

At this point many will wonder whether I am sane. Others might question
my coffeine level.

The question is: Do we need another replacement library, when we got
perfectly working GMP, Pari and BitVect?

The answer is, of course: 42 (Read: It depends).

The "other" libraries store their numbers in base 2, while Calc does it in
base 10 (basically, that is).

The difference is that *calculations* ares very fast in Pari/GMP etc. E.g.
2 ** 1234567 basically creates a one with a lot of zeros attached to the
right to it. But anytime you need to look at the number in decimal space
(rounding, extracting digits, printing it, creating a number from a
(decimal) string etc), these libraries take a long time due to the needed
conversion.

This means that Pari et al make *some* things (sometimes much) faster, but
other thins get slower. It all depends on what you prefer.

But the FastCalc code will be always faster or at least have the same speed
than Calc. This means it is a safe replacement, you will never loose
speed. As a trade-off, the speed improvements for large numbers and
complex calculations will not be as high as with the other libraries.

Another disadvantage is that FastCalc will need to be compiled. It shares
this with Pari et al, but my hope is that it will still be much simpler (no
external library to compile first) a

[I am sorry if the mail is a bit garbled from now on, gpg hung when signing
and iw as cut off here, so I am retyping/reassmebling it from my notes]

and can thus be a candidate for CORE inclusion. Maybe.

4: Help Wanted! Baldly or Badly, Dead or Alive!
===============================================

5. First Trials and Benchmarks
==============================

I have done some preliminary benchmarks to see what would improve and how.
A test-version to redo them is at http://bloodgate.com/perl/packages - no
officialCPAN until I incorporate all the CALC stuff.

There are basically four types of routines in Calc:

O(1) constant time needed, like _is_odd(), _is_one(), _length()
O(N) (internal) time depends on input length, but uses no perl constructs
to loop over the input, but some internal function (like
pack, unpack, grep etc). Examples are _new(), _copy()
O(N) perl like O(N), but Perl loops are used to loop over the input,
these are even slower. Examples are add, mod (some cases),
inc (usually it is O(1), though), etc
O(N*N) nested loops over the input, like mul(), div()

There are some other variants, like bpow(), which can't be easily classified
(it depends more on the size of the output, then the size of the input,
because there are few steps, but each steptakes longer and longer)

The more complex the function, and the more Perl statement's it uses, the
more it can be improved, but the harder it will be (XS get's hairy and
complex).

Let's first look at some O(1) function:

5.1 O(1)
========

_is_odd() determines whether a number is odd or not by looking at the last
element. In Calc:

sub _is_odd
{
# return true if arg (BINT or num_str) is even
my $x = $_[1];

(($x->[0] & 1)) <=> 0;
}

Easy, isn't it?

We benchmark it first under Math::BigInt, BigFloat, BigRat and BigLite to
see the status quo (done at 1 Ghz, Athlon TBird, Linux, Gcc 3.1, newest
modules (v1.61,0.08 etc) and Perl v5.8.0):

mbi: 4s (3.13 usr + 0.01 sys = 3.14 CPU) @ 115038/s (n=361222)
mbf: 3s (3.18 usr + 0.00 sys = 3.18 CPU) @ 42600/s (n=135470)
rat: 3s (3.13 usr + 0.00 sys = 3.13 CPU) @ 41884/s (n=131100)
lite: 4s (3.07 usr + 0.00 sys = 3.07 CPU) @ 162686/s (n=499447)

calc: 4s (3.02 usr + 0.02 sys = 3.02 CPU) @ 148545/s (n=445637)

Lite isn't much faster than BigInt, though. From the numbers the alert
reader can deduce that about 55% of the time of Math::BigInt->is_odd() is
thus spent inside Math::BigInt::Calc::_is_odd().

Here is a simple function in XS (my first!):

SV *
is_odd(n)
SV* n

CODE:
RETVAL = newSViv( ((SvIV(n) & 1) == 0) );
OUTPUT:
RETVAL

It is no replacement, but it shows us the maximum speed we could ever
achive:

xsfaster: 2s (3.16 usr + 0.01 sys = 3.17 CPU) @ 1282097/s (n=4064248)

1280/149 is about 8.6 times faster than Calc. Unfortunately, it is a
simplistic version that does not anything a Calc replacement needs to do.
But it shows what the theoretical limit will be, you cannot go faster than
this.

Now to the real thing:

SV *
_is_odd(class, x)
SV* class
SV* x
INIT:
AV* a;
SV* temp;

CODE:
a = (AV*)SvRV(x); /* ref to aray, don't check ref */
temp = *av_fetch(a, 0, 0); /* fetch first element */
RETVAL = newSViv( ((SvIV(temp) & 1) == 0) );
OUTPUT:
RETVAL

xsfast: 3s (3.11 usr + 0.00 sys = 3.11 CPU) @ 204192/s (n=635039)

This is about 40% faster than Calc. This means we can expect to reduce the
55% time spent inside Calc by 40%, which means roughly 15% savings all in
all. Not much, but better than nothing, eh?

To be fair, on a slower 300 Mhz PIII (also with GCC 3.1) the benchmark
indicated that xsfast was about twice as fast as the Calc version,
reducing the time in total by roughly 27% all in all. YMWV (Your Milage
Will Vary).

A full benchmark when FastCalc is done will show us what the real value will
be. But I expect it not to differ that much from the preliminary benchmarks,
since the XS version of _is_odd/_is_even is pretty much in it's final shape.

5.2 O(N) (internal)
===================

Now to something more complex, the _copy() routine. In Calc:

sub _copy
{
[ @{$_[1]} ];
}

In XS I came up with that and I think it is not enough, but I am not sure.

AV *
_copy(class, x)
SV* class
SV* x
INIT:
AV* a_old;
AV* a_new;
SV* temp;
I32 i;
I32 len;

CODE:
a_old = (AV*)SvRV(x); /* ref to aray, don't check ref */
a_new = newAV();
len = av_len(a_old);
av_extend (a_new, len);
for (i = 0; i <= len; i++)
{
av_store (a_new, i, *av_fetch(a_old, i, 0));
}
RETVAL = a_new;
OUTPUT:
RETVAL

And some benchmark script piece (the full script can be found in the package
at bloodgate.com):

for my $n (qw/1 10 100 1000/)
{
my $array = [];
for ($i = 0; $i < $n; $i ++) { push @$array, 1111111; }

print "long ($n):\n";
my $str = '1111111' x $n;
my $xi = Math::BigInt->new($str);
print "len: ",scalar @{$xi->{value}},"\n";
my $xf = Math::BigFloat->new($str);
my $xr = Math::BigRat->new($str);
my $xl = Math::BigInt::Lite->new($str);
timethese ( -3,
{
mbi => sub { $xi->copy(); },
mbf => sub { $xf->copy(); },
rat => sub { $xr->copy(); },
calc => sub { Math::BigInt::Calc->_copy($array); },
calc2 => sub { Math::BigInt::Calc->_copy($xi->{value}); },
xsfast => sub { Math::BigInt::FastCalc->_copy($array); },
xsfast2 => sub { Math::BigInt::FastCalc->_copy($xi->{value}); },
}, );

}

The benchmark values:

long (1):
Benchmark: running calc, calc2, mbf, mbi, rat, xsfast, xsfast2 for at least
3 CPU seconds...
calc: 4s ( 3.10 usr + 0.00 sys = 3.10 CPU) @ 199020/s (n=616963)
calc2: 3s ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 165672/s (n=518556)
mbf: 4s ( 3.18 usr + 0.00 sys = 3.18 CPU) @ 9075/s (n=28861)
mbi: 2s ( 3.18 usr + 0.00 sys = 3.18 CPU) @ 37478/s (n=119181)
rat: 3s ( 3.15 usr + 0.00 sys = 3.15 CPU) @ 9162/s (n=28861)
xsfast: 3s ( 2.97 usr + 0.14 sys = 3.11 CPU) @ 217803/s (n=677368)
xsfast2: 3s ( 3.11 usr + 0.11 sys = 3.22 CPU) @ 191603/s (n=616963)
long (10):
Benchmark: running calc, calc2, mbf, mbi, rat, xsfast, xsfast2 for at least
3 CPU seconds...
calc: 4s ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 152761/s (n=478142)
calc2: 4s ( 3.01 usr + 0.00 sys = 3.01 CPU) @ 82613/s (n=248667)
mbf: 4s ( 3.19 usr + 0.00 sys = 3.19 CPU) @ 8558/s (n=27303)
mbi: 2s ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 30108/s (n=93636)
rat: 3s ( 3.17 usr + 0.00 sys = 3.17 CPU) @ 8569/s (n=27164)
xsfast: 3s ( 2.99 usr + 0.15 sys = 3.14 CPU) @ 185569/s (n=582689)
xsfast2: 4s ( 3.02 usr + 0.15 sys = 3.17 CPU) @ 165430/s (n=524414)
long (100):
Benchmark: running calc, calc2, mbf, mbi, rat, xsfast, xsfast2 for at least
3 CPU seconds...
calc: 2s ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 45126/s (n=142600)
calc2: 3s ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 13610/s (n=42328)
mbf: 3s ( 3.20 usr + 0.01 sys = 3.21 CPU) @ 5495/s (n=17640)
mbi: 3s ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 10370/s (n=32770)
rat: 3s ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 5579/s (n=17630)
xsfast: 5s ( 2.58 usr + 0.45 sys = 3.03 CPU) @ 73900/s (n=223917)

And this time the benchmark was killed by the kernel due to running out of
swapspace. This indicates a memory leak, which leaves me believing (ha!)
that the XS is buggy. But I don't know for sure.

If you run it only for 1000, it runs trough, but if you start with 1, 10,
100 etc it wastes all the memory. Oh well...

Also very strange is that the Calc _copy() takes different times for the
two different array (I confirmed that they contain the same number of
elements). I dunno why. The two xs calls take nearly the same time, and
this is strange. I would expect there to be no difference at all.

If somebody has any ideas (and a deeper understanding of Perl internals),
please stand up, wave your hand and whistle. Thank you in advance.

6. Conclusion
=============

If any body has any comments, please mail me. Don't hesitate, even if you
think it is silly! Thanx for reading one of my boring mails ;)

Best wishes,

Tels

- --
perl -MMath::String -e 'print \
Math::String->from_number("215960156869840440586892398248"),"\n"'

http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBPVWMb3cLPEOTuEwVAQG3hQf8CB7GrTjENmAfruKOr3zHFnDmTH0Pkpkt
R49lpxb+lr2OoAhlukCX+1IKFuqCKHJnc5EueeK24hGZQvj55Pfmg3ev85Ys2n0G
qbz0yIxueb12XcnREkBXrrG4sKTOi5XccQkCA/CxpaG6UVA3rd07oOL+7RoeZGlE
dXIemg2zR+0XFgaOABTmYGoztsm+RpDHnCx+VFXRYJDDAXyD3DiJxL/87EAczBJs
dM71gYo6phE/jBApbecheJ5rvMjga+qi3smNmQ/j4Xg5Clo005AW4D9iN98R153I
SUehqnRha5UxDWbJ/1xzll1uxZ9WcyQ58MYGp+Xx90+mQs4HKuma7g==
=2xcE
-----END PGP SIGNATURE-----

Tels

unread,

Aug 11, 2002, 4:02:11 AM8/11/02

to Benjamin Goldberg, perl5-...@perl.org

-----BEGIN PGP SIGNED MESSAGE-----

Moin,

On 11-Aug-02 Benjamin Goldberg carved into stone:
> Tels wrote:
> [snip]

>> * I always wanted to learn XS, but never got round to it.
>> * I can stop pretending I don't know XS. I'm now sure I don't know it.
>

> You can always use Inline::C. It does most of the work of XS for you.

The bits around the code in XS that Inline takes care for me - I already
found that out ;) And Inline::C has the big disadvantage, that the user
needs it installed...and if FastCalc should ever be the "official" Calc
replacement, I'd better not bring in more dependecies. :)

There is also the Windows problem: I can get a precompiled module from
ActiveState if it is XS (most of the time). If it depends on Inline::C, I
also need Inline *and* a compiler - which isn't standard on Windows (I
never owned one, for instance).

Best thing is to keep it XS for now.

>> There are some other variants, like bpow(), which can't be easily
>> classified (it depends more on the size of the output, then the size
>> of the input, because there are few steps, but each steptakes longer
>> and longer)
>

> For bpow(), it takes approximatly O(size of result), or O(log(N)*M),
> where N and M are the two inputs.

Sory, I just didn't want to go into this too deeply there :)

And it should be O(N*log(M)), because

bpow(2,1234567)

is log(1234567) steps. But this is still not right, since the
resulting number will grow and grow an multiplying it with itself will take
longer and longer.

te@null:~> time perl -Mbignum -e '2 ** 1234'

real 0m0.188s
user 0m0.170s
sys 0m0.020s
te@null:~> time perl -Mbignum -e '2 ** 12345'

real 0m0.507s
user 0m0.500s
sys 0m0.010s
te@null:~> time perl -Mbignum -e '2 ** 123456'

real 0m35.854s
user 0m35.530s
sys 0m0.120s

Making M ten times bigger and it takes 70 times longer. Make it ten times
longer and it takes to long for me to wait. At least in Calc it is
something different.

> This is of course for N and M positive integers -- I'm not sure how it
> all works out when M is negative, or when either are fractions.

It takes way longer :)

> [snip]

>> SV *
>> _is_odd(class, x)
>> SV* class
>> SV* x
>> INIT:
>> AV* a;
>> SV* temp;
>>
>> CODE:
>> a = (AV*)SvRV(x); /* ref to aray, don't check ref */
>> temp = *av_fetch(a, 0, 0); /* fetch first element */
>> RETVAL = newSViv( ((SvIV(temp) & 1) == 0) );
>> OUTPUT:
>> RETVAL
>

> No need to store temp variables, you know:

>
> SV *
> _is_odd(class, x)
> SV* class
> SV* x

> CODE:
> RETVAL = newSViv(((SvIV(
> (av_fetch( (AV*)SvRV(x) , 0, 0))[0]

> ) & 1) == 0) );
> OUTPUT:
> RETVAL

> [untested]
>
> Of course, it's possible that at the C level, these two will be the
> same.

I did of course test this, and C level was different, but benchmark was the
same. Compiler should know how to optimize this away.

The only "problem" I found in the C code is:

SV* n = ST(0);
SV * RETVAL;
#line 15 "FastCalc.xs"

RETVAL = newSViv( ((SvIV(n) & 1) == 0) );

#line 28 "FastCalc.c"
ST(0) = RETVAL;
sv_2mortal(ST(0));

Looks like RETVAL is a needless temp variable. No idea how to remove it
except manually editing the C and no idea whether it will change anything
or not.

> [snip]

>>From perldoc perlguts:

I read that. But I didn't understand it fully, there were no examples and
especially it doesn't explain what the implications are.

I am dimly aware of refcounts, but I didn't find any reference on what to
do with them (you can dec/inc them, but why should you?).

Some example would have been really helpfull here. If I ever figure it out,
I'll send a patch. ;)

[snip]

> [snip]

>> And this time the benchmark was killed by the kernel due to running
>> out of swapspace. This indicates a memory leak, which leaves me
>> believing (ha!) that the XS is buggy. But I don't know for sure.
>

> Yup, you messed up the AV code -- you forgot an SvREFCNT_dec(SV*)
> somewhere.

I would have thought a inc should be it. Two copies, refcount goes one up,
right? Why should I dec it when I copy the SV?

But isn't the fetch/store pair just making a copy of the pointer, anyway? If
I inc the refcount, and then later modify on of the arrays, will the other
stay the same, or will both be changed?

Another question:

new() in Clac does work with unpack. And it leaves string fragments in
the array (input is a string, it chops it up into pieces). Now, when I
benchmark it, I manually created an array containing numbers. Could this
explain the time difference, e.g. it takes different times to copy a string
sv than one with a number in it?

And I think that my XS code never really copies the SV and thus takes
little difference (still doesn't explain why it takes a difference at
all).

Thank you for your time!

Tels

- --
perl -MMath::String -e 'print \
Math::String->from_number("215960156869840440586892398248"),"\n"'

http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBPVYZ+3cLPEOTuEwVAQE90Af+L77fmNU85rHCl0gUUfa+7Z6s/yA7EKRC
GrvZmO/Y9txMqPDpV3LpdbiziQDisv2yJz4kMBZA1kMJgUvoNacu2B98LocP8TJi
709u13va1Dli2Mr9hjvmMhnXgUCV7Aip4Z20gjvdK9dv8vA1d0b3VHUXXdF521uE
/I5WNIZ8qe/gp2dnaFNxPev/zblTZzs14/jFw4Q9BPtC+9hd1vdwoAS/xaECzWTC
XZRHetT4rYlaEpeHgu5wBeV8IPU8P3rAUGtmPIDvDtSuhC8ogFSn+eiN2ifvgMX1
N6pE323f0zHGKWWnk47tymKsqNaN1w2NdfKUyTTK/5K+EBFFBH26Gw==
=dlyZ
-----END PGP SIGNATURE-----

Nicholas Clark

unread,

Aug 11, 2002, 5:53:38 AM8/11/02

to Tels, Benjamin Goldberg, perl5-...@perl.org

On Sun, Aug 11, 2002 at 10:02:11AM +0200, Tels wrote:
> Sory, I just didn't want to go into this too deeply there :)
>
> And it should be O(N*log(M)), because
>
> bpow(2,1234567)
>
> is log(1234567) steps. But this is still not right, since the

and I am told that the algorithm is in Knuth.
(and I confess I've not read Knuth's books, but I suspect the first stage
to curing my ignorance of Knuth is admitting it)

> resulting number will grow and grow an multiplying it with itself will take
> longer and longer.
>
> te@null:~> time perl -Mbignum -e '2 ** 1234'
>
> real 0m0.188s
> user 0m0.170s
> sys 0m0.020s
> te@null:~> time perl -Mbignum -e '2 ** 12345'
>
> real 0m0.507s
> user 0m0.500s
> sys 0m0.010s
> te@null:~> time perl -Mbignum -e '2 ** 123456'
>
> real 0m35.854s
> user 0m35.530s
> sys 0m0.120s
>
> Making M ten times bigger and it takes 70 times longer. Make it ten times
> longer and it takes to long for me to wait. At least in Calc it is
> something different.

I was told (by someone at the German Perl Workshop, and I forget who, sorry)
that there is an algorithm (in Knuth) for evaluating (a+b) * (c+d) in 3
multiplies, not 4, for whenever multiplication is considerably slower than
addition. (Well, strictly it's just algebra and a clever way of rewriting
a*c + a*d + b*c + b*d but if you're not using it yet, it could be useful.
I see a couple of comments about "from Knuth..." in BigInt::Calc but they
don't seem to be this one)

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

Tels

unread,

Aug 11, 2002, 8:39:25 AM8/11/02

to Nicholas Clark, perl5-...@perl.org, Benjamin Goldberg

-----BEGIN PGP SIGNED MESSAGE-----

Moin,

On 11-Aug-02 Nicholas Clark carved into stone:

> On Sun, Aug 11, 2002 at 10:02:11AM +0200, Tels wrote:
>> Sory, I just didn't want to go into this too deeply there :)
>>
>> And it should be O(N*log(M)), because
>>
>> bpow(2,1234567)
>>
>> is log(1234567) steps. But this is still not right, since the
>
> and I am told that the algorithm is in Knuth.

Yeah, I know. I read it there :-P

> (and I confess I've not read Knuth's books, but I suspect the first stage
> to curing my ignorance of Knuth is admitting it)

:-)

Knuth usually only talks about integer/float math, which means he can at
length discuss how long x ** y will take, but it won't apply, since he
bases he analysis on the fact that a multiplication takes always the same
time, which isn't the case for bigint multiplactions.

Yesa, he also writes about bigint multiplications, but I didn't find
anything about bigint powers - or I overlooked it or I forgot that it is
there.

But let's forget this completely, I don't want to discuss bmow() - I
probably know already too much on it, more than I ever wanted to know.

I want to know about XS. :-)

[snip]

> I was told (by someone at the German Perl Workshop, and I forget who,
> sorry)

Not me, since we didn't met there. :-)

> that there is an algorithm (in Knuth) for evaluating (a+b) *
(c+d) in 3
> multiplies, not 4, for whenever multiplication is considerably slower
> than
> addition. (Well, strictly it's just algebra and a clever way of rewriting
> a*c + a*d + b*c + b*d but if you're not using it yet, it could be useful.
> I see a couple of comments about "from Knuth..." in BigInt::Calc but they
> don't seem to be this one)

Reason is that calc doe never (a+b) * (c+d), at least not that I am
aware...and I should, since I wrote most of it... :-P

Cheers,

Tels

- --
perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'
carefully aggregate sexy content

http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.0.6 (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBPVZa9HcLPEOTuEwVAQHGkgf/SzjgA+Xjubvnq9ypJs0vDHSLliVe6blg
OArIiMcFFdKIJl7cRqMZpPPxGnXwS8EF7b6rQEQADY7u5tCb2jgCyKpcbiQLrvjG
QpqqWMRToqJzjXrYk2CSlK97ssgxzoOG5JkhL9kA6wtofhtAHLgPWQ/RN1xBUFZ2
wlTcCyVkeDH5nhz+ITig7xhl60KxIDg3NY976QkIKg83NnU6NUx7N3hy1/kmh9RJ
l0COL+RrJFx+1WMUEhCHhsnLPNkir+xJOI72+cTSZhCSHgR1wgWoARGxM2ZvvfeC
InLjll6S+wdhU51eLwQ81dp8Ikcia8r1Y0Wh5SRKOq+tuKy7UUZvTQ==
=yi6v
-----END PGP SIGNATURE-----

Nick Ing-Simmons

unread,

Aug 11, 2002, 4:58:05 PM8/11/02

to perl_...@bloodgate.com, perl5-...@perl.org

Tels <perl_...@bloodgate.com> writes:
>* I can stop pretending I don't know XS. I'm now sure I don't know it.

>Here is a simple function in XS (my first!):
>
>SV *
>is_odd(n)
> SV* n
>
> CODE:

> RETVAL = newSViv( ((SvIV(n) & 1) == 0) );

> OUTPUT:
> RETVAL
>
>It is no replacement, but it shows us the maximum speed we could ever
>achive:

No it doesn't :-)

Assuming you want a boolean result there is no need to spend
time creating a newSViv. And we can save a little by returning
the immortals directly:

void
is_odd(n)
SV* n
CODE:
ST(0) = ? ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/

Tels

unread,

Aug 11, 2002, 5:28:45 PM8/11/02

to Nick Ing-Simmons, perl5-...@perl.org

-----BEGIN PGP SIGNED MESSAGE-----

Moin,

On 11-Aug-02 Nick Ing-Simmons carved into stone:

Very cool! Except that it doesn't work:

Please specify prototyping behavior for FastCalc.xs (see perlxs manual)
cc -c -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE
- -D_FILE_OFFSET_BITS=64 -O3 -DVERSION=\"0.02\" -DXS_VERSION=\"0.02\" -fpic
"-I/usr/local/lib/perl5/5.8.0/i686-linux/CORE" FastCalc.c
FastCalc.xs: In function `XS_Math__BigInt__FastCalc_is_odd_fast':
FastCalc.xs:36: parse error before `?'
make: *** [FastCalc.o] Error 1

But the idea is good, that is exactly the response I had in mind :)

I didn't know that returning a boolean result doesn't need a new SV.

Thank you very much,

Tels

- --
perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'

efficiently compete open-source materials

http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBPVbW9XcLPEOTuEwVAQGLUQf9Fx0Qb+Z52qW2pknZHEBYLecFDu++TpGJ
t2MhCEeG2e39yIqnEtkPyBsjuJQ1v8d+Mf06WiL8rR9PTYLv3BU0UIX565qtmiai
NDT3qkzyQWKwDYhIGIYCMX+zNpOATxbvCz6wPPMmF1/J0DCgkhfjjkqVxThp+XNo
OdXMCA5GyqLep5Ie2j1tXl5pw49HeNZcndWwYCe+Ozxd/9jqx0Rbh6XbmyV9fWtz
vFOOSetrknI1Y8IS+kTejMMWEvQwYQQJmxs7nQGsdwlsMvSlhY0yUF46kQ01kuKx
ETd4jTXbtHe8uVULlowb9FrmBK2WRrHV6sXahX62M3ClAcuSDjzvTw==
=QKvL
-----END PGP SIGNATURE-----

Sam Tregar

unread,

Aug 12, 2002, 6:06:24 PM8/12/02

to Tels, perl5-...@perl.org

On Mon, 12 Aug 2002, Tels wrote:

> Btw, why is XS still creating RETVAL in the C code:

Try replacing CODE with PPCODE and add XSRETURN(1) to the end. PPCODE
tells xsubpp that you'll be handling the stack manipulation yourself.

-sam

Tels

unread,

Aug 11, 2002, 6:02:05 PM8/11/02

to Nick Ing-Simmons, perl5-...@perl.org

-----BEGIN PGP SIGNED MESSAGE-----

Moin,

On 11-Aug-02 Nick Ing-Simmons carved into stone:

After a bit of trial and error, this one actually compiles (void isn't in
typemap, and an additional ? crept in):

SV *
is_odd_nick(n)
SV* n

CODE:
ST(0) = ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;

And:

Benchmark: running xsfast, xsfaster, xsnick, xsnicker for at least 3 CPU
seconds...
xsfast: 4s (3.11 usr + 0.01 sys = 3.12 CPU) @ 203535/s (n=635030)
xsfaster: 2s (3.09 usr + 0.00 sys = 3.09 CPU) @ 1414767/s (n=4371633)
xsnick: 3s (3.12 usr + 0.01 sys = 3.13 CPU) @ 218229/s (n=683058)
xsnicker: 4s (3.12 usr + 0.00 sys = 3.12 CPU) @ 2137675/s (n=6669547)

Wow ;)

Btw, why is XS still creating RETVAL in the C code:

XS(XS_Math__BigInt__FastCalc_is_odd_nick); /* prototype to pass
- -Wmissing-prototypes */
XS(XS_Math__BigInt__FastCalc_is_odd_nick)
{
dXSARGS;
if (items != 1)
Perl_croak(aTHX_ "Usage: Math::BigInt::FastCalc::is_odd_nick(n)");
{

SV* n = ST(0);
SV * RETVAL;

#line 37 "FastCalc.xs"
ST(0) = ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;
#line 64 "FastCalc.c"
}
XSRETURN(1);
}

Couldn't the line "SV * RETVAL;" be dropped?

Best wishes and thanx!

Tels

- --
perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'

vitalistically benchmark prospective developments

http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBPVbe13cLPEOTuEwVAQHi4gf/UWfrV2LOgomwhTHKMJHXq7IWf9AiyTHr
yX5ZGNoK3CeY059P0TJS95VsyG5dUzMTOaXyvDLLMZF4EfzTn7Z344HNAwmO9scz
EYEqbSqLQzckHOyxGlH8NQfE9cx1zqezG8XWdgCctey6fp2qD9xkR6hjrAru+aj8
L2D/seohDVmwi4336jAa5CMSLujZKz7T/GIWOG41sWeCUg0ykcQaDWKN4FgUmhy/
WmP2/kTTYCOH0MmgZ9tzxUHaty7F/Fq0HTngNYgjKO+tAuAXtQgKRzwAXVgvlkkW
1Mm8hjM/ySP5ReaFnJgS4N17FabjVuA/S1quAPCAaHa2zIYbgmDh4g==
=J0hy
-----END PGP SIGNATURE-----

Gisle Aas

unread,

Aug 12, 2002, 4:23:50 AM8/12/02

to Tels, Nick Ing-Simmons, perl5-...@perl.org

Tels <perl_...@bloodgate.com> writes:

> CODE:
> ST(0) = ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;

and perl has a macro called boolSV() for this situation:

ST(0) = boolSV((SVIV(n) & 1) == 0);

--
Gisle Aas

Nick Ing-Simmons

unread,

Aug 12, 2002, 5:34:18 AM8/12/02

to ni...@unfortu.net, Tels, perl5-...@perl.org, Benjamin Goldberg

Nicholas Clark <ni...@unfortu.net> writes:
>
>I was told (by someone at the German Perl Workshop, and I forget who, sorry)
>that there is an algorithm (in Knuth) for evaluating (a+b) * (c+d) in 3
>multiplies, not 4, for whenever multiplication is considerably slower than
>addition.

I know my algebra instruction was over two decades ago but

(a+b) * (c+d)

seems to need two adds and ONE multiply.

>(Well, strictly it's just algebra and a clever way of rewriting
>a*c + a*d + b*c + b*d but if you're not using it yet, it could be useful.
>I see a couple of comments about "from Knuth..." in BigInt::Calc but they
>don't seem to be this one)
>
>
>Nicholas Clark
--

Nick Ing-Simmons
http://www.ni-s.u-net.com/

Nicholas Clark

unread,

Aug 12, 2002, 5:43:01 AM8/12/02

to Nick Ing-Simmons, ni...@unfortu.net, Tels, perl5-...@perl.org, Benjamin Goldberg

On Mon, Aug 12, 2002 at 10:34:18AM +0100, Nick Ing-Simmons wrote:
> Nicholas Clark <ni...@unfortu.net> writes:
> >
> >I was told (by someone at the German Perl Workshop, and I forget who, sorry)
> >that there is an algorithm (in Knuth) for evaluating (a+b) * (c+d) in 3
> >multiplies, not 4, for whenever multiplication is considerably slower than
> >addition.
>
> I know my algebra instruction was over two decades ago but
>
> (a+b) * (c+d)
>
> seems to need two adds and ONE multiply.

but if A and B are 128 bits long, and you don't have 128 bit multiply,
you can implement A * B as ((a << 64) + b) * ((c << 64) + d) which is 4
64 bit multiplies and some addition and shifting. I am assuming is the
assumption behind all this is that the addition and shifting can be
programmed easily for arbitrary length, whereas the multiply can't.

And if you don't have 64 bit multiplies, break each into 4 32 bit
multiplies.

My understanding is that the refactoring to three multiplies is useful
for the case of multiply being slower than add or shift, because it reduces
the amount of multiplying.

The context of all this was large (integer) maths. Sorry if it wasn't clear
because the thread has now lost the context.

Nicholas Clark

Nick Ing-Simmons

unread,

Aug 12, 2002, 5:55:58 AM8/12/02

to perl_...@bloodgate.com, Nick Ing-Simmons, perl5-...@perl.org

Tels <perl_...@bloodgate.com> writes:
>-----BEGIN PGP SIGNED MESSAGE-----
>
>Moin,
>
>On 11-Aug-02 Nick Ing-Simmons carved into stone:
>> Tels <perl_...@bloodgate.com> writes:
>>>* I can stop pretending I don't know XS. I'm now sure I don't know it.
>>>Here is a simple function in XS (my first!):
>>>
>>>SV *
>>>is_odd(n)
>>> SV* n
>>>
>>> CODE:
>>> RETVAL = newSViv( ((SvIV(n) & 1) == 0) );
>>> OUTPUT:
>>> RETVAL
>>>
>>>It is no replacement, but it shows us the maximum speed we could ever
>>>achive:
>>
>> No it doesn't :-)
>>
>> Assuming you want a boolean result there is no need to spend
>> time creating a newSViv. And we can save a little by returning
>> the immortals directly:
>>
>> void
>> is_odd(n)
>> SV* n
>> CODE:

>> ST(0) = ? ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;
>
>Very cool! Except that it doesn't work:
>
>Please specify prototyping behavior for FastCalc.xs (see perlxs manual)
>cc -c -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE
>- -D_FILE_OFFSET_BITS=64 -O3 -DVERSION=\"0.02\" -DXS_VERSION=\"0.02\" -fpic
>"-I/usr/local/lib/perl5/5.8.0/i686-linux/CORE" FastCalc.c
>FastCalc.xs: In function `XS_Math__BigInt__FastCalc_is_odd_fast':
>FastCalc.xs:36: parse error before `?'

I don't know where 1st '?' came from:

ST(0) = ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/

Nick Ing-Simmons

unread,

Aug 12, 2002, 5:59:07 AM8/12/02

to perl_...@bloodgate.com, Nick Ing-Simmons, perl5-...@perl.org

>After a bit of trial and error, this one actually compiles (void isn't in
>typemap, and an additional ? crept in):

void doesn't need to be in typemap - xsubpp is supposed to
realize that as far as _it_ is concerned sub does not return
anything.

>
>SV *
>is_odd_nick(n)
> SV* n
>
> CODE:

> ST(0) = ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;
>
>And:
>
>Benchmark: running xsfast, xsfaster, xsnick, xsnicker for at least 3 CPU
>seconds...
> xsfast: 4s (3.11 usr + 0.01 sys = 3.12 CPU) @ 203535/s (n=635030)
> xsfaster: 2s (3.09 usr + 0.00 sys = 3.09 CPU) @ 1414767/s (n=4371633)
> xsnick: 3s (3.12 usr + 0.01 sys = 3.13 CPU) @ 218229/s (n=683058)
> xsnicker: 4s (3.12 usr + 0.00 sys = 3.12 CPU) @ 2137675/s (n=6669547)
>
>Wow ;)
>
>Btw, why is XS still creating RETVAL in the C code:
>
>XS(XS_Math__BigInt__FastCalc_is_odd_nick); /* prototype to pass
>- -Wmissing-prototypes */
>XS(XS_Math__BigInt__FastCalc_is_odd_nick)
>{
> dXSARGS;
> if (items != 1)
> Perl_croak(aTHX_ "Usage: Math::BigInt::FastCalc::is_odd_nick(n)");
> {
> SV* n = ST(0);
> SV * RETVAL;
>#line 37 "FastCalc.xs"

> ST(0) = ((SvIV(n) & 1) == 0) ? &PL_sv_yes : &PL_sv_no;

>#line 64 "FastCalc.c"
> }
> XSRETURN(1);
>}
>
>Couldn't the line "SV * RETVAL;" be dropped?

It would be it you put the void return back?

>
>Best wishes and thanx!
>
>Tels
>
>- --
> perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'
> vitalistically benchmark prospective developments
>
> http://bloodgate.com/perl My current Perl projects
> PGP key available on http://bloodgate.com/tels.asc or via email
>
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.0.6 (GNU/Linux)
>Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
>
>iQEVAwUBPVbe13cLPEOTuEwVAQHi4gf/UWfrV2LOgomwhTHKMJHXq7IWf9AiyTHr
>yX5ZGNoK3CeY059P0TJS95VsyG5dUzMTOaXyvDLLMZF4EfzTn7Z344HNAwmO9scz
>EYEqbSqLQzckHOyxGlH8NQfE9cx1zqezG8XWdgCctey6fp2qD9xkR6hjrAru+aj8
>L2D/seohDVmwi4336jAa5CMSLujZKz7T/GIWOG41sWeCUg0ykcQaDWKN4FgUmhy/
>WmP2/kTTYCOH0MmgZ9tzxUHaty7F/Fq0HTngNYgjKO+tAuAXtQgKRzwAXVgvlkkW
>1Mm8hjM/ySP5ReaFnJgS4N17FabjVuA/S1quAPCAaHa2zIYbgmDh4g==
>=J0hy
>-----END PGP SIGNATURE-----

Tels

unread,

Aug 13, 2002, 4:17:50 PM8/13/02

to Gisle Aas, Benjamin Goldberg, ni...@unfortu.net, perl5-...@perl.org, Nick Ing-Simmons

-----BEGIN PGP SIGNED MESSAGE-----

Moin,

On 12-Aug-02 Gisle Aas carved into stone:

You mean:

ST(0) = boolSV((SvIV(n) & 1) == 0);

These little typos trip me up a great deal *sigh*

Today I spent 3 hours in vain, and was nearly starting to bite in my table
just because:

SV *
_foo(n)
SV* n
INIT:
AV ax;

CODE:
ax = (AV*) SVRV(n);

does not work. It turns out a name like "ax" isn't liked by the c compiler
(maybe because "ax" is a register name or something predefined?). The error
message "array subscript is not integer" and "wrong argument to binary +"
drove me nuts, because they apparently don't have to do anyhing with the
error. Oh well :)

Anyway, I wish to thank all the people who provided feedback, I just
released the first version.

It turned out that the is_foo() routines are only a few % faster with
FastCalc, since they are O(1) and a lot of the time is spent in
Math::BigInt. Nearly not worth the effort, but everybit counts :)

But the _acmp() routine is much faster (for numbers with more than about
6000 digits it is nearly 9 times faster, for smaller numbers still between
factor 1.7 and 9).

It also turned out that I can optimize the original Calc code for _acmp()
(removing two calls to _len) and that the testsuite of Calc was incomplete.
So the FastCalc project improved Math::BigInt as a side-effect.

Here are some benchmark values:

#!/usr/bin/perl -w

use lib 'blib/lib';
use lib 'blib/arch';

#use lib '../Math-BigInt-1.61/lib';

use Math::BigInt lib => shift;
use Benchmark;

print Math::BigInt->config()->{'lib'}, " => v";
print Math::BigInt->config()->{'lib_version'},"\n";

my $c = 'Math::BigInt';

my $x = $c->new("2");
my $xsmall = $c->new("1234");
my $x2000 = $c->new('1' x 2000);
my $x8000 = $c->new('1' x 8000);
my $x4000 = $c->new('1' x 4000);
my $x1000 = $c->new('1' x 1000);
my $x100 = $c->new('1' x 100);

timethese ( -3,
{
bcmp_small => sub { $x->bcmp($xsmall); },
bcmp_large => sub { $x->bcmp($x1000); },
bcmp_100 => sub { $x100->bcmp($x100); },
bcmp_1000 => sub { $x1000->bcmp($x1000); },
bpow_2000 => sub { $x2000->bcmp($x2000); },
bpow_4000 => sub { $x4000->bcmp($x4000); },
bpow_8000 => sub { $x8000->bcmp($x8000); }
} );

Math::BigInt::FastCalc => v0.04 Factor
bcmp_large: 4s ( 2.98 usr + 0.02 sys = 3.00 CPU) @ 19448/s (n=58346) 1.17
bcmp_small: 4s ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 18441/s (n=57723) 1.7
bcmp_100: 2s ( 3.09 usr + 0.00 sys = 3.09 CPU) @ 17598/s (n=54379) 2.5
bcmp_1000: 4s ( 3.05 usr + 0.00 sys = 3.05 CPU) @ 11524/s (n=35150) 5.3
bpow_2000: 4s ( 3.10 usr + 0.00 sys = 3.10 CPU) @ 8273/s (n=25648) 6.6
bpow_4000: 4s ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 5194/s (n=15584) 7.7
bpow_8000: 4s ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 3070/s (n=9612) 8.9

Math::BigInt::Calc => v0.30
bcmp_large: 4s ( 3.21 usr + 0.00 sys = 3.21 CPU) @ 16482/s (n=52910)
bcmp_small: 4s ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 10569/s (n=31708)
bcmp_100: 4s ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 7087/s (n=21262)
bcmp_1000: 4s ( 3.27 usr + 0.00 sys = 3.27 CPU) @ 2157/s (n=7056)
bpow_2000: 4s ( 3.22 usr + 0.00 sys = 3.22 CPU) @ 1251/s (n=4031)
bpow_4000: 4s ( 3.15 usr + 0.00 sys = 3.15 CPU) @ 669/s (n=2110)
bpow_8000: 4s ( 3.20 usr + 0.00 sys = 3.20 CPU) @ 344/s (n=1102)

Math::BigInt::Calc => v0.31
bcmp_large: 3s ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 16692/s (n=52247)
bcmp_small: 4s ( 3.01 usr + 0.00 sys = 3.01 CPU) @ 11619/s (n=34974)
bcmp_100: 4s ( 3.31 usr + 0.01 sys = 3.32 CPU) @ 8226/s (n=27313)
bcmp_1000: 4s ( 3.17 usr + 0.00 sys = 3.17 CPU) @ 2206/s (n=6996)
bpow_2000: 4s ( 3.23 usr + 0.00 sys = 3.23 CPU) @ 1247/s (n=4031)
bpow_4000: 4s ( 3.25 usr + 0.00 sys = 3.25 CPU) @ 662/s (n=2154)
bpow_8000: 4s ( 3.32 usr + 0.00 sys = 3.32 CPU) @ 342/s (n=1138)

Please note that these are Math::BigInt calls, not Calc/FastCalc, e.g. they
represent what real programs will see as an improvement.

(That bcmp_8000 is sometimes slower in v0.31 is a measurement error, the
values wiggle a bit for each run...)

Thanx again!

Tels

- --
perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'

economically supply industry-wide patterns

http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBPVlpZ3cLPEOTuEwVAQHKJAf/VZV1Wk0uHyW6aE2bGbRoNFC14wJqvll3
zmbTLuGLt2fLcilE1gv4VWffeU4AgJNlXsYCr73bk88nOnfISV49f2sZ0gI2/A18
d71SKrrUl5If6z1S/JDxrkJa66p8X8OZPtRZVKJ80zGl7iXhC7UnUJ41LhifkDG4
cv6zFAhf2oJDP3zilYOeWOzh19X1hfvLbYfk64zaZlzseVKGZqVbUDz/BZQ8wRw4
OB5/RCB23XQRo/j6BmOSfc9SN+6wnV+BQe9Cd9MSbiwZGkLUepoegYSQYrUO7ocv
edWM1gldiDjxnJu4fPCQjqoeVh+2Lla3DbssT05B4RyR8GaOFVw3rA==
=9St3
-----END PGP SIGNATURE-----