
Performance tuning


Greg Menke

Nov 27, 2000, 3:00:00 AM

For practice I just implemented an MD5 hashing routine; it works but is
quite slow. I've added declarations in various spots, marginally
improving performance. So I'm looking for other things to tweak,
particularly in the calculation routines.

I implemented the data structures as integer arrays, which are
accordingly indexed for the math. I'm using aref for this, and so am
wondering how efficient it is- or if its efficiency varies with the
type of sequence.
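
A minimal sketch (not Greg's code) of the kind of declarations that
usually let aref compile to a direct indexed access rather than a
generic sequence lookup; the simple-array qualifier and the specialized
element type are what matter:

(defun sum-words (buf)
  ;; Illustrative only: sum a buffer of 16-bit words modulo 2^16.
  (declare (type (simple-array (unsigned-byte 16) (*)) buf)
           (optimize (speed 3) (safety 1)))
  (let ((sum 0))
    (declare (type (unsigned-byte 16) sum))
    (dotimes (i (length buf) sum)
      (setf sum (logand #xFFFF (+ sum (aref buf i)))))))

Without such declarations most compilers fall back on a generic aref
that dispatches on the array type at run time, which is why its
apparent cost can vary with the kind of sequence.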

I did the coding in clisp and the Personal version of Lispworks and am
seeing similar performance, though Lispworks is a good bit faster.

Thanks for any help- and I apologize in advance for any poorly
considered questions.

Gregm

Michael Parker

Nov 27, 2000, 3:00:00 AM
In article <m3vgt9m...@mindspring.com>,

We could be a lot more helpful if you posted your code (preferably
nicely formatted, and compilable).



Eugene Zaikonnikov

Nov 27, 2000, 3:00:00 AM
Hi Greg,

* "Greg" == Greg Menke <gregm-n...@zxy.mindspring.com> writes:

[...]

Greg> I implemented the data structures as integer arrays, which are
Greg> accordingly indexed for the math. I'm using aref for this, and
Greg> so am wondering how efficient it is- or if its efficiency
Greg> varies with the type of sequence.

Well, flattening the data abstraction down to integer arrays may not
provide a significant performance win by itself. If your numbers are
out of fixnum range, the bottleneck will be the bignum manipulations,
not the store/fetch operations.
As a compromise, you may consider representing your data via
DEFSTRUCTs with a specialized vector storage type.
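
An illustrative sketch of that suggestion (the MD5-STATE name and the
16-bit split are assumptions for the example, not anything from the
thread):

(defstruct (md5-state (:type (vector (unsigned-byte 16))))
  ;; Each 32-bit MD5 register is split into a low and a high 16-bit
  ;; half, so every slot stays comfortably inside fixnum range.
  (a-lo 0) (a-hi 0)
  (b-lo 0) (b-hi 0)
  (c-lo 0) (c-hi 0)
  (d-lo 0) (d-hi 0))

;; MAKE-MD5-STATE returns a (VECTOR (UNSIGNED-BYTE 16)), and the slot
;; accessors expand into indexed references into that vector, e.g.
;; (md5-state-b-lo (make-md5-state :b-lo #xAB89)) => #xAB89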

Greg> I did the coding in clisp and the Personal version of Lispworks
Greg> and am seeing similar performance, though Lispworks is a good
Greg> bit faster.

Once, when bashing fixnums with LW, I was surprised to find that
simple-vectors/svref worked substantially faster than properly
specialized arrays (which was not the case for floats). There was
also a discussion of array performance a few months ago, with short
benchmarks for various Lisp implementations.
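
One rough way to reproduce that comparison yourself (an illustrative
sketch, not the benchmarks from that discussion):

(defun touch-svref (v n)
  ;; V is a SIMPLE-VECTOR; SVREF needs no element-type dispatch.
  (declare (type simple-vector v) (type fixnum n) (optimize (speed 3)))
  (let ((sum 0))
    (declare (type fixnum sum))
    (dotimes (i n sum)
      (setq sum (logand #xFFFF (+ sum (the fixnum (svref v 0))))))))

(defun touch-aref (v n)
  ;; V is a specialized array; AREF relies on the declaration to open-code.
  (declare (type (simple-array (unsigned-byte 16) (*)) v)
           (type fixnum n) (optimize (speed 3)))
  (let ((sum 0))
    (declare (type fixnum sum))
    (dotimes (i n sum)
      (setq sum (logand #xFFFF (+ sum (aref v 0)))))))

;; (time (touch-svref (make-array 16 :initial-element 1) 10000000))
;; (time (touch-aref (make-array 16 :element-type '(unsigned-byte 16)
;;                                  :initial-element 1) 10000000))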

HTH,

--
Eugene

Erik Naggum

Dec 4, 2000, 3:00:00 AM
* Greg Menke <gregm-n...@zxy.mindspring.com>

| For practice I just implemented an MD5 hashing routine; it works but is
| quite slow.

To make MD5 perform reasonably fast, you need access to machine
integers and rotate instructions. MD5 looks like it was designed by
someone who wrote in assembler on a machine with more registers than
your average Intel, and it hurts to make that perform in Common Lisp.
I decided to split the integers in half to get them within fixnum
range (the Emacs Lisp implementation has made a similar decision), but
also to write directly into a preallocated bignum rather than an array
because some critical operations were faster that way. (An annoying
"feature" with MD5 is that its operations are all within 32-bit words,
so there's very little to be gained from bignum support.)
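
Not Erik's actual code, but a sketch of the half-word idea he
describes: each 32-bit MD5 word is held as two 16-bit fixnums, and a
32-bit add propagates one carry between the halves.

(defun add32 (x-lo x-hi y-lo y-hi)
  ;; Add two 32-bit words, each given as (low 16 bits, high 16 bits);
  ;; return the low and high halves of the 32-bit sum as two values.
  (declare (type (unsigned-byte 16) x-lo x-hi y-lo y-hi)
           (optimize (speed 3)))
  (let* ((lo (+ x-lo y-lo))
         (hi (+ x-hi y-hi (ash lo -16))))  ; carry out of the low half
    (values (logand lo #xFFFF)
            (logand hi #xFFFF))))          ; drop the carry out of bit 31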

#:Erik
--
"When you are having a bad day and it seems like everybody is trying
to piss you off, remember that it takes 42 muscles to produce a
frown, but only 4 muscles to work the trigger of a good sniper rifle."
-- Unknown

Greg Menke

Dec 4, 2000, 3:00:00 AM
Erik Naggum <er...@naggum.net> writes:


> * Greg Menke <gregm-n...@zxy.mindspring.com>
> | For practice I just implemented an MD5 hashing routine; it works but is
> | quite slow.
>
> To make MD5 perform reasonably fast, you need access to machine
> integers and rotate instructions. MD5 looks like it was designed by
> someone who wrote in assembler on a machine with more registers than
> your average Intel, and it hurts to make that perform in Common Lisp.
> I decided to split the integers in half to get them within fixnum
> range (the Emacs Lisp implementation has made a similar decision), but
> also to write directly into a preallocated bignum rather than an array
> because some critical operations were faster that way. (An annoying
> "feature" with MD5 is that its operations are all within 32-bit words,
> so there's very little to be gained from bignum support.)

Mine was a fairly straightforward port of (I believe) the canonical C++
code, adapting as appropriate to be more Lisp-oriented to the extent I
was able. Moving from an integer array with aref to a simple-vector
with svref made no obvious improvement in clisp or Lispworks.

I compiled the core functions in Personal Lispworks using strongest
optimization, which improved runtimes considerably- but it remained
extremely slow; minutes for a largish file compared to about a second
with the C++ version.

I ran it under Lispworks' profiler, which reported that the biggest
proportion of time was spent in (+ ...), followed by the (log???
...) operations I used to control integer width. The innermost loop
of the md5 algorithm I ported uses + and bitwise logic heavily;
obviously that's where I'm losing.
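
A rough check (illustrative, not Greg's code) makes that profile
unsurprising: 32-bit operands already exceed the fixnum range of CLISP
and 32-bit LispWorks, so every + and LOGAND in the inner loop allocates
a fresh bignum.

(let ((a #xDEADBEEF)
      (b #xCAFEBABE))
  (time (dotimes (i 1000000)
          (logand #xFFFFFFFF (+ a b)))))
;; TIME reports the bytes consed; the same loop on 16-bit halves
;; stays within fixnums and allocates nothing.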

All that said, my port made no attempts along your lines above, and is
likely naive in a variety of ways. Maybe I'll give something like
what you did a try. This exercise has been quite instructive so far.

Thanks,

Gregm


Erik Naggum

Dec 4, 2000, 7:59:09 PM
* Greg Menke <gregm-n...@zxy.mindspring.com>

| I compiled the core functions in Personal Lispworks using strongest
| optimization, which improved runtimes considerably- but it remained
| extremely slow; minutes for a largish file compared to about a second
| with the C++ version.

I experienced similar performance problems with my first few shots.
Today, a 640K file takes 190 ms for the md5sum utility to return; the
new md5 function in Allegro CL takes as long with that file's contents
as a string, but it also conses equally much in static memory. My last
version takes 130 ms and conses less than 100 bytes regardless of the
length of the input string. It is not particularly portable. It is
also intended to be used as a sink stream, so writing to the stream
computes the md5sum, retrievable upon closing the stream. This is not
quite as fast as I want it to be, but I expect this to change when I
have reimplemented it using the new Allegro CL streams concepts.

Robert Monfera

Dec 5, 2000, 12:26:24 AM
Erik Naggum wrote:

> I decided to split the integers in half to get them within fixnum
> range (the Emacs Lisp implementation has made a similar decision), but
> also to write directly into a preallocated bignum rather than an array
> because some critical operations were faster that way. (An annoying
> "feature" with MD5 is that its operations are all within 32-bit words,
> so there's very little to be gained from bignum support.)

Did you use rather low-level direct access to parts of a bignum, or
are there some standard functions that mutate a bignum, avoiding
consing (which I had expected + and some other arithmetic operations
to be able to do, but when I looked, they couldn't)?

Robert

Erik Naggum

Dec 5, 2000, 3:00:00 AM
* Robert Monfera <mon...@fisec.com>

| Did you use a rather low-level direct access to parts of a bignum, or
| maybe there are some standard functions that mutate a bignum, avoiding
| consing (which I expected + and some other arithmetic operations could
| do, but when I looked, they couldn't)?

Allegro CL offers a function sys:memref that becomes machine code
which simply accesses the memory given as arguments. I used it to get
at the individual bigits. There are no standard functions to mutate
bignums that I know of. Moreover, bignums take up only as many bigits
as they need, so I had to use a "sentinel" bit to get bignums big
enough that I didn't have to test for the proper size of the bignum.
So, yes, this is rather low-level direct access to parts of a bignum.
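
A portable sketch of the sentinel idea (the non-consing writes
themselves go through the Allegro-specific sys:memref, which is not
shown here): since a bignum occupies only as many bigits as its
magnitude needs, a single bit parked above the 128 bits of MD5 state
pins the bignum at full width.

(defconstant +md5-sentinel+ (ash 1 128))

(defun make-md5-state-bignum ()
  ;; The four standard MD5 initial values packed below the sentinel bit.
  (+ +md5-sentinel+
     (dpb #x67452301 (byte 32 0)
          (dpb #xEFCDAB89 (byte 32 32)
               (dpb #x98BADCFE (byte 32 64)
                    (dpb #x10325476 (byte 32 96) 0))))))

;; Reading a word back out is just LDB:
;; (ldb (byte 32 32) (make-md5-state-bignum)) => #xEFCDAB89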

Marc Battyani

Dec 5, 2000, 3:00:00 AM

"Erik Naggum" <er...@naggum.net> wrote in message
news:31850255...@naggum.net...
> * Robert Monfera <mon...@fisec.com>

> | Did you use a rather low-level direct access to parts of a bignum, or
> | maybe there are some standard functions that mutate a bignum, avoiding
> | consing (which I expected + and some other arithmetic operations could
> | do, but when I looked, they couldn't)?
>
> Allegro CL offers a function sys:memref that becomes machine code
> which simply accesses the memory given as arguments. I used it to get
> at the individual bigits. There are no standard functions to mutate
> bignums that I know of. Moreover, bignums take up only as many bigits
> as they need, so I had to use a "sentinel" bit to get bignums big
> enough that I didn't have to test for the proper size of the bignum.
> So, yes, this is rather low-level direct access to parts of a bignum.

Why did you not use LDB to access parts of the bignums ?
Direct memory access must be faster, but are there other reasons?

About SETF LDB the HS says :
"setf may be used with ldb to modify a byte within the integer that is
stored in a given place. The order of evaluation, when an ldb form is
supplied to setf, is exactly left-to-right. The effect is to perform a dpb
operation and then store the result back into the place. "
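
That wording corresponds to an expansion along these lines (the exact
form is implementation-dependent; this is only a sketch); note that
the integer in the place is replaced, never modified:

(macroexpand-1 '(setf (ldb (byte 8 0) x) 255))
;; => roughly
;; (LET ((#:NEW 255))
;;   (SETQ X (DPB #:NEW (BYTE 8 0) X))
;;   #:NEW)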

I wonder why there is no destructive version of SETF LDB that would modify
the given integer?
This would certainly be useful.

Marc


Erik Naggum

Dec 5, 2000, 3:00:00 AM
* "Marc Battyani" <Marc.B...@fractalconcept.com>

| Why did you not use LDB to access parts of the bignums?

Mainly because it failed to inline.

| Direct memory access must be faster, but are there other reasons?

Not really. LDB should have been able to do this, but it gets
somewhat specialized before it can be optimized really heavily.

| About SETF LDB the HS says : "setf may be used with ldb to modify a
| byte within the integer that is stored in a given place. The order of
| evaluation, when an ldb form is supplied to setf, is exactly
| left-to-right. The effect is to perform a dpb operation and then store
| the result back into the place. "

Unfortunately, this "place" is not the bignum, but the binding that
holds the bignum. I have never seen a dpb on a bignum return a bignum
eq to the previous value, but I expected just that when I first wanted
to use dpb on bignums.
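
An illustration of that point (not Erik's code): the place SETF LDB
updates is the variable, and the freshly consed result never shares
structure with the old bignum.

(let* ((x (expt 2 100))   ; an even bignum
       (y x))
  (setf (ldb (byte 1 0) y) 1)
  (values (eq x y)        ; => NIL: Y is now a different, freshly consed bignum
          (logbitp 0 x)   ; => NIL: the original object is untouched
          (logbitp 0 y))) ; => T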

| I wonder why there is no destructive version of SETF LDB that would modify
| the given integer?
| This would certainly be useful.

I agree that this would be very useful with bignums, but I have not
seen much significant optimization of bignum arithmetic, and I might
understand why: This implementation of MD5 is the first time I have
needed to hack bignums fast. I guess it would help if bignums were
used a lot and their performance mattered more to more Common Lisp users
-- optimizing bignum arithmetic is hard work and needs assembly code
to be _really_ efficient.

Gareth McCaughan

Dec 5, 2000, 7:43:26 PM
Erik Naggum wrote:

> To make MD5 perform reasonably fast, you need access to machine
> integers and rotate instructions. MD5 looks like it was designed by
> someone who wrote in assembler on a machine with more registers than
> your average Intel, and it hurts to make that perform in Common Lisp.

> I decided to split the integers in half to get them within fixnum
> range (the Emacs Lisp implementation has made a similar decision), but
> also to write directly into a preallocated bignum rather than an array
> because some critical operations were faster that way. (An annoying
> "feature" with MD5 is that its operations are all within 32-bit words,
> so there's very little to be gained from bignum support.)

It would be interesting to know whether it's easier to make
an MD5 perform well in CMU CL, which can do a pretty good job
(better than any other CL, I think) of handling machine-word-sized
integers. Of course, the likely result would be an implementation
that works well under CMU CL and appallingly badly on other
Common Lisps, which might be a pity.
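
For concreteness, an illustrative sketch (not code from the thread) of
the kind of operation at issue; whether it stays in machine words or
conses bignums for the shifted intermediates depends on how much the
compiler can infer from the declarations, which is where CMU CL has
the reputation Gareth mentions:

(defun rotl32 (x n)
  ;; Rotate the 32-bit word X left by N bits.
  (declare (type (unsigned-byte 32) x)
           (type (integer 0 31) n)
           (optimize (speed 3) (safety 0)))
  (logand #xFFFFFFFF
          (logior (ash x n)
                  (ash x (- n 32)))))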

--
Gareth McCaughan Gareth.M...@pobox.com
sig under construc

Erik Naggum

Dec 5, 2000, 8:57:12 PM
* Gareth McCaughan

| It would be interesting to know whether it's easier to make an MD5
| perform well in CMU CL, which can do a pretty good job (better than
| any other CL, I think) of handling machine-word-sized integers. Of
| course, the likely result would be an implementation that works well
| under CMU CL and appallingly badly on other Common Lisps, which might
| be a pity.

I don't think that's a pity. As long as we can specify the behavior
such that it can be used the same way in every implementation, I don't
care how implementation-specific the implementation is.

cbbr...@hex.net

Dec 6, 2000, 12:53:37 AM
Gareth.M...@pobox.com (Gareth McCaughan) writes:
> Erik Naggum wrote:
>> To make MD5 perform reasonably fast, you need access to machine
>> integers and rotate instructions. MD5 looks like it was designed
>> by someone who wrote in assembler on a machine with more
>> registers than your average Intel, and it hurts to make that
>> perform in Common Lisp. I decided to split the integers in half
>> to get them within fixnum range (the Emacs Lisp implementation
>> has made a similar decision), but also to write directly into a
>> preallocated bignum rather than an array because some critical
>> operations were faster that way. (An annoying "feature" with MD5
>> is that its operations are all within 32-bit words, so there's
>> very little to be gained from bignum support.)

> It would be interesting to know whether it's easier to make an MD5
> perform well in CMU CL, which can do a pretty good job (better than
> any other CL, I think) of handling machine-word-sized integers.

Interesting indeed.

> Of course, the likely result would be an implementation that works
> well under CMU CL and appallingly badly on other Common Lisps, which
> might be a pity.

It's "a pity" to the extent to which life would be simpler if all
systems were implemented in homogeneous ways; unfortunately, that
seems to lead us into things like "Windows Everywhere," which has its
own pitiable aspects.

If ACL offers one approach to make this fast, and CMU CL offers
another, then that offers a third implementer the opportunity to learn
from not merely one implementation approach, but _two_. And perhaps
even produce a system that could allow Even Better Optimization...
--
(concatenate 'string "cbbrowne" "@hex.net") <http://www.ntlug.org/~cbbrowne/>
"90% of the ideas people bring to me are no good. So if I reject all
of them I've got a pretty good batting average" -- Sam Goldwyn (MGM)

Dr Nick Levine

Dec 6, 2000, 3:00:00 AM

> | I wonder why there is no destructive version of SETF LDB that would modify
> | the given integer?
> | This would certainly be useful.
>
> I agree that this would be very useful with bignums

But not with fixnums. Consider:

(let* ((x 0)
       (y x))
  (setf (ldb (byte 1 1) y) 1)
  (list x y))

=>

(0 2)

The language model would have to be changed more than somewhat for the
value of x to be changed. [Effectively, CL always passes numbers and
characters "by value" and everything else "by reference".] Also, if the
zero was destructively modified, how many other zeros would also change?
What would happen to zerop? Ouch!

It's also unnerving to think of code whose semantics (as opposed to
efficiency) changes when you pass through the fixnum/bignum boundary.

-n

Erik Naggum

Dec 7, 2000, 3:00:00 AM
* Tim Bradshaw <t...@tfeb.org>
| Surely the semantic change is something like:
|
| (defun foo (x)
|   (declare (type integer x))
|   (let ((y x))
|     ;; if X is a bignum then Y is a pointer to X, if X is a fixnum then
|     ;; Y is a copy of X (probably?)
|     (bar y)
|     ;; right now this is true. if DPB modifies bignums it will be
|     ;; true only if y was odd.
|     (= x y)))
|
| (defun bar (y)
|   (declare (type integer y))
|   (dpb 1 (byte 1 0) y))

Well, I _had_ hoped to see something a tad less contrived, as this is
bad style on so many counts that I'm inclined to believe nobody has
done anything like this that would cause a _change_ in observable
behavior.

To find a change in semantics, you would at least have to find
something that people do today with dpb (or setf of ldb). But at
worst, we could define a destructive version of dpb, which would be
pretty easy to write even today.

Erik Naggum

Dec 7, 2000, 11:16:02 AM
* Dr Nick Levine <n.le...@anglia.ac.uk>

| The language model would have to be changed more than somewhat for the
| value of x to be changed. [Effectively, CL always passes numbers and
| characters "by value" and everything else "by reference".]

That's not really so. That's the catch with bignums. They don't get
copied (effectively) when you pass them around, and they cons a _lot_
when they are copied, so there are good reasons to reuse them. I
think dpb would be the one operator that it would make sense to let
reuse the bignum. Its semantics _is_ to deposit bits into an existing
number.

| Also, if the zero was destructively modified, how many other zeros
| would also change? What would happen to zerop? Ouch!

Zero wouldn't magically become a bignum any time soon, would it?

| It's also unnerving to think of code whose semantics (as opposed to
| efficiency) changes when you pass through the fixnum/bignum boundary.

And what semantic changes would that really be? I see fear but little
technical argumentation, especially considering the weird non-issue
with zero that you bring up. What's really on your mind?

Tim Bradshaw

Dec 7, 2000, 11:50:51 AM
Erik Naggum <er...@naggum.net> writes:

>
> And what semantic changes would that really be? I see fear but little
> technical argumentation, especially considering the weird non-issue
> with zero that you bring up. What's really on your mind?
>

Surely the semantic change is something like:

(defun foo (x)
  (declare (type integer x))
  (let ((y x))
    ;; if X is a bignum then Y is a pointer to X, if X is a fixnum then
    ;; Y is a copy of X (probably?)
    (bar y)
    ;; right now this is true. if DPB modifies bignums it will be
    ;; true only if y was odd.
    (= x y)))

(defun bar (y)
  (declare (type integer y))
  (dpb 1 (byte 1 0) y))


I hope that's right. I'd actually quite like to have this kind of
low-level access to bignums, but I also have pretty queasy feelings
about numbers becoming mutable objects.

Incidentally, as far as I can see bignum consing is one of those
things that can be made really cheap if the bignums are ephemeral
(which they often are). I had a little program that I wrote to play
with some mathematical problem which spent quite a lot of time dealing
with bignums (it used the trick of compiling the same code twice via a
macro, with different declarations, to get very good code for the
fixnum part and generic code for the bignum part), and it would cons
spectacularly -- many many Gb -- but spent basically no time in GC
because it was all ephemeral and I'd set the newspace to be huge. Of
course it might have been massively faster if it could have reused
the bignums...

--tim

Tim Bradshaw

Dec 10, 2000, 9:45:58 AM
Something else I thought of in regard to this: if you have destructive
operations on bignums then I think you really also need a way of
copying bignums, because otherwise you can never be sure that you are
altering the object you think you are, or that you aren't altering
something which is shared with some other object.

So to do this you'd need to have some NDPB operation and also
something like COPY-BIGNUM, so you could know that:

(eq x (copy-bignum x))

would be false (or in fact (eq <anything> (copy-bignum x)) is false).
At the moment you're always vulnerable to the compiler optimising away
whatever numerical operation you are doing which you think should make
a new number, and those that it can't optimise are likely also those
which cons a huge lot of junk as well.
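
A small illustration of that uncertainty (nothing from the thread):
the standard never promises that an arithmetic result is a fresh
object, so there is no portable substitute for a real COPY-BIGNUM.

(defun maybe-fresh (x)
  (declare (type integer x))
  ;; An implementation is free to return X itself here, or a new bignum;
  ;; (eq x (maybe-fresh x)) is therefore unspecified, which is exactly
  ;; why a destructive NDPB would need a guaranteed-fresh COPY-BIGNUM
  ;; beside it.
  (+ x 0))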

(Incidentally I'm not against destructive bignum ops -- I've written
programs that could have benefited from them...)

--tim
