Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Optimization hints?

4 views
Skip to first unread message

Larry Hunter

unread,
Aug 25, 1997, 3:00:00 AM8/25/97
to

Folks,

I have an optimizations question. I'm using Franz ACL 4.3.1 At this point,
the generation of new single-floats is killing me, both in consing and time.
I think this has to do with "boxing" of floats, but I've never really tried
to optimize numeric calculations in Allegro CL before, so I could use any
help anyone has to offer.

The function calculates a distance between two vectors. The elements of the
vector can be nominal, integer or real values, as specified in another
vector ("types"), which also contains the ranges of the numeric values.
There may be missing values and some positions may be ignored.

Here's the code:

(defun distance (v1 v2 &key (types *types*) (ignore-positions nil))
(declare (optimize (speed 3) (safety 0))
(inline missing-value-p)
(simple-vector v1 v2 types)
(list ignore-positions))
(let ((result 0.0))
(declare (single-float result))
(dotimes (i (length v1) result)
(let ((val1 (svref v1 i))
(val2 (svref v2 i)))
(unless (or (member i ignore-positions :test #'=)
(missing-value-p val1)
(missing-value-p val2))
(incf result
(the single-float
(if (eq 'nominal (first (svref types i)))
(if (eq val1 val2) 0.0 1.0)
(float (abs (/ (- val1 val2)
(- (third (svref types i))
(second (svref types i))))))))))))))

Here is some time profiling information:

% %
self total total local Function
Time Time secs % Child
99.5 99.9r ... MAP
31.2 85.8 99.6 36.4 DISTANCE
25.9 25.5 "new_single_float"
12.5 12.3 EXCL::-_2OP
8.5 8.3 EXCL::/_2OP
3.8 3.8 ABS
3.6 3.5 EXCL::RATIO-TO-SINGLE-FLOAT
2.2 2.2 EXCL::FLOAT_1OP
2.1 2.1 MISSING-VALUE-P
-----------------------------------------------------
25.9 93.8r DISTANCE
10.9 39.7r "new_single_float"
1.3 4.6r EXCL::RATIO-TO-SINGLE-FLOAT
5.1 23.7 27.6 21.4 "new_single_float"
10.9 39.7r "new_single_float"
10.9 39.4r "sigrelse"
10.3 37.5r "new_other"


Here is the space profiling:

% % Parent
self total total local Function
Mem. Mem. Kbyte % Child

86280 96.2 DISTANCE
3392 3.8 EXCL::RATIO-TO-SINGLE-FLOAT
91.7 95.5 89672 96.0 ... "new_single_float"
3568 100.0 "execv"

Does anybody have any suggestions about how I can speed this up? I need to
execute it on the order of 30 million times...

Thanks,

Larry

--
Lawrence Hunter, PhD.
National Library of Medicine phone: +1 (301) 496-9303
Bldg. 38A, 9th fl, MS-54 fax: +1 (301) 496-0673
Bethesda. MD 20894 USA email: hun...@nlm.nih.gov

Luca Pisati

unread,
Aug 25, 1997, 3:00:00 AM8/25/97
to

Larry Hunter wrote:
>
> Folks,
>
> I have an optimizations question. I'm using Franz ACL 4.3.1 At this point,
> the generation of new single-floats is killing me, both in consing and time.
> I think this has to do with "boxing" of floats, but I've never really tried
> to optimize numeric calculations in Allegro CL before, so I could use any
> help anyone has to offer.
>
> The function calculates a distance between two vectors. The elements of the
> vector can be nominal, integer or real values, as specified in another
> vector ("types"), which also contains the ranges of the numeric values.
> There may be missing values and some positions may be ignored.
>
> Here's the code:

A few issues:

1) Missing-value-p is a function or a macro ?

If it is a function, and if the argument is a float, then the
argument will be boxed. So, turn that into a macro (or a
function and a compiler-macro, if you really need afunction around.

2) V1 and V2 look like simple-vectors instead than (simple-array single-float (*)).

This forbids you to do further optimizations. If you could turn them into
(simple-array single-float (*)) you could declare val1 and val2 to be single-float.

3) declaring ignore-position to be a list doesn't do much. Also, inline is
ignored by ACL.

4) The ABS followed by a coercion looks suspicious. I believe there you
waste a lot of resources. If val1 and val2 at this stage are single-float,
then you shouldn't need any coercion.

If you tell me more about the types of v1 and v2, I can give you real answers.
I see no reason that this should cons. If you let me know, you should be able
to produce an extremely fast compiled function.

--
Luca Pisati Voice: (310) 577-0518
Nichimen Graphics Fax: (310) 577-0577
12555 W. Jefferson #285 EMail: mailto:pis...@nichimen.com
Los Angeles, CA 90066 Web: http://www.nichimen.com

R. Toy

unread,
Aug 25, 1997, 3:00:00 AM8/25/97
to

Larry Hunter wrote:

> Folks,


>
> The function calculates a distance between two vectors. The elements of the
> vector can be nominal, integer or real values, as specified in another
> vector ("types"), which also contains the ranges of the numeric values.
> There may be missing values and some positions may be ignored.

I'm not that familiar with ACL, but the compiler notes from CMUCL should give
good hints on how to optimize the code.I think this is the major problem: v1,
v2, and types aren't specialized array types. If they were (simple-array
single-float (*)), the compiler would have a good chance to optimize. Since
they are simple-vector's, they can hold anything, which means the compiler can't
optimize access to them.

> Here's the code:


>
> (defun distance (v1 v2 &key (types *types*) (ignore-positions nil))
> (declare (optimize (speed 3) (safety 0))
> (inline missing-value-p)
> (simple-vector v1 v2 types)
> (list ignore-positions))
> (let ((result 0.0))
> (declare (single-float result))
> (dotimes (i (length v1) result)
> (let ((val1 (svref v1 i))
> (val2 (svref v2 i)))
> (unless (or (member i ignore-positions :test #'=)
> (missing-value-p val1)
> (missing-value-p val2))
> (incf result
> (the single-float
> (if (eq 'nominal (first (svref types i)))
> (if (eq val1 val2) 0.0 1.0)
> (float (abs (/ (- val1 val2)
> (- (third (svref types i))
> (second (svref types i))))))))))))))
>

Some assertions on the result of (third (svref ...)) and (second (svref ...))
may help here a bit. Declaring val1 and val2 as single-float would help a lot
too, except that val1 and val2 aren't necessarily single-floats. Using NaN to
represent missing data would allow you to declare v1, v2, val1, and val2 as
single-float's would help a lot.

Basically, I don't think you can speed it up much without changing how you
represent data.

Ray

--
---------------------------------------------------------------------------
----> Raymond Toy rt...@mindspring.com
http://www.mindspring.com/~rtoy


Eric Sauthier

unread,
Aug 28, 1997, 3:00:00 AM8/28/97
to

Erik Naggum wrote:
>
> * Rainer Joswig
> | Can ACL give such information?
>
> not as good as CMUCL, which provides outstandingly useful information.
> however, the ACL compiler can be incredibly talkative. I have found it
> very informative, but there is a lot of text to walk through at times.
>
> <URL:http://www.naggum.no/erik/larry/> collects the stuff I did on this
> code and perhaps some useful compiler output, too. CMUCL didn't say much
> in the end, so I guess I have done all I could to make it happy. :)
>

I can't agree more. Working with CMU CL might be very frustrating at
times because it always finds something to complain about. The python
compiler does a tremendous job at optimizing code when the adequate
declarations are found. On the other hand, it is not forgiving when you
omit them! Working with CMU CL has made me more aware of the code
quality of I produce when it comes to speed things up - once the "most
adequate" algorithm for the task at hand has been designed/implemented
of course :-). Erik you can definitely be happy with what you produced
if even CMU CL remained silent!


> #\Erik
> --
> man who cooks while hacking eats food that has died twice.

Bulent Murtezaoglu

unread,
Aug 29, 1997, 3:00:00 AM8/29/97
to

Larry Hunter <hun...@nlm.nih.gov> writes:
[...]
> Do you have any other suggestions?

I have dealt with something like this before (you are doing something like
machine learning or nearest neighbor right?). The solution I came up with
was calling the compiler from within the program once the types were known.
So the function that would get called a zillion times actually got written
and compiled on the fly once the type information was known. I believe
this is one of those cases where calling the compiler from running code
can be justified.

BM

Bulent Murtezaoglu

unread,
Sep 1, 1997, 3:00:00 AM9/1/97
to

To follow up on my posting:

Here's the point: in cases like Larry's there really is a good
efficiency point to be made for Lisp that Lisp books I have seen
seldom make. Given a decent compiler, Lisp can be fundamentally more
efficient than C in this case. Larry's problem, as I understand it,
is finding the distances between vectors where each position can
either be ignored or a distance can be computed with a method
depending on the meaning of the value stored in the position. The
type of value stored at each position is unknown at compile time. His
solution to the problem is to maintain the type information and the
list of positions to be ignored and have his distance function check
do the right thing at run time. His distance function
gets called 30 million times, so he needs efficiency. Lots of folks
came up with helpful hints/code, so his code is now probably about as
efficient as the C equivalent would be. BUT LISP CAN GO FURTHER. The
the key observation is that while the ignore specs and the types are
not known at compile time, they are known at the beginning of [and do
not change during] the computation where the routine is called 30
million times. Both the C and the given Lisp code are checking the
types unnecessarily for each call. The C programmer cannot do much
for this, but the Lisp programmer can. One can construct a function
once the type and ignore specs are known and compile it right before
it gets used heavily. The type and ignore checks are done only once while
the function is getting constructed, and then we just end up with the
code that does the distance calculation. The overhead involved in the
run-time compilation should be miniscule compared to the efficiency
gain. And the Lisp programmer can do this before every heavy
computation involving this function. If you find yourself really
wishing you knew something at compile time, you can just tell Lisp
what you would have done...

Here's what I'm talking about

;;given
USER(16): (set-up '(:nominal :nominal :real :real :nominal) '(1))
;; 5-long vectors w/ type specs as shown and 2nd entries are to be ignored
;; we want to get a compiled distance function defined as
(LAMBDA (V1 V2)
(DECLARE (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
(TYPE (SIMPLE-ARRAY SINGLE-FLOAT (5)) V1 V2))
(+ (IF (= (AREF V1 0) (AREF V2 0)) 0 1)
(IF (= (AREF V1 4) (AREF V2 4)) 0 1)
(ABS (- (AREF V1 2) (AREF V2 2)))
(ABS (- (AREF V1 3) (AREF V2 3)))))

;; I'm also unrolling the loop and using integer addition whenever
;; possible but that's not the point.

Here's the sample code that does the above:


;; types are
;; (<type0> <type1> ... <typen>)
;; where type is one of :nominal or others (:real)

;; the ignore list is a list of non negative integers
;; denoting the places in the original vector to be ignored in the
;; distance calculation (0 origin)

(defun set-up (types ignore)
(let* (nominals others (dim (length types)))
(dotimes (i dim)
(unless (member i ignore)
(cond ((eq (elt types i) ':nominal)
(push i nominals))
(t (push i others)))))
(compile 'distance
`(lambda (v1 v2)
(declare (optimize (speed 3) (safety 0) (debug 0))
(type (simple-array single-float (,dim)) v1 v2)
#+allegro (:explain :calls :types))
(+
,@(let ((acclist))
(dolist (position others)
(push `(abs (- (aref v1 ,position)
(aref v2 ,position)))
acclist))
(dolist (position nominals)
(push `(if (= (aref v1 ,position)
(aref v2 ,position))
0 1)
acclist))
acclist))))))


I count 23 lines. The function distance, when set-up is called
with the shown parameters, compiles into 169 bytes of machine code
in Allegro (Linux x86).


Folks, if you agree that this is a sane and useful thing to do, maybe we
should come up with a problem like Larry's and do various tricks to it in
both Lisp and C, do some timings and put it up in one of the web pages?
(MC's, MH's, RJ's or maybe ALU's) I have used this kind of trick quite
a few times to make my inner loops efficient and I'm sure there are
others besides me and Larry who need to call a function a zillion times.
These people probably don't realize what efficiencies and convenience can
be gained by using Lisp.

BM

0 new messages