Re: Slow Loop (alternatives in lisp?)

Message has been deleted

Jochen Schmidt

unread,

Aug 16, 2008, 6:21:51 AM8/16/08

to

On 16 Aug., 11:42, Francogrex <fra...@grex.org> wrote:
> Hello, I'm trying to imitate the behaviour of the pivot-table in excel
> where you take a list of items and another list of their values and
> you sum similar ones together (see toy example below). I have a list
> of 30000 items and their associated values and in excel using a pivot-
> table the computation is done instantaneously (less than 2 seconds)
> while the procedure I wrote in lisp will take about 12 hours !(I give
> an example of only 15 items below, this goes fast of course because
> only 15 items, but the 30,000 will take an estimate of about 12 hours;
> I never reached that far because around 5 hours I give up). Do you
> know why? Is there a way to enhance the procedure and make it as fast
> as the pivot table? Thanks
>
> ;;Toy example on only 15 items:
> ;; ls is the list and counter is the associated values
> (defvar ls '("a" "b" "c" "b" "a" "f" "e" "g" "h" "k" "z" "k" "r" "u"
> "f"))
> (defvar counter '(1 5 8 7 14 8 3 7 9 4 3 21 5 7 9))

Two points:

1) Name special variables using asterisks "*ls*"
2) Never use literal objects if you plan to destructively modify them
(see below)

Minor issue - naming variables is often better done using the
semantics within the program and
not their actual type (I guess you you used "ls" as an abbreviation of
"list").

> ;;while macro similar to dotimes.
> (defmacro while (condition &rest body)
> (let ((var (gensym)))
> `(do ((,var nil (progn ,@body)))
> ((null ,condition) ,var))))

Why to the hell do you need this macro if you actually really want
dotimes (a counting loop)?

> ;;Tabulate like the pivot table.
> (setf i 0)
> (while (< i (length ls))
> (setf j (+ i 1))
> (while (< j (length ls))
> (if (AND (equal (nth i ls) (nth j ls)) (not (equal (nth j ls)
> 'indic)))
> (AND (setf (nth i counter)
> (+ (nth j counter) (nth i counter)))
> (AND (setf (nth j ls) 'indic) (setf (nth j counter) 'indic))))
> (incf j))
> (pprint i)(incf i))
> ;;Remove the indicators.
> (delete 'indic ls)
> (delete 'indic counter)
> ;; printing ls and counter now gives the summed values.

You're using LENGTH and NTH within each iteration of your loop. This
functions have a time complexity of O(N) with N being the length of
the used lists. As your algorithm itself is actually O(N^2) (two
nested loops) the number of accesses to the list are quite obscene.
Each one of the NTH calls within your inner loop will have N accesses
to the list so its N*N*N=N^3*8 accesses to the loop (not counting
accesses in outer loops). If your list has 30000 elements this would
mean something like 216,000,000,000,000 accesses to the list! On a
4GHz computer with one access counted oversimplified as one cycle this
would be somewhere around 15 hours of list accessing. If you just
eliminate the O(N) accesses to the list - the estimated runtime would
go down to something like 1.8 seconds.

ciao,
Jochen

--
Jochen Schmidt
CRISPYLOGICS
Uhlandstr. 9, 90408 Nuremberg

Fon +49 (0)911 517 999 82
Fax +49 (0)911 517 999 83

mailto:(format nil "~(~36r@~36r.~36r~)" 870180 1680085828711918828
16438) http://www.crispylogics.com

Message has been deleted

Marco Antoniotti

unread,

Aug 16, 2008, 8:55:41 AM8/16/08

to

On Aug 16, 8:21 am, Francogrex <fra...@grex.org> wrote:

> On Aug 16, 12:21 pm, Jochen Schmidt <j...@crispylogics.com> wrote:
>
> > You're using LENGTH and NTH within each iteration of your loop. This
> > functions have a time complexity of O(N) with N being the length of
> > the used lists. As your algorithm itself is actually O(N^2) (two
> > nested loops) the number of accesses to the list are quite obscene.
> > Each one of the NTH calls within your inner loop will have N accesses
> > to the list so its N*N*N=N^3*8 accesses to the loop (not counting
> > accesses in outer loops). If your list has 30000 elements this would
> > mean something like 216,000,000,000,000 accesses to the list! On a
> > 4GHz computer with one access counted oversimplified as one cycle this
> > would be somewhere around 15 hours of list accessing. If you just
> > eliminate the O(N) accesses to the list - the estimated runtime would
> > go down to something like 1.8 seconds.
>

> Thanks, you gave me good tips on a better style and I think I can
> substitute LENGTH so the program would be less troubled with repeated
> access to that function, but you didn't give me an alternative to
> using NTH: how else and more efficiently would I be accessing
> individual elements of the list?

Don't write useless macros like WHILE and use (LOOP WHILE ....)
Use VECTORs. As Jochen has told you list operations are inherently
linear. Just don't use them for this sort of numerical work and use
vectors or arrays instead. Access is then constant time.

Cheers
--
Marco

Volkan YAZICI

unread,

Aug 16, 2008, 9:37:43 AM8/16/08

to

On Aug 16, 12:42 pm, Francogrex <fra...@grex.org> wrote:
> (setf i 0)
> (while (< i (length ls))
> (setf j (+ i 1))
> (while (< j (length ls))
> (if (AND (equal (nth i ls) (nth j ls)) (not (equal (nth j ls)
> 'indic)))
> (AND (setf (nth i counter)
> (+ (nth j counter) (nth i counter)))
> (AND (setf (nth j ls) 'indic) (setf (nth j counter) 'indic))))

I'm not an expert, but AFAIS for your code snippet, you're coding in C
using Common Lisp. I'd advise you to giving SICP[1] a try, and then
play with lisp again. Anyway, here is my 2 cents:

(defun concatenate-frequencies (items freqs &key (size 256) (test
'eql))
(let ((uniq-freqs (make-hash-table :size size :test test)))
(mapc
(lambda (item freq)
(setf (gethash item uniq-freqs)
(+ (gethash item uniq-freqs 0) freq)))
items
freqs)
(format t "~s~%" (hash-table-count uniq-freqs))
(loop for item being each hash-key of uniq-freqs
collect (cons item (gethash item uniq-freqs)))))

(let ((char-lo-limit #.(char-code #\a))
(char-hi-limit #.(char-code #\z)))
(defvar *items*
(loop repeat 1000000
with size = (- char-hi-limit char-lo-limit)
collect (code-char (+ char-lo-limit (random size))))))

(defvar *freqs* (loop repeat 1000000 collect (random 1000)))

TEST> (time (concatenate-frequencies *items* *freqs* :size 1000000))
25
Evaluation
took:
0.194 seconds of real
time
0.192012 seconds of total run time (0.168010 user, 0.024002
system)
[ Run times consist of 0.060 seconds GC time, and 0.133 seconds non-
GC
time. ]
98.97%
CPU
427,929,406 processor
cycles
40,461,888 bytes
consed
((#\g . 20024018) (#\m . 19875326) (#\d . 20022273) (#\n .
19792959)
(#\k . 20024116) (#\r . 19970136) (#\i . 19799098) (#\c .
20077641)
(#\o . 19887064) (#\h . 19929361) (#\b . 20254118) (#\w .
19951612)
(#\y . 20060516) (#\u . 20006662) (#\f . 19923075) (#\x .
19958120)
(#\e . 19916717) (#\a . 19978439) (#\j . 20089621) (#\p .
20096731)
(#\s . 19901291) (#\l . 20046440) (#\q . 19915451) (#\t .
19908266)
(#\v . 20070813))

0.192 seconds for a list of 1,000,000 elements is not that bad.

Regards.

[1] http://mitpress.mit.edu/sicp/full-text/book/book.html

Rainer Joswig

unread,

Aug 16, 2008, 10:33:31 AM8/16/08

to

In article
<85ae266d-e197-4c2b...@k7g2000hsd.googlegroups.com>,
Francogrex <fra...@grex.org> wrote:

> Hello, I'm trying to imitate the behaviour of the pivot-table in excel
> where you take a list of items and another list of their values and
> you sum similar ones together (see toy example below). I have a list
> of 30000 items and their associated values and in excel using a pivot-
> table the computation is done instantaneously (less than 2 seconds)
> while the procedure I wrote in lisp will take about 12 hours !(I give
> an example of only 15 items below, this goes fast of course because
> only 15 items, but the 30,000 will take an estimate of about 12 hours;
> I never reached that far because around 5 hours I give up). Do you
> know why? Is there a way to enhance the procedure and make it as fast
> as the pivot table? Thanks
>
>
> ;;Toy example on only 15 items:
> ;; ls is the list and counter is the associated values
> (defvar ls '("a" "b" "c" "b" "a" "f" "e" "g" "h" "k" "z" "k" "r" "u"
> "f"))
> (defvar counter '(1 5 8 7 14 8 3 7 9 4 3 21 5 7 9))

> ;;while macro similar to dotimes.
> (defmacro while (condition &rest body)
> (let ((var (gensym)))
> `(do ((,var nil (progn ,@body)))
> ((null ,condition) ,var))))

> ;;Tabulate like the pivot table.

> (setf i 0)
> (while (< i (length ls))
> (setf j (+ i 1))
> (while (< j (length ls))
> (if (AND (equal (nth i ls) (nth j ls)) (not (equal (nth j ls)
> 'indic)))
> (AND (setf (nth i counter)
> (+ (nth j counter) (nth i counter)))
> (AND (setf (nth j ls) 'indic) (setf (nth j counter) 'indic))))

> (incf j))
> (pprint i)(incf i))
> ;;Remove the indicators.
> (delete 'indic ls)
> (delete 'indic counter)
> ;; printing ls and counter now gives the summed values.

First you should try to format your code so that we can read it:

(defvar ls '("a" "b" "c" "b" "a" "f" "e" "g" "h" "k" "z" "k" "r" "u" "f"))
(defvar counter '(1 5 8 7 14 8 3 7 9 4 3 21 5 7 9))

;;while macro similar to dotimes.
(defmacro while (condition &rest body)
(let ((var (gensym)))
`(do ((,var nil (progn ,@body)))
((null ,condition) ,var))))

;; Tabulate like the pivot table.

(setf i 0)
(while (< i (length ls))
(setf j (+ i 1))
(while (< j (length ls))
(if (AND (equal (nth i ls) (nth j ls))
(not (equal (nth j ls) 'indic)))
(AND (setf (nth i counter)
(+ (nth j counter) (nth i counter)))
(AND (setf (nth j ls) 'indic) (setf (nth j counter) 'indic))))

(incf j))
(pprint i)
(incf i))

;;Remove the indicators.
(delete 'indic ls)
(delete 'indic counter)

Let's have a look:

* variables defined by defvar should be written as *ls* and *counter* .

* there is already a WHILE in Common Lisp. No need to invent a new one:

(loop while (foo-p) do .... )

* don't write code like this. Write functions you can try out.

* understand when to use lists and when arrays

* using multiple calls to LENGTH on lists in Loops is expensive

* using random access on lists is expensive. The cheapest
access you get on the first element.

* don't modify literal lists. They are constant data.

You may want to read some beginner's book on Lisp and list processing.

Here is a 'solution' that is a bit more complicated, since
it uses a hash-table. An alternative would have been
to use an assoc list. But the solution is 'fast' for
long lists.

(defun distribution1 (items values test)
(let ((table (make-hash-table :test test)))
(loop for item in items
for value in values
do (incf (gethash item table 0) value))
(let ((items-list nil))
(maphash (lambda (item sum-value)
(push (cons item sum-value) items-list))
table)
(sort items-list #'> :key #'cdr))))

An example call:

CL-USER 58 > (distribution1 '("a" "b" "c" "b" "a" "f" "e" "g" "h" "k" "z" "k" "r" "u" "f")

'(1 5 8 7 14 8 3 7 9 4 3 21 5 7 9)

#'equal)
(("k" . 25) ("f" . 17) ("a" . 15) ("b" . 12) ("h" . 9) ("c" . 8) ("g" . 7) ("u" . 7) ("r" . 5) ("e" . 3) ("z" . 3))

You can see that the loop goes once over the input lists. It iterates
of both lists in one loop. For every element in list ITEMS
the hashtable will be updated by incrementing by the value.

Then it walks with MAPHASH once over the hashtable and builds a new
list of item sum-value. This is almost our result.
Here the result is sorted, so the items with the largest
sum-values are first.

So, it walks once first to last (which is efficient in Lisp)
over the lists. The update of the hash-table is fast, too.
Then the result list is created. and the result list is sorted.

--
http://lispm.dyndns.org/

Tamas K Papp

unread,

Aug 16, 2008, 10:55:58 AM8/16/08

to

On Sat, 16 Aug 2008 02:42:02 -0700, Francogrex wrote:

> Hello, I'm trying to imitate the behaviour of the pivot-table in excel

Besides what others said: I would recommend that you start reading an
intro book. http://www.gigamonkeys.com/book/ is a very good one, for
example. You would benefit more from the comments - right now your code
is so un-lispy that incremental criticism probably helps very little.

Tamas

Grant Rettke

unread,

Aug 16, 2008, 12:12:20 PM8/16/08

to

On Aug 16, 9:55 am, Tamas K Papp <tkp...@gmail.com> wrote:

> right now your code is so un-lispy that incremental criticism probably helps very little.

Well, it is written in Lisp, so how can it be un-lispy? :)

I think that *I* know what you mean, but you can't expect someone who
is writing un-lispy code to know what you mean.

So, what do you mean?

Tamas K Papp

unread,

Aug 16, 2008, 12:48:53 PM8/16/08

to

I think that others explained that in detail above.

All I meant that OP was not using Lisp idiomatically - which is OK, I am
sure he will learn it in time. But foundations are important - what I
would do in his place is read some of the initial chapters of PCL, then
come back and look at the comments he got for his code snippet.

Tamas

Pascal J. Bourguignon

unread,

Aug 16, 2008, 1:02:57 PM8/16/08

to

Francogrex <fra...@grex.org> writes:

> Hello, I'm trying to imitate the behaviour of the pivot-table in excel

> where you take a list of items and another list of their values and
> you sum similar ones together (see toy example below). I have a list
> of 30000 items and their associated values and in excel using a pivot-
> table the computation is done instantaneously (less than 2 seconds)
> while the procedure I wrote in lisp will take about 12 hours !(I give
> an example of only 15 items below, this goes fast of course because
> only 15 items, but the 30,000 will take an estimate of about 12 hours;
> I never reached that far because around 5 hours I give up). Do you
> know why? Is there a way to enhance the procedure and make it as fast
> as the pivot table? Thanks

;; Tabulate like the pivot table.
(time
(let ((ls (vector "a" "b" "c" "b" "a" "f" "e" "g"

"h" "k" "z" "k" "r" "u" "f"))

(counter (vector 1 5 8 7 14 8 3 7 9 4 3 21 5 7 9))
(i 0))
(loop while (< i (length ls)) do
(let ((j (+ i 1)))
(loop while (< j (length ls)) do
(when (and (equal (elt ls i) (elt ls j))
(not (equal (elt ls j) 'indic)))
(incf (elt counter i) (elt counter j))
(setf (elt ls j) 'indic
(elt counter j) 'indic))
(incf j)))
(incf i))
(values (delete 'indic ls)
(delete 'indic counter))))

Real time: 0.009765 sec.
Run time: 0.012 sec.
Space: 102408 Bytes
#("a" "b" "c" "f" "e" "g" "h" "k" "z" "r" "u") ;
#(15 12 8 17 3 7 9 25 3 5 7)

--
__Pascal Bourguignon__ http://www.informatimago.com/

"You question the worthiness of my code? I should kill you where you
stand!"

Pascal J. Bourguignon

unread,

Aug 16, 2008, 1:07:03 PM8/16/08

to

Perhaps idiomatic usage would be to use vectors for a vector based
algorithm, and lists for a list based algorithm?

Message has been deleted

Rainer Joswig

unread,

Aug 16, 2008, 3:58:29 PM8/16/08

to

In article
<4d325909-faf8-4a16...@j22g2000hsf.googlegroups.com>,
Francogrex <fra...@grex.org> wrote:

> On Aug 16, 4:33 pm, Rainer Joswig <jos...@lisp.de> wrote:
> > ...You may want to read some beginner's book on Lisp and list processing...
>
> Thanks to all you guys helping me find solutions and being patient
> with me. I am reading several books including 2 Paul Graham books:
> "ANSI common lisp" (done) and "on lisp" (I am still reading it).
> Seibel-"Practical Common Lisp" (half-way through) Touretzky-"CL: A
> Gentle Introduction to Symbolic Computation" (done) and
> Norvig-"Paradigms of AI programming" (the book was actually a gift
> from a friend; I still find it very difficult, but was the one that
> attracted me to Common Lisp in the 1st place).

That's why I'm mentioning it. If you want to learn driving then
'On Lisp' will explain you how to be a stunt driver and
drive on two, one or even zero wheels.

ANSI CL is somehow useful to get an overview. Touretzky is probably
the most useful text for learning basics (coding algorithms in Lisp).

Check out also this book:

http://www.cse.buffalo.edu/pub/WWW/faculty/shapiro/Commonlisp/

It has , as its names says, a very interactive approach.

Writing lots of little functions and trying out alternative
versions is important. I took also some algorithms book
and coded the stuff that was in there.

Writing code and posting it here, like you do, is also
a good approach. But check out some basic texts about symbolic
programming. Reading SICP and trying the examples in either
Scheme or Common Lisp is also recommended.

> Reading is fun and very
> useful but I have to keep practicing (doing the exercises) to become
> better. I admit I have still to shake off the damned Cpp and Pascal
> habits. I think I would have been better quicker in CL if I came to it
> without knowing any of those other languages...

This effect has been observed long ago:

Anyone could learn Lisp in one day, except that if they already
new Fortran, it would take three days.

Marvin Minsky

--
http://lispm.dyndns.org/