I'm starting to learn lisp and doing the exercises of Ansi Common Lisp
authored by Paul Graham.
So my solution to counting occurences in a list and outputting a
list of dotted pairs (symbol . count) sorted by count is
(defun occurences-helper (lst result)
(if (null lst)
result
(progn
(let ((symb (car lst)))
(let ((test (assoc symb result)))
(if test
(setf (cdr test) (+ (cdr test) 1))
(setf result (cons (cons symb 1) result)))))
(occurences-helper (cdr lst) result))))
(defun occurences (lst)
(sort
(occurences-helper lst ())
#'(lambda (x y)
(>
(cdr x)
(cdr y))))
)
And I thought: "Hmm that doesn't look elegant" but I'm not used to
recursion and functional programming and that stuff so I didn't
came up with a better version.
Maybe some experienced lisp hacker may provide a different one?
Yours sincerely,
Eric
> (defun occurences-helper (lst result)
> (if (null lst)
> result
> (progn
> (let ((symb (car lst)))
> (let ((test (assoc symb result)))
> (if test
> (setf (cdr test) (+ (cdr test) 1))
> (setf result (cons (cons symb 1) result)))))
> (occurences-helper (cdr lst) result))))
Note that this isn't "functional" in the sense "side-effect-free", as
you're clearly modifying the result list. That's fine, CL is not abount
such "functional" programming, although you end up here with a somewhat
strange mix of programming styles.
I'd use a plist because of the existence of (setf getf):
(defun occurences-helper (list result)
(if (null list)
result
(progn
(incf (getf result (car list)))
(occurences-helper (cdr list) result))))
But really I'd iterate normally:
(defun occurences-helper (list)
(loop with result = nil
for element in list
do (incf (getf result element))
finally (return result)))
> (defun occurences (lst)
> (sort
> (occurences-helper lst ())
> #'(lambda (x y)
> (>
> (cdr x)
> (cdr y)))))
Assuming an assoc result again, this can be written like so:
(defun occurences (list)
(sort (occurences-helper list nil)
#'>
:key #'cdr))
A different approach:
(defun occurences (list)
(sort (mapcar (lambda (element)
(cons element
(count element list)))
(remove-duplicates list))
#'>
:key #'cdr))
--
Frode V. Fjeld
(defun occurrences (list)
(loop with hash-table = (make-hash-table)
for item in list
do
(incf (gethash item hash-table 0))
finally
return (sort (loop for key being each hash-key of hash-table using (hash-value count)
collect (cons key count))
#'>
:key #'cdr)))
Just as an antidote to Graham's style.
SORT takes both a comparison predicate, and a key function. This let
you avoid building an anonymous function.
You can avoid the helper function by using an optional parameter.
But I would keep the helper function since the signature of OCCURENCES
doesn't involve an optional parameter per se, and it's often more
efficient to have functions with fixed number of parameters. Just put
it as a local function using LABELS.
The best advice, is to avoid SETF. Try to think functionnaly. Of
course, Common Lisp is multi-paradigm, and we don't refuse an
occasional state modification, as long as it's of limited scope, and
the alternative would be more costly and less readable. So I still
use INCF to increment the counter, but this is done on a data
structure that is being built by the count-occurences function, and
the alternative would be to cons the backbone of the a-list again and
again or using even more complex functional data structures.
(defun occurences (list &optional counts)
(if (null list)
(sort counts (function >) :key (function cdr))
(let ((pair (assoc (first list) counts)))
(if (null pair)
(occurences (rest list) (acons (first list) 1 counts))
(progn
(incf (cdr pair))
(occurences (rest list) counts))))))
(defun occurences (list)
(labels ((count-occurences (list counts)
(if (null list)
(sort counts (function >) :key (function cdr))
(let ((pair (assoc (first list) counts)))
(if (null pair)
(count-occurences (rest list) (acons (first list) 1 counts))
(progn
(incf (cdr pair))
(count-occurences (rest list) counts)))))))
(count-occurences list '())))
(occurences '(a b c a b c a b a))
--> ((A . 4) (B . 3) (C . 2))
Of course, in Common Lisp, we would just use an interation construct:
(defun occurences (list)
(loop
:with counts = '()
:for item :in list
:for pair = (assoc item counts)
:if pair
:then (incf (cdr pair))
:else (push (cons item 1) counts)
:finally (return (sort counts (function >) :key (function cdr)))))
A good compiler would generate the same code for both sources, so it's
up to you to see which is clearer.
--
__Pascal Bourguignon__ http://www.informatimago.com/
But this is not ANSI CL, but CLtL1.
FINALLY takes a compound-form+, not an unconditional.
IMHO a better way to approach this, particular for beginners, is to use
abstract associative maps rather than association lists. A CL
implementation of abstract associative maps can be found here:
http://www.flownet.com/ron/lisp/dictionary.lisp
You'll also need the rg-utils.lisp file.
Once you have abstract associative maps, you can make an occurrence
counter trivially:
(require :dictionary)
(defclass counter (dictionary) ())
(defmethod add ((c counter) item)
(setf (ref c item) (1+ (ref c item 0))))
(defun count-items (list)
(let ((c (make-instance 'counter)))
(dolist (item list) (add c item))
c))
? (count-items '(one two two three three three))
#<COUNTER HASH-TABLE { ONE 1 TWO 2 THREE 3 }>
Sorting the output is left as an exercise.
rg
> (defmethod add ((c counter) item)
> (setf (ref c item) (1+ (ref c item 0))))
Does (incf (ref c item 0)) work?
Zach
No. REF is a macro that expands into a call to the generic function
REF1 if no default argument is specified, or REFD if a default is
specified. If you look at the code for the HISTOGRAM class you'll see
that the call to REFD is actually there explicitly. (I took it out when
I posted here in the hopes of making it less confusing.) There is no
SETF method for REFD, only for REF1. I could add a setf method for
REFD, but the overall goal of this code is to make pedagogy a top
priority, and I am waffling over whether having a SETF method for REFD
makes things clearer or more confusing. On the one hand, it would allow
code like this to be written even more concisely, but on the other hand
SETFing REFD doesn't really make semantic sense.
Actually, outside opinions on this would be welcome.
rg
> In article <87fx0ls...@hangup.portland.xach.com>,
> Zach Beane <xa...@xach.com> wrote:
>
>> RG <rNOS...@flownet.com> writes:
>>
>> > (defmethod add ((c counter) item)
>> > (setf (ref c item) (1+ (ref c item 0))))
>>
>> Does (incf (ref c item 0)) work?
>>
>> Zach
>
> No.
That is a crappy design.
Zach
> RG <rNOS...@flownet.com> writes:
>
> > In article <87fx0ls...@hangup.portland.xach.com>,
> > Zach Beane <xa...@xach.com> wrote:
> >
> >> RG <rNOS...@flownet.com> writes:
> >>
> >> > (defmethod add ((c counter) item)
> >> > (setf (ref c item) (1+ (ref c item 0))))
> >>
> >> Does (incf (ref c item 0)) work?
> >>
> >> Zach
> >
> > No.
>
> That is a crappy design.
Thank you for that very constructive criticism. If that is the feature
that will make the difference between you using this library and not
it's really easy to change.
rg
One issue here is "recursion obsession". It comes from Graham being more
a Schemer than a Lisper. Lispers rejoice in multiple paradigms, and try
to use the right one at the right time. In this case, when iterating
iterate. But if you want to do recursion:
(defun occurences (list)
(labels ((occ (list)
(fun-stuff (car list) (occ (cdr list))))
(occ list)))
ie, no need for an external helper, we have labels and flet.
Where is fun-stuff? You mostly have it already, just take what you have
and lose all the setfs and progns -- you /are/ reading Graham, aren't you?
Not a bad start, btw.
kt
With FSet it's even more trivial:
(defun count-items (list)
(convert 'bag list))
(Sorting is again left as an exercise.)
-- Scott
> (defun occurences-helper (list)
> (loop with result = nil
> for element in list
> do (incf (getf result element))
> finally (return result)))
Small correction: (getf result element 0)
Nicolas
By the way, what is the reason that you have to say "finally (return
...)" instead of just "finally return ..."? So in other words, why does
FINALLY not take an unconditional in ANSI CL? This is probably the most
common mistake I'm tripping over when using LOOP...
Pascal
--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
> On 18/06/2010 00:17, Pascal J. Bourguignon wrote:
>>> (defun occurrences (list)
>>> (loop with hash-table = (make-hash-table)
>>> for item in list
>>> do
>>> (incf (gethash item hash-table 0))
>>> finally
>>> return (sort (loop for key being each hash-key of hash-table using (hash-value count)
>>> collect (cons key count))
>>> #'>
>>> :key #'cdr)))
>>>
>>> Just as an antidote to Graham's style.
>>
>> But this is not ANSI CL, but CLtL1.
>> FINALLY takes a compound-form+, not an unconditional.
>
> By the way, what is the reason that you have to say "finally (return
> ...)" instead of just "finally return ..."? So in other words, why
> does FINALLY not take an unconditional in ANSI CL? This is probably
> the most common mistake I'm tripping over when using LOOP...
I think it's explained by the + after compound-form: it allows for
several forms to be executed in sequence in the finally clause.
(loop for i below 10
collect i into result
finally (print 'done)
(assert (= 10 (length result)))
(return (remove-if 'oddp result)))
Perhaps also for symetry with the INITIALLY clause.
What's wrong with hash-tables which are in the standard?
R> (defun count-items (list)
R> (let ((c (make-instance 'counter)))
R> (dolist (item list) (add c item))
R> c))
(defun count-items (list)
(let ((ht (make-hash-table)))
(dolist (item list)
(incf (gethash item ht 0)))
ht))
The only difference I see is that GETHASH accessor has default value and you
can use INCF directly on it, no need for a goddamn additional function for
that.
> I think it's explained by the + after compound-form: it allows for
> several forms to be executed in sequence in the finally clause.
But if it allowed an unconditional there that would still be OK, you'd
just need to say "finally do ... ..." since DO takes a compoint-form+
as well.
This annoys me fairly frequently, I can't see a good reason for it.
Here's another hashtable-based solution:
First, a function to convert a hashtable to an alist:
(defun hashtable->alist (ht &aux alist)
(maphash #'(lambda (key value) (push (cons key value) alist)) ht)
(nreverse alist))
or with LOOP:
(defun hashtable->alist (ht)
(loop for key being the hash-keys in ht using (hash-value value)
collect (cons key value)))
The actual COUNT-OCCURANCES function:
(defun count-occurances (list &aux (ht (make-hash-table)))
(dolist (element list (sort (hashtable->alist ht) #'> :key #'cdr))
(incf (gethash element ht 0))))
Two functional versions (just out of curiosity):
(defun count-elements (sequence &key (test #'eql))
(reduce (lambda (element result)
(acons element
(let ((cons (assoc element result :test test)))
(if cons (1+ (cdr cons)) 1))
(remove element result :key #'car :test test)))
sequence :initial-value '() :from-end t))
(defun count-elements/2 (sequence)
(reduce (lambda (element result)
(multiple-value-bind
(indicator value tail)
(get-properties result (list element))
(if indicator
(list* element (1+ value)
(nconc (ldiff result tail) (cddr tail)))
(list* element 1 result))))
sequence :initial-value '() :from-end t))
I used :from-end to keep the elements in the order they appear in the
original sequences.
Thanks, I guess I forgot to qualify all my code as "untested".
--
Frode V. Fjeld
> And I thought: "Hmm that doesn't look elegant" but I'm not used to
> recursion and functional programming and that stuff so I didn't
> came up with a better version.
This is kind of a counterclaim to the "use some complicated abstract
data structure" stuff (See Gabriel on that). Depends on tail-call
elimintation if you want to use it on long lists (and of course in real
life you would use alist functions not roll your own as here).
(defun occ (list &key (test #'eql))
(labels ((oc (lt al)
(if (null lt)
al
(oc (rest lt) (oca (first lt) al al))))
(oca (s alt al)
(cond ((null alt)
(cons (cons s 1) al))
((funcall test s (car (first alt)))
(incf (cdr (first alt)))
al)
(t (oca s (rest alt) al)))))
(oc list nil)))
Wow, you're the second person to react violently to the lack of a setf
expander for REF with a default argument. Why is that so important? It
seems like such a trivial issue to me. You can already do this:
(defmethod add ((c counter) item)
(incf (refd c item 0)))
if the explicit call really rubs you the wrong way, and making it work
for REF would be trivial.
But to answer your question directly:
> What's wrong with hash-tables
It's a premature optimization. What if hash tables are not the right
data structure? What if it turns out that a red-black tree would be
more efficient for the particular data you are processing? Or what if
you have too much data to fit in main memory and you want to use a
database as a backing store? If you've hard-coded in calls to GETHASH
you now have to change ALL your code. If you use abstract associative
maps you only have to change the underlying implementation. You can
even do this at run time:
? (setf c (make-instance 'counter))
#<COUNTER HASH-TABLE { }>
? (add c 'foo)
1
? (add c 'baz)
1
? (add c 'baz)
2
? c
#<COUNTER HASH-TABLE { BAZ 2 FOO 1 }>
? (describe (slot-value c 'implementation))
#<HASH-TABLE :TEST EQUAL size 2/60 #x3020019B5CCD>
Type: HASH-TABLE
Class: #<BUILT-IN-CLASS HASH-TABLE>
TYPE: (HASH-TABLE . #<CCL::CLASS-WRAPPER HASH-TABLE #x3000400296BD>)
NHASH.KEYTRANSF: #<Compiled-function CCL::%%EQUALHASH #x3000000AA3AF>
NHASH.COMPAREF: #<Compiled-function EQUAL #x30000007B23F>
NHASH.REHASH-BITS: NIL
NHASH.VECTOR: #<HASH-TABLE-VECTOR #x3020019B5DCD>
NHASH.LOCK: 524288
NHASH.OWNER: NIL
NHASH.GROW-THRESHOLD: 58
NHASH.REHASH-RATIO: 1.1764705
NHASH.REHASH-SIZE: 1.5
NHASH.PUTHASH-COUNT: 0
NHASH.EXCLUSION-LOCK: #<RECURSIVE-LOCK [ptr @ #xB8EB0C0] #x3020019B5D4D>
NHASH.FIND: #<Compiled-function CCL::GENERAL-HASH-FIND #x3000000AD27F>
NHASH.FIND-NEW: #<Compiled-function CCL::GENERAL-HASH-FIND-FOR-PUT
#x3000000ADF4F>
NHASH.READ-ONLY: NIL
? (change-implementation c 'plist)
#<COUNTER PLIST { FOO 1 BAZ 2 }>
? (describe (slot-value c 'implementation))
#<PLIST #x302001A3ED2D>
Class: #<STANDARD-CLASS PLIST>
Wrapper: #<CCL::CLASS-WRAPPER PLIST #x3020018AD7CD>
Instance slots
PLIST: (FOO 1 BAZ 2)
?
You can even make an abstract associative map that automatically chooses
an underlying implementation based on run-time profiling without
changing your client code.
rg
> You can even make an abstract associative map that automatically chooses
> an underlying implementation based on run-time profiling without
> changing your client code.
I think there's really two approaches here:
You can go for the "huge complicated reusable library which does
everything" approach. This is what Java does, and is what you're
proposing. This is great, except the library will probably not
actually do everything, and will turn out to have design deficiencies
which make it eithr annoying or impossible to use. If you are using a
minority language, then the library will not be very well tested
because too few people will be using it. It will fail at crucial
moments with results varying from you getting woken up at 3AM every day
to death. Your program will also become enormous, you will not be able
to understand why it performs so badly, and you will spend the rest of
your life trying to understand the library, smelling increasingly bad
and eventually dying on a street corner somewhere.
Or you can use the facilities provided by the language, and build
things which work for your application without stressing too much about
making them do everything. You will be laughed at by your peers, but
your program will work, you will understand it, and it will perform
well. You will become rich, and will be able to spend your time
foisting huge complicated reusable libraries on your former peers, who
are now your slaves, and laughing at them as they starve in the gutter
while trying to understand a million pages of documentation, which you
change every two weeks.
Very strong-willed people can achieve the latter even when using a
language such as Java which is predicated around the former.
R> Wow, you're the second person to react violently to the lack of a setf
R> expander for REF with a default argument. Why is that so important? It
R> seems like such a trivial issue to me. You can already do this:
R> (defmethod add ((c counter) item)
R> (incf (refd c item 0)))
R> if the explicit call really rubs you the wrong way, and making it work
R> for REF would be trivial.
Yes, it is trivial issue, but when problem itself is so simple, the only
things which are left are trivial issues.
R> But to answer your question directly:
??>> What's wrong with hash-tables
R> It's a premature optimization.
Using something else other than hash-tables is a premature optimization!
Because they are in standard, they work reasonably well for a wide variety
of situations and have quite sane API.
Each line of code which is unnecesarry and does not do anything useful is
over-engineering and premature optimization.
In your example those are:
(require :dictionary)
(defclass counter (dictionary) ())
(defmethod add ((c counter) item)
(setf (ref c item) (1+ (ref c item 0))))
Four unnecessary lines vs four useful lines. So half of your code was
over-engineering/premature optimization.
R> What if hash tables are not the right data structure?
You can think about it when you have performance problems.
I'd recommend you to read something on premature optimization, you've got it
not just wrong, but in reverse!
R> What if it turns out that a red-black tree would be more efficient for
R> the particular data you are processing?
Then I'll refactor my code, but I'll not waste my time thinking about all
possible situations before I need that.
R> Or what if you have too much data to fit in main memory and you want
R> to use a database as a backing store? If you've hard-coded in calls to
R> GETHASH you now have to change ALL your code.
If it that bad, packages can do that.
R> If you use abstract associative maps you only have to change the
R> underlying implementation. You can even do this at run time:
It is cool, but it is over-engineering.
Over-engineering usually looks cool.
R> You can even make an abstract associative map that automatically chooses
R> an underlying implementation based on run-time profiling without
R> changing your client code.
This is over-engineering classic.
Definitely.
When I was working as a C++ programmer I used STL and Boost libraries which
are meant to be very general, abstract and flexible, but unfortunately they
are often PITA to use and do not have useful functionality people need.
I think there might be a fundamental rule which imposes limit on usefulness
of "huge complicated reusable" libraries.
It might be related to Kolmogorov complexity and its invariance theorem.
There is a good discussion of reusability in Richard P. Gabriel's "Patterns
of Software" book.
> On 2010-06-18 16:22:29 +0100, RG said:
>
> > You can even make an abstract associative map that automatically chooses
> > an underlying implementation based on run-time profiling without
> > changing your client code.
>
> I think there's really two approaches here:
>
> You can go for the "huge complicated reusable library which does
> everything" approach. This is what Java does, and is what you're
> proposing.
Actually, no, that is not what I'm proposing. Quite the opposite in
fact.
dictionary.lisp is only 295 LOC including comments and blank lines.
Even if you add in rg-utils (only a tiny part of which is used by
dictionary.lisp) that's only an addition 970 LOC. That is not "huge" by
any stretch of the imagination.
As for complicated, the core API for dictionary.lisp consists of only a
single call, the REF macro. REF subsumes the functionality of GETHASH,
ASSOC, and GETF, so the API for abstract associative maps is actually
simpler than the API for the equivalent CL functionality.
Finally, the primary goal of dictionary.lisp is pedagogy and
prototyping, not hard-core production work. If you want to teach Lisp
using only the core language you have no choice but to introduce
extraneous detail, like hash tables or alists or plists. If you have
abstract associative maps you expose students to the proper separation
of API and implementation from the very beginning, which I think will
make them better programmers in the long run.
rg
(defun occurrences (list)
(let ((ht (make-hash-table)))
(dolist (x list)
(incf (gethash x ht 0)))
(sort (let ((result nil))
(maphash (lambda (k v) (push (cons k v) result)) ht)
result)
#'> :key #'cdr)))
Just as an antidote to the LOOP style :)
Really, I don't see that LOOP is any help here at all. I will
grudgingly grant that there are some cases LOOP does a good job with
that are difficult to do elegantly otherwise, but this is clearly not
one of them.
-- Scott
> Lieven Marchand <m...@wyrd.be> writes:
>> (defun occurrences (list)
>> (loop with hash-table = (make-hash-table)
>> for item in list
>> do
>> (incf (gethash item hash-table 0))
>> finally
>> return (sort (loop for key being each hash-key of hash-table using (hash-value count)
>> collect (cons key count))
>> #'>
>> :key #'cdr)))
>>
>> Just as an antidote to Graham's style.
>
> But this is not ANSI CL, but CLtL1.
> FINALLY takes a compound-form+, not an unconditional.
Indeed. Thanks for bringing this up. Lispworks accepts this without a
warning. I'll have to check my code for other occurrences :)
> dictionary.lisp is only 295 LOC including comments and blank lines.
> Even if you add in rg-utils (only a tiny part of which is used by
> dictionary.lisp) that's only an addition 970 LOC. That is not "huge" by
> any stretch of the imagination.
It's about two orders of magnutude larger than simple versions of
occurrence-counting funcions using alists or hashtables (both of which
come out at around 7 lines without comments, so maybe 9 with):
(defun occ/alist (list &key (test #'eql))
(loop with al = '()
for e in list
for a = (or (assoc e al :test test)
(first (push (cons e 0) al)))
do (incf (cdr a))
finally (return al)))
(defun occ/hash (list &key (test #'eql))
(loop with ht = (make-hash-table :test test)
for e in list
do (incf (gethash e ht 0))
finally (return (loop for k being the hash-keys of ht using (hash-value v)
collect (cons k v)))))
That's just dictionaries.
It's OK, I don't expect you to understand the point I'm making, or agree.
> On 2010-06-18 17:59:32 +0100, RG said:
>
> > dictionary.lisp is only 295 LOC including comments and blank lines. Even
> > if you add in rg-utils (only a tiny part of which is used by
> > dictionary.lisp) that's only an addition 970 LOC. That is not "huge" by
> > any stretch of the imagination.
>
> It's about two orders of magnutude larger than simple versions of
> occurrence-counting funcions using alists or hashtables (both of which
> come out at around 7 lines without comments, so maybe 9 with):
You're arguing about LOC, Ron is arguing about correct abstraction. I
have to side with Ron on this one: getting the correct abstraction is
paramount above everything else. Small LOC is good if the abstraction
doesn't matter; it's easy to change programs with small LOC.
[...]
>
> It's OK, I don't expect you to understand the point I'm making, or agree.
Do you not expect Ron to understand it because you haven't explained it
well, or because you think Ron doesn't have the mental horsepower to
understand it?
Ron and I don't agree on a number of things, but he isn't stupid.
Pigheaded, possibly, but I'm looking in a mirror as I say that.
This depends on how you count. LOC always depend on what you choose to
count as code and what you choose to count as library. You are using
LOOP, whose implementation is a helluva lot bigger than dictionaries.
> It's OK, I don't expect you to understand the point I'm making, or agree.
That train seems to be running both ways.
rg
> You're arguing about LOC, Ron is arguing about correct abstraction.
Read Chapter 18.1.1 of the CLHS: The CL hash-table itself is an
abstraction. For example, the implementation already is free to choose
"an underlying implementation based on run-time profiling without
changing your client code", to quote a part of his justification.
Piling complication on top of this simple abstraction could make sense
in the context of advanced application design, but is it good
paedagogy for beginners?
> Hello there!
>
> I'm starting to learn lisp and doing the exercises of Ansi Common Lisp
> authored by Paul Graham.
>
> So my solution to counting occurences in a list and outputting a
> list of dotted pairs (symbol . count) sorted by count is
>
> (defun occurences-helper (lst result)
> (if (null lst)
> result
> (progn
> (let ((symb (car lst)))
> (let ((test (assoc symb result)))
> (if test
> (setf (cdr test) (+ (cdr test) 1))
> (setf result (cons (cons symb 1) result)))))
> (occurences-helper (cdr lst) result))))
>
> (defun occurences (lst)
> (sort
> (occurences-helper lst ())
> #'(lambda (x y)
> (>
> (cdr x)
> (cdr y))))
> )
>
>
> And I thought: "Hmm that doesn't look elegant" but I'm not used to
> recursion and functional programming and that stuff so I didn't
> came up with a better version.
Well, with one more level of abstraction and the use of more of the
arguments to SORT, you can come up with something very similar that
looks more elegant.
;; Adds increment to the matching counts item and
;; (importantly) returns the updated counts structure
(defun add-to-count (key counts &optional (increment 1))
(let ((entry (assoc key counts)))
(if entry
(progn (incf (cdr entry) increment)
counts)
(acons key 1 counts))))
(defun occurences-helper (lst result)
(if (null lst)
result
(occurences-helper (cdr lst)
(add-to-count (car lst) results))))
(defun occurences (lst)
(sort (occurences-helper lst ()) #'> :key #'cdr))
--
Thomas A. Russ, USC/Information Sciences Institute
> * Bob Felts
>
> > You're arguing about LOC, Ron is arguing about correct abstraction.
>
> Read Chapter 18.1.1 of the CLHS: The CL hash-table itself is an
> abstraction. For example, the implementation already is free to choose
> "an underlying implementation based on run-time profiling without
> changing your client code", to quote a part of his justification.
No. While it is true that the spec is not explicit about this, it does
place certain constraints on hash tables. For example, CL hash tables
cannot under any circumstances assume that the keys have a total order,
so they cannot be implemented by any underlying data structure that
requires that the keys be ordered.
Furthermore, the existence of hash-table-rehash-size, and the
rehash-size and rehash-threshold arguments to make-hash-table doesn't
make any sense if the underlying implementation is anything other than a
hash table.
Finally (and only on CLL would one have to point this out) the fact that
these things are called "hash tables" very strongly implies that they
are in fact hash tables and not something else.
> Piling complication on top of this simple abstraction could make sense
> in the context of advanced application design, but is it good
> paedagogy for beginners?
Of course it is. Why would you doubt it?
rg
By the way, why do we abstract out hash-table but leave the list alone?
I don't think it is correct. Code needs to work with any enumerable data
container, not only lists, otherwise it will be half-assed and abstractions
won't be correct.
So I think we need to create a class for abstract data containers and some
enumeration functions on it (like foldl) so we can do this properly.
There is no such thing as a "correct abstraction."
If code works, it is correct.
Otherwise, we can put a shitload of abstractions on top of it -- but will it
make it better?
> * Bob Felts
>
> > You're arguing about LOC, Ron is arguing about correct abstraction.
>
> Read Chapter 18.1.1 of the CLHS: The CL hash-table itself is an
> abstraction.
So are integers, and complex numbers, b-trees and...
Maybe complex numbers could be used. It's an abstraction.
[...]
>
> Piling complication on top of this simple abstraction could make sense
> in the context of advanced application design, but is it good
> paedagogy for beginners?
Absolutely. In my opinion, one's first introduction to computers should
be about problem representation and abstraction of problem elements,
only then to be followed by programming languages.
A well-written story with incoherent plotting and dull characters won't
sell much, unless the special effects are great and/or you're resting on
your reputation.
> BF> You're arguing about LOC, Ron is arguing about correct abstraction.
>
> There is no such thing as a "correct abstraction."
> If code works, it is correct.
I know someone who would like to disagree with you. Perhaps she will be
familiar to you?
http://stablecross.com/files/Software_Plea_1.html
Correctness is much, much more than just "it works".
> Otherwise, we can put a shitload of abstractions on top of it -- but will it
> make it better?
Too much abstraction is as bad as too little abstraction is as bad as
the wrong abstraction.
(defun count-sorted-list (list)
(when list
(let ((element (car list))
(n 1))
(loop for next on (cdr list)
while (eql element (car next)) do (incf n)
finally (return (cons (cons element n)
(count-sorted-list next)))))))
(defun count-list (list)
(sort
(count-sorted-list (sort (copy-list list) #'string<))
#'>
:key #'cdr))
CL-USER> (count-list '(a b c f g a a c c a b ))
((A . 4) (C . 3) (B . 2) (F . 1) (G . 1))
CL-USER>
Wade
Thanks for all the replys!
I will need some time to read (and understand) them all.
Yours sincerely,
Eric
> You're arguing about LOC, Ron is arguing about correct abstraction.
Obviously I'm not arguing about LOC: I'm arguing about size (not in
bytes, cognitive size perhaps?) and complexity, and using LOC as a poor
proxy for that.
> I
> have to side with Ron on this one: getting the correct abstraction is
> paramount above everything else. Small LOC is good if the abstraction
> doesn't matter; it's easy to change programs with small LOC.
No. maintainable, correct, adequately performant programs are all that
matters. Abstraction may or may not help with that: we all know the
horrors that result when people make the assumption that it always does.
> Do you not expect Ron to understand it because you haven't explained it
> well, or because you think Ron doesn't have the mental horsepower to
> understand it?
I think I haven't explained it well, and I'm not really going to try to
do that here - if I had the energy to do so I'd be writing a book, not
posting news articles. Gabriel's patterns book is the best description
I know (and I note it's now available as a PDF), though it's a while
since I read it and I'm sure I disagree with him about many things.
However although it's not to do with "horsepower" (whatever that may
mean - surely no one really believes in the
single-number-describing-intelligence thing any more), it is to do with
style of thinking. I don't really have the energy to get involved in a
huge discussion about that, and for a long time I didn't really
understand that there were such enormous differences in style: mostly
because I didn't want to believe there were I think. What made me
change my mind was realising that there were people who were genuinely
good mathematicians (far better than me) but who just had no feel for
physics at all, and similarly that there were people who were really
quite good at physics who really could never get anywhere with number
theory, say (I'm one of these). There are other enormous distinctions
like this, and some of them are at work here. A couple of interesting
ones are the system-vs-bits one (lots of computer people are on the
right hand of that) and the closely-related meta one which is to do
with whether you understand this whole argument (again, lots of
computer people are on the bits side of that).
> No. While it is true that the spec is not explicit about this, it does
> place certain constraints on hash tables. For example, CL hash tables
> cannot under any circumstances assume that the keys have a total order,
> so they cannot be implemented by any underlying data structure that
> requires that the keys be ordered.
Of course they can. The implementation would just have to change
dynamically if it noticed that there was no longer a total order it
could use.
>
> This depends on how you count. LOC always depend on what you choose to
> count as code and what you choose to count as library. You are using
> LOOP, whose implementation is a helluva lot bigger than dictionaries.
Indeed it does. This is to do with the whole
application/library/language distinction which has been hashed out in
cll before. As should be clear my view on that is that the middle
ground is uninteresting: I want language-quality stuff (like LOOP) or
nothing.
See:
http://htdp.org/
How to Design Programs
An Introduction to Computing and Programming
Matthias Felleisen, Robert Bruce Findler,
Matthew Flatt, & Shriram Krishnamurthi
MIT Press (2003 edition)
-Rob
-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
> I'd use a plist because of the existence of (setf getf):
>
> (defun occurences-helper (list result)
> (if (null list)
> result
> (progn
> (incf (getf result (car list)))
> (occurences-helper (cdr list) result))))
Ah, I had to read about plists in ACL before I did understand
that.
> But really I'd iterate normally:
>
> (defun occurences-helper (list)
> (loop with result = nil
> for element in list
> do (incf (getf result element))
> finally (return result)))
Grahams riots about loop a little bit. He says that the specification of
loop in the ACL Standard would not be formal and so ..
I must admit its easier to read though.
> A different approach:
>
> (defun occurences (list)
> (sort (mapcar (lambda (element)
> (cons element
> (count element list)))
> (remove-duplicates list))
> #'>
> :key #'cdr))
I really like that one. Since I first read about closures in lua, I
asked myself, wether there really are situations where they help.
And here is one now!
Thanks.
Yours,
Eric
For a small list it wouldn't matter, but for a big list, your first
sort would increase your processing time by log(n).
Moreover, you need to copy the list, so it doubles the space needed.
Nothing that cannot be solved by putting more money on it, but if you
have a $1M super computer to process, perhaps you won't want to spend
$1M more for an extension.
Of course, if you're running it on a small list on a $50 computer, you
could indeed as well buy a $200 computer and have it run twice as
fast, you will hardly notice the difference.
--
__Pascal Bourguignon__ http://www.informatimago.com/
> Absolutely. In my opinion, one's first introduction to computers should
> be about problem representation and abstraction of problem elements,
> only then to be followed by programming languages.
In my opinion, learning should be enjoyable. Maybe that's just me,
but if I don't enjoy learning something, I won't do it well. So if my
_first_ introduction to computers was an abstract course without any
practical programming, I might have lost interest.
Fortunately for me, I am self-taught in programming, so it was easy to
negotiate the contents of the course with the instructor :-)
OTOH, I don't deny the usefulness of the things you mention, I just
don't think that they are that useful when one is first introduced to
computers.
Best,
Tamas
R> Of course it is. Why would you doubt it?
Your choice of abstractions are arbitrary.
You've abstracted out hash-table, but you didn't abstract out lists.
So your code is not consistent and it is a bad example. (Unless you're
willing to teach inconsistency.)
I wish I could disagree with you, but your point is sound. I'm just
overly grouchy from having to deal with code written by people who never
get past the "drawing outside the lines with crayons" stage.
Programmers are like American Idol contestents. The majority who think
they can sing simply can't.
> ??>> Piling complication on top of this simple abstraction could make sense
> ??>> in the context of advanced application design, but is it good
> ??>> paedagogy for beginners?
>
> R> Of course it is. Why would you doubt it?
>
> Your choice of abstractions are arbitrary.
No, it isn't. Abstract associative maps have a unique combination of
simplicity and power, not unlike cons cells. In fact, you can think of
an abstract associative map as a generalization of a cons cell with an
arbitrary number of cells that you can give arbitrary names.
> You've abstracted out hash-table, but you didn't abstract out lists.
So?
> So your code is not consistent and it is a bad example. (Unless you're
> willing to teach inconsistency.)
Huh? I have no idea what you mean by that.
rg
If you're willing to seriously posit a system smart enough to do that
then there is no need for anything BUT "hash tables" (in scare quotes
now because they may or may not be implemented as hash tables). You
don't need, say, cons cells any more because the system would be smart
enough to figure out that a "hash table" with two keys "is" a cons cell
(particularly if those two keys happen to be the symbols CAR and CDR --
or is that FIRST and REST?) You don't need vectors because the system
could be smart enough to figure out that if the keys are a dense subset
of the integers (or, even better, that there exists a mapping from the
keys onto a dense subset of the integers) that the resulting "hash
table" "is" a vector. And so on.
It's not impossible, of course, but no existing CL implementation does
this, and I think it's a pretty safe bet that none ever will.
On the other hand, if you provide abstract associative maps as a
user-level abstraction then you have the flexibility to change the API
so that the *user* can specify (and more to the point, modify) the
underlying implementation without having to change client code. This
seems to me to be the more practical approach.
rg
Not necessarily. I think what FSet does fits Tim's description,
though it may not be exactly what he had in mind. FSet uses binary
trees and requires an ordering of elements, but it has a notion of
"ordering collision"; when two or more items in the same set collide
in the ordering, FSet falls back to putting those items in a list
rather than a tree. As long as collisions are relatively rare, the
performance impact is small.
Thus it is perfectly viable to use comparison of integer hash values
to generate the ordering. Performance is then dependent on the
quality of the hash function, just as it is for a hash table.
> On the other hand, if you provide abstract associative maps as a
> user-level abstraction then you have the flexibility to change the API
> so that the *user* can specify (and more to the point, modify) the
> underlying implementation without having to change client code. This
> seems to me to be the more practical approach.
This part I agree with, of course. (Except that I think the word
"associative" is redundant; just call the thing a map.)
-- Scott
> Not necessarily. I think what FSet does fits Tim's description,
> though it may not be exactly what he had in mind. FSet uses binary
> trees and requires an ordering of elements, but it has a notion of
> "ordering collision"; when two or more items in the same set collide
> in the ordering, FSet falls back to putting those items in a list
> rather than a tree. As long as collisions are relatively rare, the
> performance impact is small.
That's the kind of thing I was thinking of. It also seems to me that
systems can probably exploit ordering constraints that they know but
you can't: for instance an implementation might arrange life so that
the ordering of addresses of objects does not change, or changes only
very rarely, thus allowing it to impose a total ordering on things.
I'd certainly expect systems to implement "hashtables" with
increasingly sophisticated algorithms as things like locality become
more important.
> OTOH, I don't deny the usefulness of the things you mention, I just
> don't think that they are that useful when one is first introduced to
> computers.
I think this is a bit like teaching mathematics. If you started off
teaching people about fields[*] you're going to have a catastrophe
(tragically, there is quite good data about this sort of thing,
following the "new mathematics" teaching movement in the 60s and 70s: I
don't think they started with fields, but they did do a lot of set
theory and group theory).
Of course, these concepts are terribly important - you can't get very
far in theoretical physics without a good grounding in group theory and
differential geometry - but that doesn't mean you should turn things
inside out, because you also can't get very far in theoretical physics
without the kind of aggressive numeracy which you only get by just
spending a lot of time playing with sums. And most people who aren't
theoretical physicists can get by just fine without group theory and
differential geometry, but would be helped enormously by being a lot
more numerate.
There's an obvious analogy to teaching programming here (though the
situation is worse there than it is with maths for various reasons).
--tim
[*] Don't pick me up on this: my memory is that a field is the right
model for reals under the normal operations, but I may be wrong - pick
the right model if so.
Heh, heh, you need a *lot* more these days than just those two,
as this excellent multi-part[1] cartoon points out: ;-}
http://abstrusegoose.com/272
Prerequisites
-Rob
[1] This is a 7-part cartoon -- you have to click the image
(*not* the "Next" button!) to advance to each part.
> On 2010-06-19 15:42:50 +0100, Tamas K Papp said:
>
> > OTOH, I don't deny the usefulness of the things you mention, I just
> > don't think that they are that useful when one is first introduced to
> > computers.
>
> I think this is a bit like teaching mathematics. If you started off
> teaching people about fields[*] you're going to have a catastrophe
> (tragically, there is quite good data about this sort of thing,
> following the "new mathematics" teaching movement in the 60s and 70s: I
> don't think they started with fields, but they did do a lot of set
> theory and group theory).
There's a lot of data from programming as well. Languages that offer
abstract maps as core constructs (e.g. Python, Javascript, PHP) seem to
get more traction with beginners more than those that don't.
rg
You would? Are there any CL implementations that actually do this? Why
does the API expose hash-table parameters but not parameters for any
other kind of underlying implementation? For that matter, why are they
called hash tables if the intent is not for the underlying
implementation to be a hash table? If I want an actual hash table that
is guaranteed to respect the size, rehash-size and rehash-threshold
parameters, do I have to write my own?
rg
You certainly do: "The values of rehash-size and rehash-threshold do
not constrain the implementation to use any particular method for
computing when and by how much the size of hash-table should be
enlarged. Such decisions are implementation-dependent, and these values
only hints from the programmer to the implementation, and the
implementation is permitted to ignore them."
Worth reading the spec before posting, I find.
> There's a lot of data from programming as well. Languages that offer
> abstract maps as core constructs (e.g. Python, Javascript, PHP) seem to
> get more traction with beginners more than those that don't.
Of course! Now I understand! JavaScript's popularity is due to
*abstract maps*, and nothing at all to do with it being built into
every web browser for the last 12 years. How foolish I have been!
And since I now have a copy of it (sadly, Erik's copy) I'll go and read
that: arguing against Erann is like fighting porridge.
Worth reading the question before getting smug. Where did I specify
that I wanted this behavior to be portable?
The correct answer to my question is no, I do not have to write my own,
I just have to find an implementation that doesn't avail itself of the
spec's permission to ignore these parameters.
rg
I do not want to start an argument with a man who is no longer here to
hold up his end of it, but since you chose to bring it up I will say
that ad hominems are the last refuge of the scoundrel trying to defend
an untenable position.
I'll also say, since you chose to bring it up, that arguing with people
here on CLL is often like arguing against creationists. Both will
highlight certain facts while ignoring other facts, and freely
context-switch between different sets of tacit assumptions, anything to
avoid admitting even the slightest defect in their holy text (Genesis in
the case of creationists, the ANSI spec in the case of Lispers).
And then, of course, when backed into a corner both will resort to
insults.
rg
In another branch of this post I remarked that arguing with certain
people here on CLL is like arguing with creationists. Here is a perfect
example. I observe that languages with abstract maps are more popular
with beginners than languages without them, and advance the hypothesis
that there might be some kind of causal connection. The response is to
arbitrarily zero in on one of three concrete data points, picking some
OTHER factor that might also have contributed to this particular
languages popularity (which, it is worth noting, doesn't apply to the
other two data points, the ones that are being quietly ignored), and
then implying that not only does this refute the original hypothesis,
but that it does so so fully and completely that one doesn't even have
to advance the argument, that mere ridicule of the other side suffices
to make the point. There are more logical fallacies here than I have
the patience to enumerate.
rg
> There are more logical fallacies here than I have the patience to
> enumerate.
Maybe a correct impelementation of the right abstraction would help you
get it done within your patience threshold?
By the way, nothing prevents us to provide browsers including ECL (and
web servers providing CL scripts), apart perhaps the lack of time to
do it...
There are at least four variants possible:
1. concrete hash table and concrete list;
2. abstract associative container and conrete list;
3. concrete hash table and abstract sequence;
4. abstract hash table and abstract sequence.
Particularly, let's consider fourth option.
It could be something like:
(require :dictionary)
(require :abtract-sequence)
(defclass counter (dictionary) ())
(defclass sequence (abstract-sequence) ())
(defmethod add ((c counter) item)
(setf (ref c item) (1+ (ref c item 0))))
(defmethod count-items (sequence)
(let ((c (make-instance 'counter)))
(iterate-abstract-sequence sequence
(lambda (item) (add c item)))))
I don't see a good reason why you prefered 2 to 3 or 4.
So this was an arbitrary choice.
And now you're preaching about that.
This probably means that you do not understand that each abstraction is a
tradeoff.
And if you do not understand this, you're not the one to speak about
"paedogogy".
Once again, "Patterns of Software" by R.P.G. is a recommended reading for
you:
http://www.dreamsongs.com/Files/PatternsOfSoftware.pdf
That will help to get rid of naivety.
??>> So your code is not consistent and it is a bad example. (Unless you're
??>> willing to teach inconsistency.)
R> Huh? I have no idea what you mean by that.
That's because you're retarded.
Common Lisp's hash tables are as abstract as those.
R> seem to get more traction with beginners more than those that don't.
Traction with beginners is not what programming languages should be
optimized for.
> BF> You're arguing about LOC, Ron is arguing about correct abstraction.
>
> There is no such thing as a "correct abstraction."
> If code works, it is correct.
until you try to change it...
The beginning of wisdom for a [software engineer] is to recognize the
difference between getting a program to work, and getting it right.
-- M A Jackson, 1975
> Otherwise, we can put a shitload of abstractions on top of it -- but will it
> make it better?
...or until unanticipated inputs arrive.
>
> The beginning of wisdom for a [software engineer] is to recognize the
> difference between getting a program to work, and getting it right.
> -- M A Jackson, 1975
http://smuglispweeny.blogspot.com/2008/03/tlop-worst-thing-you-can-say-about.html
Here is a case where clean software architecture actually unearthed a
mistaken business process:
http://smuglispweeny.blogspot.com/2008/03/we-can-live-with-way-you-handled-that.html
Here is a case where simply improving the data names revealed that a
program didn't work:
http://smuglispweeny.blogspot.com/2008/07/aa-bb-cc-and-dd.html
kt
--
http://www.stuckonalgebra.com
"The best Algebra tutorial program I have seen... in a class by itself."
Macworld
> Tim Bradshaw <t...@tfeb.org> wrote:
>
>> On 2010-06-18 17:59:32 +0100, RG said:
>>
>> > dictionary.lisp is only 295 LOC including comments and blank lines. Even
>> > if you add in rg-utils (only a tiny part of which is used by
>> > dictionary.lisp) that's only an addition 970 LOC. That is not "huge" by
>> > any stretch of the imagination.
>>
>> It's about two orders of magnutude larger than simple versions of
>> occurrence-counting funcions using alists or hashtables (both of which
>> come out at around 7 lines without comments, so maybe 9 with):
>
> You're arguing about LOC, Ron is arguing about correct abstraction. I
> have to side with Ron on this one: getting the correct abstraction is
> paramount above everything else. Small LOC is good if the abstraction
> doesn't matter; it's easy to change programs with small LOC.
"The correct abstraction", in the case that you want to be able to
easily change the underlying implementation, is two or three functions
which govern access to the underlying hash table. These you can write on
a per application basis. No large library is needed. This approach has
two advantages: your accessors can have names meaningful to your
application, and they can be designed to better suit the access pattern
of it.
A generic, complex library is very likely not "the correct abstraction"
for any particular application.
I wonder if this is not an issue of reusing code for the purpose of
reusing design. I would argue that code is interesting to reuse if it
performs something useful, and that design is better done in the
application itself.
Björn Lindberg
> Eric Wolf <er...@boese-wolf.eu> writes:
>
> > Hello there!
> >
> > I'm starting to learn lisp and doing the exercises of Ansi Common
> > Lisp authored by Paul Graham.
> >
> > So my solution to counting occurences in a list and outputting a
> > list of dotted pairs (symbol . count) sorted by count is
> >
> > (defun occurences-helper (lst result)
> > (if (null lst)
> > result
> > (progn
> > (let ((symb (car lst)))
> > (let ((test (assoc symb result)))
> > (if test
> > (setf (cdr test) (+ (cdr test) 1))
> > (setf result (cons (cons symb 1) result)))))
> > (occurences-helper (cdr lst) result))))
> >
> > (defun occurences (lst)
> > (sort
> > (occurences-helper lst ())
> > #'(lambda (x y)
> > (>
> > (cdr x)
> > (cdr y))))
> > )
> >
> >
> > And I thought: "Hmm that doesn't look elegant" but I'm not used to
> > recursion and functional programming and that stuff so I didn't
> > came up with a better version.
> >
> > Maybe some experienced lisp hacker may provide a different one?
>
> SORT takes both a comparison predicate, and a key function. This let
> you avoid building an anonymous function.
>
> You can avoid the helper function by using an optional parameter.
>
> But I would keep the helper function since the signature of OCCURENCES
> doesn't involve an optional parameter per se, and it's often more
> efficient to have functions with fixed number of parameters. Just put
> it as a local function using LABELS.
>
> The best advice, is to avoid SETF. Try to think functionnaly. Of
> course, Common Lisp is multi-paradigm, and we don't refuse an
> occasional state modification, as long as it's of limited scope, and
> the alternative would be more costly and less readable. So I still
> use INCF to increment the counter, but this is done on a data
> structure that is being built by the count-occurences function, and
> the alternative would be to cons the backbone of the a-list again and
> again or using even more complex functional data structures.
>
>
>
> (defun occurences (list &optional counts)
> (if (null list)
> (sort counts (function >) :key (function cdr))
> (let ((pair (assoc (first list) counts)))
> (if (null pair)
> (occurences (rest list) (acons (first list) 1 counts))
> (progn
> (incf (cdr pair))
> (occurences (rest list) counts))))))
>
>
>
> (defun occurences (list)
> (labels ((count-occurences (list counts)
> (if (null list)
> (sort counts (function >) :key (function cdr))
> (let ((pair (assoc (first list) counts)))
> (if (null pair)
> (count-occurences (rest list) (acons (first
> list) 1 counts)) (progn
> (incf (cdr pair))
> (count-occurences (rest list) counts)))))))
> (count-occurences list '())))
>
>
> (occurences '(a b c a b c a b a))
> --> ((A . 4) (B . 3) (C . 2))
>
>
>
> Of course, in Common Lisp, we would just use an interation construct:
>
> (defun occurences (list)
> (loop
> :with counts = '()
> :for item :in list
> :for pair = (assoc item counts)
> :if pair
> :then (incf (cdr pair))
> :else (push (cons item 1) counts)
> :finally (return (sort counts (function >) :key (function cdr)))))
>
Clojure.
Using group-by:
(defn count-items [list]
(sort-by #(second %)
(map (fn[[k v]] [k (count v)])
(group-by identity list))))
Using reduce:
(defn count-items2 [list]
(sort-by #(second %)
(reduce #(merge-with + %1 {%2 1}) {} list)))
> A good compiler would generate the same code for both sources, so it's
> up to you to see which is clearer.
Guile Scheme:
(use-modules (srfi srfi-69)) ; hash-tables
(use-modules (srfi srfi-26)) ; currying (cut)
(define (with f g)(lambda (a b) (f (g a) (g b))))
(define (occur lst)
(let ((h (make-hash-table)))
(for-each
(cut hash-table-update!/default h <> 1+ 0)
lst)
(sort (hash-table->alist h) (with > cdr))))
Common Lisp:
(defun distribution (seq &key (test 'eql) (sort-pred #'<))
(let ((table (make-hash-table :test test)) result)
(map nil (lambda (x) (incf (gethash x table 0))) seq)
(maphash (lambda (k v)
(setf result (merge 'list (list (cons k v)) result
sort-pred :key #'cdr)))
table)
result))
> (distribution '(a b c f g a a c c a b) :sort-pred '>)
((A . 4) (C . 3) (B . 2) (F . 1) (G . 1))
Common Lisp:
(defun distribution (seq &key (test 'eql) (sort-pred #'<))
(let ((table (make-hash-table :test test)) result)
(map nil (lambda (x) (incf (gethash x table 0))) seq)
(maphash (lambda (k v)
(push (cons k v) result))
table)
(sort result sort-pred :key #'cdr)))
CL-USER 88 > (let ((list (loop for i from 0 below 1000000 collect
(random 100)))) (time (distribution list)))
Timing the evaluation of (DISTRIBUTION LIST)
User time = 0.330
System time = 0.001
Elapsed time = 0.328
Allocation = 16800 bytes
0 Page faults
((70 . 9791) (79 . 9812) (99 . 9823) (87 . 9833) (65 . 9836) (2 .
9858) (75 . 9862) (35 . 9864) (71 . 9866) (9 . 9882) (94 . 9884) (10 .
9900) (97 . 9901) (36 . 9901) (12 . 9902) (72 . 9910) (6 . 9912) (54 .
9916) (48 . 9916) (60 . 9917) (23 . 9919) (46 . 9926) (47 . 9932)
(83 . 9934) (25 . 9934) (73 . 9937) (38 . 9940) (18 . 9942) (19 .
9943) (90 . 9944) (52 . 9945) (29 . 9947) (13 . 9947) (0 . 9947) (44 .
9951) (96 . 9953) (37 . 9961) (39 . 9966) (16 . 9966) (84 . 9970)
(86 . 9971) (57 . 9972) (42 . 9973) (20 . 9973) (45 . 9974) (88 .
9978) (68 . 9979) (56 . 9981) (5 . 9990) (93 . 9991) (51 . 9992) (69 .
9995) (17 . 10002) (11 . 10002) (64 . 10003) (53 . 10005) (85 . 10006)
(40 . 10006) (4 . 10010) (8 . 10011) (80 . 10012) (76 . 10012) (30 .
10012) (66 . 10014) (59 . 10019) (22 . 10020) (74 . 10025) (3 . 10026)
(58 . 10029) (7 . 10033) (32 . 10034) (62 . 10037) (21 . 10046) (24 .
10049) (77 . 10050) (27 . 10069) (1 . 10071) (14 . 10077) (43 . 10080)
(98 . 10082) (49 . 10092) (67 . 10105) (26 . 10105) (82 . 10108) (33 .
10117) (50 . 10121) (61 . 10123) (81 . 10124) (92 . 10129) (91 .
10134) (55 . 10135) (28 . 10139) (41 . 10147) (89 . 10148) (63 .
10158) (15 . 10180) (95 . 10195) (78 . 10207) (31 . 10241) (34 .
10291))
CL-USER 89 > (let ((list (loop for i from 0 below 1000000 collect
(random 100000)))) (time (distribution list)))
Timing the evaluation of (DISTRIBUTION LIST)
User time = 0.632
System time = 0.005
Elapsed time = 0.628
Allocation = 9218176 bytes
435 Page faults
((98313 . 1) (96595 . 1) (93860 . 1) ...)
Exactly! Running sort once is faster than running merge n times.
Arc:
> (tablist (counts '(a b c a b c a b a)))
((a 4) (b 3) (c 2))