
conciseness is power


Greg Bacon

Dec 23, 2005, 1:38:18 PM

As an exercise, I wrote code to find anagrams in a one-word-per-line
dictionary. See http://blogs.pragprog.com/cgi-bin/pragdave.cgi/
Practices/Kata/KataSix.rdoc. Within the equivalence classes, I sorted
the words in ascending order and sorted classes in descending order of
size. See the code below:

(defun anagram-list (hash)
  (let ((anagrams '()))
    (maphash (lambda (normal words)
               (when (>= (length words) 2)
                 (push words anagrams)))
             hash)
    anagrams))

(defun find-anagrams (path)
  (let ((anagrams (make-hash-table :test #'equal)))
    (with-open-file (s path)
      (loop for word = (read-line s nil)
            while word do
            (push word (gethash (normal-form word) anagrams))))
    (anagram-list anagrams)))

(defun chars-of (word)
  (loop for i from 0 below (length word) collect (char word i)))

(defun normal-form (word)
  (let* ((chars (chars-of (string-downcase word)))
         (normal-order (sort chars #'char-lessp)))
    (coerce normal-order 'string)))

(defun longest-first (a b)
  (let ((lena (length a))
        (lenb (length b)))
    (cond ((> lena lenb) t)
          ((< lena lenb) nil)
          (t (loop for aa in a
                   for bb in b do
                   (when (string-lessp bb aa) (return nil))
                   t)))))

(let ((anagrams (find-anagrams #P"words721.txt")))
  (loop for group in anagrams do
        (setf group (sort group #'string-lessp)))
  (loop for group in (sort anagrams #'longest-first) do
        (format t "~{~a~^ ~}~%" group)))

For comparison, I wrote Perl code to do the same task:

#! /usr/bin/perl

sub longest_first {
    my $bylen = scalar(@$b) <=> scalar(@$a);

    if ($bylen) {
        return $bylen;
    }
    else {
        my $byalpha;
        for (0 .. $#$a) {
            $byalpha = $a->[$_] cmp $b->[$_];
            return $byalpha if $byalpha;
        }
        0;
    }
}

@ARGV = "words721.txt" unless @ARGV;

my %group;
while (<>) {
    s/\s+\z//;

    push @{ $group{ join "", sort split //, $_ } } => $_;
}

my @anagrams = map [ sort @$_ ], grep @$_ >= 2, values %group;

for (sort longest_first @anagrams) {
    print join(" " => @$_), "\n";
}

The sort comparator is syntactically awkward and bulky, but in the other
areas, the Perl code seems more concise.

Please understand that I don't mean this as yetanotherflame. I'm sure
my Lisp code is far too verbose, and I'm looking to learn from
suggestions on how I might improve it.

Thanks,
Greg
--
Liberty means responsibility. That is why most men dread it.
-- George Bernard Shaw

Marco Antoniotti

Dec 23, 2005, 1:58:09 PM

Very quickly (and possibly wrong)


Greg Bacon wrote:
> As an exercise, I wrote code to find anagrams in a one-word-per-line
> dictionary. See http://blogs.pragprog.com/cgi-bin/pragdave.cgi/
> Practices/Kata/KataSix.rdoc. Within the equivalence classes, I sorted
> the words in ascending order and sorted classes in descending order of
> size. See the code below:
>
> (defun anagram-list (hash)
>   (let ((anagrams '()))
>     (maphash (lambda (normal words)
>                (when (>= (length words) 2)
>                  (push words anagrams)))
>              hash)
>     anagrams))

(defun anagram-list (hash)
  (loop for words being the hash-values of hash
        when (>= (length words) 2) collect words))


>
> (defun find-anagrams (path)
>   (let ((anagrams (make-hash-table :test #'equal)))
>     (with-open-file (s path)
>       (loop for word = (read-line s nil)
>             while word do
>             (push word (gethash (normal-form word) anagrams))))
>     (anagram-list anagrams)))
>
> (defun chars-of (word)
>   (loop for i from 0 below (length word) collect (char word i)))
>
> (defun normal-form (word)
>   (let* ((chars (chars-of (string-downcase word)))
>          (normal-order (sort chars #'char-lessp)))
>     (coerce normal-order 'string)))

(defun normal-form (word)
  (sort (copy-seq word) #'char-lessp))


>
> (defun longest-first (a b)
>   (let ((lena (length a))
>         (lenb (length b)))
>     (cond ((> lena lenb) t)
>           ((< lena lenb) nil)
>           (t (loop for aa in a
>                    for bb in b do
>                    (when (string-lessp bb aa) (return nil))
>                    t)))))

(defun longest-first (a b)
  (let ((lena (length a))
        (lenb (length b)))
    (or (> lena lenb)
        (and (= lena lenb)
             (every (complement #'string-lessp) a b)))))


Cheers
--
Marco

Kenny Tilton

Dec 23, 2005, 2:02:01 PM


(defun normal-form (word)
  (sort word #'char-lessp))

Just for one. I did not look that hard at other code.

I am sure it is. Conciseness is a Perl Prime Directive. Lisp is
naturally concise, but avoids syntax by making things like DEFUN
explicit, so in small examples it can look bloated.

>
> Please understand that I don't mean this as yetanotherflame.

Well then watch out for sound bites like "conciseness is power". To a
point yes. Beyond that point lies obscurity. The K language (an APL-like
tool) really does make conciseness the Prime Directive. You should see
their one-page spreadsheet application. It is a howler, but they are
serious.

I mean, we just have to write this stuff, not read it. And even with
COBOL there just is not that much to type.

kenny

Brian Downing

Dec 23, 2005, 2:25:58 PM

In article <JQXqf.15067$Ed....@news-wrt-01.rdc-nyc.rr.com>,
Kenny Tilton <NOktil...@nyc.rr.com> wrote:

> Greg Bacon wrote:
> > (defun normal-form (word)
> >   (let* ((chars (chars-of (string-downcase word)))
> >          (normal-order (sort chars #'char-lessp)))
> >     (coerce normal-order 'string)))
>
> (defun normal-form (word)
>   (sort word #'char-lessp))
>
> Just for one. I did not look that hard at other code.

You missed a string-downcase, and you'll be destroying all the original
words, which is bad since he still needs them:

(defun normal-form (word)
  (sort (nstring-downcase (copy-seq word)) #'char-lessp))

-bcd
--
*** Brian Downing <bdowning at lavos dot net>

Kaz Kylheku

Dec 23, 2005, 2:34:15 PM


Conciseness is power, when it is achieved by actually removing
irrelevant details so that you can express the computational ideas more
directly. The Perl code does this for some things here. For instance,
while (<>) hides a lot of logic: looping over file names, opening the
files, and reading lines.

The Common Lisp language doesn't have this operator.

But you see, it doesn't have to. If you want that operator, you can
make it yourself.

If I made it myself, I probably wouldn't design it that way because it
relies on a global variable, namely the @ARGV. I would pass in the list
of files as a sequence. Or I could have it use some default list named
by a special variable. The programmer could override this special
variable by binding it to something else around the invocation of the
file-munging construct.

I could also design the file munging construct so that it can nest with
itself, by not using a fixed variable name for the line that is read:

(defmacro read-lines-from-files (line-string-var file-list-form) ...)

(read-lines-from-files (word '("words721.txt"))
  (push word (gethash (normal-form word) anagrams)))

This is as concise as I would like it. It is /semantically/ concise and
clean. I deliberately give the macro a descriptive name, so people can
guess what it is without having to decipher its internals or read any
documentation.
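
A minimal sketch of one way it might be written (the body here is an
assumption, since it was left elided above; the lambda list is grouped
to match the usage shown):

(defmacro read-lines-from-files ((line-string-var file-list-form) &body body)
  ;; Open each file in turn and evaluate BODY with LINE-STRING-VAR
  ;; bound to each line read.
  (let ((file (gensym "FILE"))
        (stream (gensym "STREAM")))
    `(dolist (,file ,file-list-form)
       (with-open-file (,stream ,file)
         (loop for ,line-string-var = (read-line ,stream nil)
               while ,line-string-var
               do (progn ,@body))))))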

Any further terseness, by designing cryptic nonsense like while (<>),
or "cheating" by using global variables and fixed variable bindings, is
bad design.

Also note how some of your Lisp is more verbose simply because you
broke up the computation and gave meaningful names to intermediate
values.

In the Perl, you have a nested sort and join thing over the characters
of the string.

In the Lisp code, I learn that to form the keys, words are transformed
into something you call a "normal form", which is formed by folding to
lower case and sorting the characters to a "normal order".

These little tidbits serve as documentation which give me a vocabulary
for talking about the program and help me understand what it does
quicker.

Even if I didn't know what STRING-DOWNCASE does, if it wasn't a
standard Lisp function but defined somewhere in your program, I could
probably guess what it does.

I only know that your Perl program performs string downcasing because I
have taken your word for it that it is equivalent to the Lisp
one!!!

I'm guessing that it's done by:

s/\s+\z//;

I would never guess what this does.

Lastly, compare just the "longest first" functions between the two
programs. Perl doesn't have any substantial tricks which helped there.
There is that cute <=> thing which computes -1, 0 or 1.

That <=> operator inspires this idea: define a macro operator in Lisp
which evaluates two expressions, and then, based on the result of
comparing their values, evaluates one of three clauses:

(defmacro less-greater-equal-case (left-form right-form
                                   less-form greater-form equal-form)
  ...)


Then you could do this:

(defun longest-first (a b)
  (less-greater-equal-case (length a) (length b)
    nil t
    (loop for aa in a
          for bb in b
          when (string-lessp bb aa)
            return nil
          finally (return t))))

Note, by the way, how I changed your LOOP slightly to take advantage of
the WHEN and FINALLY clauses.

Of course, the length of the LESS-GREATER-EQUAL-CASE operator counts
against the length of the program. Or should it? Is that fair? You can
put that into a library and reuse it.
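
For reference, one possible definition of that macro (a sketch: only
the signature above comes from the post, the expansion is an
assumption):

(defmacro less-greater-equal-case (left-form right-form
                                   less-form greater-form equal-form)
  ;; Evaluate the two key forms once, compare, then evaluate
  ;; exactly one of the three result forms.
  (let ((left (gensym)) (right (gensym)))
    `(let ((,left ,left-form)
           (,right ,right-form))
       (cond ((< ,left ,right) ,less-form)
             ((> ,left ,right) ,greater-form)
             (t ,equal-form)))))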

Brian Downing

Dec 23, 2005, 2:42:39 PM

In article <11qoh0q...@corp.supernews.com>,

Greg Bacon <gba...@hiwaay.net> wrote:
> As an exercise, I wrote code to find anagrams in a one-word-per-line
> dictionary. See http://blogs.pragprog.com/cgi-bin/pragdave.cgi/
> Practices/Kata/KataSix.rdoc. Within the equivalence classes, I sorted
> the words in ascending order and sorted classes in descending order of
> size. See the code below:

I think in general you're going to have a hard time beating Perl for
conciseness in a one-off text-mangling exercise - the language was
designed for precisely that! Nevertheless, people have shown you how to
trim your CL code down a bit.

Actually, your code didn't even work for me. What implementation did you
try it on?

I'll point out some problems:

> (loop for aa in a
>       for bb in b do
>       (when (string-lessp bb aa) (return nil))
>       t)

LOOP doesn't return the last form in a DO section. My compiler signaled
an error on this.

> (loop for group in anagrams do
>       (setf group (sort group #'string-lessp)))

This does not do what you expect. Setting group is just setting a local
variable, not propagating it back into the list. Furthermore, your call
to SORT is probably destroying the source list. If you want this exact
metaphor, do:

(loop for group on anagrams do
      (setf (car group) (sort (car group) #'string-lessp)))

but it's probably better to use a map like you did in your Perl code.
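
For example, a sketch of the map-based version (SORT may still reorder
the original conses, but each result is captured in a fresh outer
list):

(setf anagrams
      (mapcar (lambda (group) (sort group #'string-lessp))
              anagrams))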

Kenny Tilton

Dec 23, 2005, 3:08:38 PM

Brian Downing wrote:
> In article <JQXqf.15067$Ed....@news-wrt-01.rdc-nyc.rr.com>,
> Kenny Tilton <NOktil...@nyc.rr.com> wrote:
>
>>Greg Bacon wrote:
>>
>>> (defun normal-form (word)
>>>   (let* ((chars (chars-of (string-downcase word)))
>>>          (normal-order (sort chars #'char-lessp)))
>>>     (coerce normal-order 'string)))
>>
>>(defun normal-form (word)
>>  (sort word #'char-lessp))
>>
>>Just for one. I did not look that hard at other code.
>
>
> You missed a string-downcase, and you'll be destroying all the original
> words, which is bad since he still needs them:

oops.

>
> (defun normal-form (word)
>   (sort (nstring-downcase (copy-seq word)) #'char-lessp))

Ah, but both my mistakes can be cured with one stone...

(defun normal-form (word)
  (sort (string-downcase word) #'char-lessp))

kt

Pascal Bourguignon

Dec 23, 2005, 3:42:44 PM

gba...@hiwaay.net (Greg Bacon) writes:

> As an exercise, I wrote code to find anagrams in a one-word-per-line
> dictionary. See http://blogs.pragprog.com/cgi-bin/pragdave.cgi/
> Practices/Kata/KataSix.rdoc. Within the equivalence classes, I sorted
> the words in ascending order and sorted classes in descending order of
> size. See the code below:

> [...]


> For comparison, I wrote Perl code to do the same task:
>
> #! /usr/bin/perl
>
> sub longest_first {
>     my $bylen = scalar(@$b) <=> scalar(@$a);
>
>     if ($bylen) {
>         return $bylen;
>     }
>     else {
>         my $byalpha;
>         for (0 .. $#$a) {
>             $byalpha = $a->[$_] cmp $b->[$_];
>             return $byalpha if $byalpha;
>         }
>         0;
>     }
> }
>
> @ARGV = "words721.txt" unless @ARGV;
>
> my %group;
> while (<>) {
>     s/\s+\z//;
>
>     push @{ $group{ join "", sort split //, $_ } } => $_;
> }
>
> my @anagrams = map [ sort @$_ ], grep @$_ >= 2, values %group;
>
> for (sort longest_first @anagrams) {
>     print join(" " => @$_), "\n";
> }


(defpackage :words (:use))
(format
 t "~{~{~A~^ ~}~%~}"
 (let ((table (make-hash-table)))
   (mapcar
    (lambda (k) (sort (gethash k table) (function string-lessp)))
    (sort (delete-duplicates
           (with-open-file (words "/usr/share/dict/words")
             (loop :for w = (read-line words nil nil)
                   :for k = (intern (sort (string-upcase w)
                                          (function char<=)) :words)
                   :while w :do (push w (gethash k table nil)) :collect k)))
          (lambda (a b) (>= (length (string a)) (length (string b))))))))


perl Region has 32 lines, 534 characters
lisp Region has 13 lines, 613 characters

But:

-rw------- 1 pjb pjb 318 2005-12-23 21:36 a.lisp.gz
-rw------- 1 pjb pjb 345 2005-12-23 21:09 a.perl.gz
^^^

However, the question you must ask yourself is whether you just wanted
to produce this output, or whether you need to make the data structures
explicit so that you can get the anagrams of a given word, or add new
words later, etc., in which case some sub-sexp of my "one-liner" should
be extracted and defun'ed as you did.


--
__Pascal Bourguignon__ http://www.informatimago.com/
In deep sleep hear sound,
Cat vomit hairball somewhere.
Will find in morning.

Brian Downing

Dec 23, 2005, 4:02:37 PM

In article <aPYqf.23771$i1....@news-wrt-01.rdc-nyc.rr.com>,

Not quite, though. (I thought about that solution too):

For string-upcase, string-downcase, and string-capitalize,
string is not modified. However, if no characters in string
require conversion, the result may be either string or a copy of
it, at the implementation's discretion.

http://www.lispworks.com/documentation/HyperSpec/Body/f_stg_up.htm

:-)
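
So the portable version keeps the explicit copy (this just restates
Brian's earlier fix, no new trick):

(defun normal-form (word)
  ;; STRING-DOWNCASE is allowed to return WORD itself when no
  ;; character needs converting, so copy before the destructive SORT.
  (sort (copy-seq (string-downcase word)) #'char-lessp))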

-bcd

David Steuber

Dec 23, 2005, 4:15:32 PM

?

--
http://www.david-steuber.com/
The UnBlog: An island of conformity in a sea of quirks.
http://www.david-steuber.com/snippets/Boycott_Sony/

Ulrich Hobelmann

Dec 23, 2005, 5:09:20 PM

Kaz Kylheku wrote:
> I only know that your Perl program performs string downcasing because I
> have taken your word for it that it is equivalent to the Lisp
> one!!!
>
> I'm guessing that it's done by:
>
> s/\s+\z//;
>
> I would never guess what this does.

It looks like a regular expression substitution command of a typical
Unix editor (like vi or sed). I'd guess it to mean not string
downcasing, but to substitute a positive number of occurrences (+) of
'\s' and '\z' with the empty string. I don't know Perl, but I've never
encountered any regexps with \s or \z so I have no idea what they mean.
Certainly doesn't look like standard stuff.

Of the rest of the Perl I don't understand a thing.

So it's cryptic even for someone who used to like Unix tools, like me.
Compared to Perl, even bash syntax looks like heaven.

--
I wake up each morning determined to change the world...
and also to have one hell of a good time.
Sometimes that makes planning the day a little difficult.
E.B. White

Frank Buss

Dec 23, 2005, 5:21:11 PM

Not very concise, but another solution:

(defun anagrams ()
  (with-open-file (file "c:/tmp/wordlist.txt" :direction :input)
    (let ((list (sort (loop for word = (read-line file nil) while word
                            collect (cons (sort (nstring-downcase (copy-seq word))
                                                #'char-lessp)
                                          word))
                      #'(lambda (cons1 cons2 &aux (word1 (car cons1)) (word2 (car cons2)))
                          (let ((length1 (length word1))
                                (length2 (length word2)))
                            (if (= length1 length2)
                                (string< word1 word2)
                                (> length1 length2)))))))
      (loop with words = '()
            for ((norm1 . word1) (norm2 . word2)) on list
            do (push word1 words)
            unless (equal norm1 norm2)
              do (when (> (length words) 1)
                   (format t "~{~a~^ ~}~%" (nreverse words)))
                 (setf words '())))))

--
Frank Buß, f...@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

Marcin 'Qrczak' Kowalczyk

Dec 23, 2005, 6:20:27 PM

Pascal Bourguignon <sp...@mouse-potato.com> writes:

> (defpackage :words (:use))
> (format
>  t "~{~{~A~^ ~}~%~}"
>  (let ((table (make-hash-table)))
>    (mapcar
>     (lambda (k) (sort (gethash k table) (function string-lessp)))
>     (sort (delete-duplicates
>            (with-open-file (words "/usr/share/dict/words")
>              (loop :for w = (read-line words nil nil)
>                    :for k = (intern (sort (string-upcase w)
>                                           (function char<=)) :words)
>                    :while w :do (push w (gethash k table nil)) :collect k)))
>           (lambda (a b) (>= (length (string a)) (length (string b))))))))

Here is a version in my language Kogut (translated from Perl):

let group = HashDict()->SetDefault Array;
Using {GetFirst Parameters {"/usr/share/dict/words"}->OpenTextFile} ?file {
   ReadLinesGenFrom file->Each ?word {
      DoAddLast group@(Sort word) word
   }
};
Values group
->Select ?words {Size words >= 2}
->Map SortList
->SortListBy ?words {Size words->ReverseOrder, words}
->Each ?words {WriteLine (SepBy words " ")};

It's 2.5 times faster than Perl.

> perl Region has 32 lines, 534 characters
> lisp Region has 13 lines, 613 characters

Kogut Region has 11 lines, 365 characters

Lisp version folds case though, while Perl and Kogut don't.
Case folding can be done by replacing 'Sort word' by
'Sort (IgnoreCase word)', so it doesn't change much;
the run time is unaffected.

> But:
>
> -rw------- 1 pjb pjb 318 2005-12-23 21:36 a.lisp.gz
> -rw------- 1 pjb pjb 345 2005-12-23 21:09 a.perl.gz
> ^^^

This is unfair. When reading and writing code it's the raw size which
matters, not some entropy amount. Anyway, gzipped Kogut has 269 bytes.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Pascal Bourguignon

Dec 23, 2005, 6:51:53 PM

Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> writes:
>> perl Region has 32 lines, 534 characters
>> lisp Region has 13 lines, 613 characters
>
> Kogut Region has 11 lines, 365 characters
>
> Lisp version folds case though, while Perl and Kogut don't.
> Case folding can be done by replacing 'Sort word' by
> 'Sort (IgnoreCase word)', so it doesn't change much;
> the run time is unaffected.
>
>> But:
>>
>> -rw------- 1 pjb pjb 318 2005-12-23 21:36 a.lisp.gz
>> -rw------- 1 pjb pjb 345 2005-12-23 21:09 a.perl.gz
>> ^^^
>
> This is unfair. When reading and writing code it's the raw size which
> matters, not some entropy amount. Anyway, gzipped Kogut has 269 bytes.

It is not unfair, because you don't read bits or bytes. You read
words, or even patterns, such as (sort ... (lambda (a b) ...)) where
you only read <sort> and the two '...'. So it's fair to compress out
'(sort' and '(lambda (a b)'.

Also, it abstracts away unimportant style details such as the use of
#'... instead of (function ...), or the choice of identifiers. This is
more obvious on bigger sources, but even for this one-sexper,
substituting #' for function (two occurrences) only reduces the
compressed version by 4 bytes, which shows that this is not a
significant difference in expressiveness.

--
__Pascal Bourguignon__ http://www.informatimago.com/

"You can tell the Lisp programmers. They have pockets full of punch
cards with close parentheses on them." --> http://tinyurl.com/8ubpf

Gareth McCaughan

Dec 23, 2005, 7:33:45 PM

Kaz Kylheku wrote:

> I only know that your Perl program performs string downcasing because I
> have taken your word for it that it is equivalent to the Lisp
> one!!!
>
> I'm guessing that it's done by:
>
> s/\s+\z//;
>
> I would never guess what this does.

Actually, it trims trailing whitespace from the
magic "current line" variable $_. His Perl code
doesn't actually downcase the strings, so far as
I can see.
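
In Lisp terms it is roughly (a sketch; LINE stands for the string just
read):

(string-right-trim '(#\Space #\Tab #\Newline #\Return) line)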

> That <=> operator inspires this idea: define a macro operator in Lisp
> which evaluates two expressions, and then, based on the result of
> comparing their values, evaluates one of three clauses:

One problem with that is that the order in which
the three clauses occur isn't obvious. I suppose
the name gives it away, but I'd be in constant
danger of reading it as if it were LESS-EQUAL-GREATER-CASE
instead of LESS-GREATER-EQUAL-CASE. So here's a
generalization. I don't like the name I've given it;
perhaps someone can suggest a better one.

(defmacro operator-case ((&rest args) &body clauses)
  (let ((arg-syms (loop for arg in args collect (gensym))))
    `(let ,(loop for arg in args for sym in arg-syms
                 collect `(,sym ,arg))
       (cond
         ,@(loop for (operator . consequents) in clauses
                 collect (if (eql operator 't)
                             `(t ,@consequents)
                             `((,operator ,@arg-syms) ,@consequents)))))))

(defun longest-first (a b)
  (operator-case ((length a) (length b))
    (< nil)
    (> t)
    (t (loop for aa in a
             for bb in b
             when (string-lessp bb aa)
               return nil
             finally (return t)))))

The improvement in conciseness over just using COND is
modest.

I'd also consider a LEXICOGRAPHIC-COMPARE macro;
and I also remark that the Lisp code seems to be
broken in a couple of ways:

(loop for group in anagrams do
      (setf group (sort group #'string-lessp)))

should be something more like

;; handy tool for MAPCAR...
(defun infinite-list (&rest items)
  (setf (rest (last items)) items))

(setf anagrams
      (mapcar #'sort anagrams (infinite-list #'string-lessp)))

because otherwise the results of SORT are being thrown away,
which is liable to be wrong; and I strongly suspect that
LONGEST-FIRST isn't actually doing what is intended. (But
maybe it is, for some subtle reason to do with the sortedness
of the groups.)

I'd propose (not maximally efficiently, but it'll do)

(defun compose (f g)
  (lambda (&rest args)
    (funcall f (apply g args))))

(defun longest-first (groups)
  (setf groups (remove 1 groups :key #'length))
  (setf groups (sort groups #'string< :key #'first))
  (setf groups (stable-sort groups #'> :key (compose #'length #'first)))
  (stable-sort groups #'> :key #'length))

where my LONGEST-FIRST does the sorting, not just the
comparison of one pair, and considers (1) group size,
(2) length of word, and (3) alphabetical order by first
word in group. It expects to be passed a list of *all*
groups, including the length-1 ones, in case you happen
to have a HASH-VALUES function already written.

--
Gareth McCaughan
.sig under construc

Marcin 'Qrczak' Kowalczyk

Dec 24, 2005, 8:58:27 AM

Pascal Bourguignon <sp...@mouse-potato.com> writes:

>> This is unfair. When reading and writing code it's the raw size which
>> matters, not some entropy amount. Anyway, gzipped Kogut has 269 bytes.
>
> It is not unfair, because you don't read bits or bytes. You read
> words, or even patterns, such as (sort ... (lambda (a b) ...)) where
> you only read <sort> and the two '...'. So it's fair to compress out
> '(sort' and '(lambda (a b)'.

There is a difference whether a given idea is abstracted out,
implemented once and the abstraction is called multiple times,
or whether it's copied & pasted in each place it is used.

Looking at compressed size penalizes building abstractions,
by measuring the overhead of creating an abstraction but not
measuring the amount of copy & paste.

I agree that the length of identifiers is less important than
code structure. I think that token count is quite close to the
subjective measure of code size for a human.

It's not always clear how to delimit tokens though. In the
object.field syntax of various languages, is .field one token
or two? Should we use the definition used by the language syntax,
i.e. the answer depends on whether it's allowed to insert
whitespace there? It would give different results to seemingly
same constructs in different languages. It's especially hard
to count tokens in Perl.

Pascal Bourguignon

Dec 24, 2005, 2:37:14 PM

Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> writes:

I'm not saying that the gzipped size is an absolute scale of code
complexity. I'm saying that it's a better measure of code complexity
than LoC.

--
__Pascal Bourguignon__ http://www.informatimago.com/

Litter box not here.
You must have moved it again.
I'll poop in the sink.

Rob Warnock

Dec 25, 2005, 12:00:09 AM

Pascal Bourguignon <sp...@mouse-potato.com> wrote:
+---------------

| Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> writes:
| > There is a difference whether a given idea is abstracted out,
| > implemented once and the abstraction is called multiple times,
| > or whether it's copied & pasted in each place it is used.
| >
| > Looking at compressed size penalizes building abstractions,
| > by measuring the overhead of creating an abstraction but not
| > measuring the amount of copy & paste.
| >
| > I agree that the length of identifiers is less important than
| > code structure. I think that token count is quite close to the
| > subjective measure of code size for a human.
...

| I'm not saying that the gziped size is an absolute scale of code
| complexity. I'm saying that it's a better measure of code complexity
| than LoC.
+---------------

I somewhat agree. While gzip'd size isn't perfect, it's certainly a
*lot* closer to the Kolmogorov/Chaitin notion of complexity than LOC.

On the other hand, I also somewhat agree with Kowalczyk, in that
informational complexity per se is a lot less important for *humans*
than code structure, and in particular, the length of each separately-
comprehensible "chunk" [in the sense of Miller's "Magical Number Seven"]
of code. Given the *extremely* small size of humans' "primary caches"
[so to speak], keeping the *number* of identifiers per chunk small
is definitely more important than the length of identifiers per se.
But overly-long identifiers whose names themselves don't encode
significant semantic information can be just as bad as too many
identifiers. Consider what happens to the readability of code
which is obfuscated by mapping all programmer-defined identifiers
into a dense set of random names such as "Dj5rBKbtKzIijNeg" or
"c2sRd5ACxmOTVrh4". Or worse, have you ever seen the output of
the classic C obfuscator which uses *only* identifiers starting
with #\I, #\i, #\O, #\o, and whose remaining characters are in
the set "IiOo10"? Not fun.


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607


krung...@yahoo.com

Dec 25, 2005, 8:22:58 PM

> The sort comparator is syntactically awkward and bulky, but in the other
> areas, the Perl code seems more concise.


The difference is that the lisp code is comprehensible
at a casual glance, but the perl code isn't.

krung

eiz

Dec 26, 2005, 3:10:43 AM

On Fri, 23 Dec 2005 19:02:01 +0000, Kenny Tilton wrote:

> Well then watch out for sound bites like "conciseness is power". To a
> point yes. Beyond that point lies obscurity. The K language (an APL-like
> tool) really does make conciseness the Prime Directive. You should see
> their one-page spreadsheet application. It is a howler, but they are
> serious.

Speaking of K:

anagram:{x g@&1<#:'g:={x@<x}'x}

Kaz Kylheku

Dec 26, 2005, 2:45:59 PM


Rob Warnock wrote:
> Pascal Bourguignon <sp...@mouse-potato.com> wrote:
> +---------------
> | Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> writes:
> | > There is a difference whether a given idea is abstracted out,
> | > implemented once and the abstraction is called multiple times,
> | > or whether it's copied & pasted in each place it is used.
> | >
> | > Looking at compressed size penalizes building abstractions,
> | > by measuring the overhead of creating an abstraction but not
> | > measuring the amount of copy & paste.
> | >
> | > I agree that the length of identifiers is less important than
> | > code structure. I think that token count is quite close to the
> | > subjective measure of code size for a human.
> ...
> | I'm not saying that the gziped size is an absolute scale of code
> | complexity. I'm saying that it's a better measure of code complexity
> | than LoC.
> +---------------
>
> I somewhat agree. While gzip'd size isn't perfect, it's certainly a
> *lot* closer to the Kolmogorov/Chaitin notion of complexity than LOC.

I think that in some sense, it's quite fair to measure the overhead of
creating an abstraction. It can be a waste of effort to create lengthy
list of abstractions which are then used to make a small, concise
program. That program may be better than a somewhat more complex one in
another programming language, but suppose that other one isn't
accompanied by a list of abstractions that make it work?

This is why these comparisons are pointless for small "textbook"
programs. Because when you write large programs, you develop
abstractions that you reuse in many places. The value of these
abstractions grows with their use, which is not quite apparent from an
example in which they are used just once. (Or at least, seeing their
potential applications from a small example requires imagination,
insight and experience!)

The silly little contest only takes into account the small programs,
while ignoring the ramifications for continued development of those
programs into a larger suite.

It's like comparing two mathematical functions, to see which is larger,
but not looking beyond the range 0 <= x <= 1.

If you look at (* 2 (sqrt x)) and (* x x) over the range [0, 1], it
looks like (sqrt x) is bigger. It starts out steeper and reaches double
the value at x = 1.

It's odd that computer scientists can understand this; they know that
using a more complex algorithm that has a greater cost for a small
input size, can drastically reduce the run time for larger data sizes,
because the function that represents its running time doesn't grow as
fast as that of the more naive, simpler and easier-to-implement
algorithm.

Yet they forget all about this principle when it comes to programming
languages. The small, slick program that does the least with the fewest
utterances becomes some kind of holy grail.

We can draw an almost exact parallel between the naivete of these two
statements:

Look how small a program I can write in this language to solve this
problem.
Imagine the implications for large software development!

Look how fast my new algorithm can arrange an array of ten elements
in order!
Imagine the implications for huge arrays!

Marcin 'Qrczak' Kowalczyk

Dec 26, 2005, 5:20:55 PM

"Kaz Kylheku" <kkyl...@gmail.com> writes:

> I think that in some sense, it's quite fair to measure the overhead
> of creating an abstraction. It can be a waste of effort to create
> lengthy list of abstractions which are then used to make a small,
> concise program. That program may be better than a somewhat more
> complex one in another programming language, but suppose that other
> one isn't accompanied by a list of abstractions that make it work?

Indeed. And I think that the raw size is a better measure of the
overhead than the compressed size. The cost of understanding code,
of finding bugs, of modifying the code etc. is better reflected by
the raw size.

It is indeed slightly biased towards information density, i.e. between
two very similar raw sizes the smaller compressed size wins (the human
recognizes repeating patterns and doesn't have to analyse each of them
separately), but for the same compressed size it's quite valuable to
have a smaller raw size.

Of course size is not the only aspect of readability. Other aspects
are much harder to specify formally though.

> It's odd that computer scientists can understand this; they know that
> using a more complex algorithm that has a greater cost for a small
> input size, can drastically reduce the run time for larger data sizes,

Programming languages have the same asymptotic complexity of program
size wrt. problem size (in any reasonable measure) until we reach the
point where a human is not capable of designing a good program at all
and solutions become more and more suboptimal, just to make something
work. For high level languages that point is quite far away.

In theory the capabilities of programming languages differ only in
constant factors (even though the differences can be quite high),
and practice supports the theory until the point mentioned above,
where the high complexity of the problem for a non-brilliant team of
programmers yields an unmaintainable mess and further development is
extremely hard.

I've seen projects where the number of bugs reported grows faster than
the number of bugs fixed, or where programmers themselves have said
that they wish it to be rewritten from scratch, or where I judge from
messy code they have probably reached that point: Evolution, Netscape 4,
Microsoft Windows, OpenOffice.

High level languages have an asymptotic advantage only after reaching
that point. For smaller projects, i.e. for the majority of them, they
have only a constant factor advantage.

Greg Bacon

Dec 28, 2005, 6:15:57 PM

In article <PqYqf.428840$084.135407@attbi_s22>,
Brian Downing <see-si...@lavos.net> wrote:

: [...]
: Actually, your code didn't even work for me. What implementation did
: you try it on?

$ clisp --version
GNU CLISP 2.35 (2005-08-29) [...]

Was my apparent success an accident of the implementation?

Thanks for your suggestions.

Greg
--
He who proclaims the godliness of the State and the infallibility of
its priests, the bureaucrats, is considered as an impartial student of
the social sciences. All those raising objections are branded as biased
and narrow-minded. -- Ludwig von Mises, Planned Chaos

Greg Bacon

Dec 28, 2005, 6:27:54 PM

In article <JQXqf.15067$Ed....@news-wrt-01.rdc-nyc.rr.com>,
Kenny Tilton <NOktil...@nyc.rr.com> wrote:

: Greg Bacon wrote:
:
: > Please understand that I don't mean this as yetanotherflame.
:
: Well then watch out for sound bites like "conciseness is power".

Sigh. It's really not how I meant it, and I explicitly rejected
the hackle-raising interpretation.

I had Paul Graham's "Succinctness is Power" in mind. As I wrote,
I'm sure my Lisp code was way too verbose. I'm sure there are
many more elegant expressions of the code.

I really was only looking for suggestions on how to improve my Lisp
code.

Greg
--
Once the WebCrawler has visited your essay about the naked incomprehension
greeting a women's group's demand for censorship of pornographic newsgroups,
the vast majority of its visitors will have been searching for "porn,"
"naked," and "women." -- Jutta Degener

Mark Tarver

Dec 30, 2005, 4:35:04 PM

Greg,

I'm not going to write an alternative program, but
the first thing I would bear in mind here is a
dictum of Sussman's

"If you're trying to solve a problem then the worst
thing you can do is to solve that problem. What
you need to do is to find the class into which the
problem falls and to solve for the class."

Your problem belongs to the general class of problems
which use an equivalence relation to partition a set.
So the first thing I would do is define a 2-place function
called 'partition' which did just that. This is going to
be a 2-place higher-order function. Here's a version.

(DEFUN partition (R Set)
  (IF (NULL Set)
      NIL
      (LET ((Equivalence
             (REMOVE-IF-NOT #'(LAMBDA (Y) (FUNCALL R (CAR Set) Y))
                            Set)))
        (CONS Equivalence
              (partition R (SET-DIFFERENCE Set Equivalence
                                           :TEST 'EQUAL))))))

Examples:

(partition 'EQUALP '("Me" "You" "me" "Him" "him"))
(("Me" "me") ("You") ("Him" "him"))

(DEFUN samelen? (X Y) (EQUAL (LENGTH X) (LENGTH Y)))

(partition 'samelen? '("Me" "You" "me" "Him" "him"))
(("Me" "me") ("You" "Him" "him"))

That's a hint; I'm sure you can carry on from there.

Mark

Mark Tarver

Dec 30, 2005, 8:17:37 PM

Greg,

> I really was only looking for suggestions on how to improve my Lisp
> code.

Having nothing much to do before bedtime, I finished off the problem.
If I have understood your spec rightly, this is a solution of 18 lines
(in my original, minus comments; I've had to spread it out a lot for
usenet). I hope it provides some illumination. A quick test below.

; take in a list of words as symbols
(DEFUN anagrams (Words)
  (MAPCAR 'sort-alphabetically
          (partition 'anagram? (sort-by-length Words))))

; higher-order partition function
(DEFUN partition (R Set)
  (IF (NULL Set) NIL
      (LET ((Equivalence
             (REMOVE-IF-NOT
              #'(LAMBDA (Y) (FUNCALL R (CAR Set) Y)) Set)))
        (CONS Equivalence
              (partition R
                         (SET-DIFFERENCE Set Equivalence
                                         :TEST 'EQUAL))))))

; sort your words by their length,
; coercing them to lists of characters while doing so
(DEFUN sort-by-length (Words)
  (SORT
   (MAPCAR #'(LAMBDA (Word) (COERCE (FORMAT NIL "~A" Word) 'LIST))
           Words)
   #'(LAMBDA (Word1 Word2) (> (LENGTH Word1) (LENGTH Word2)))))

(DEFUN anagram? (Word1 Word2)
  (COND ((EQUAL Word1 Word2))
        ((NOT (EQUAL (LENGTH Word1) (LENGTH Word2))) NIL)
        (T (anagram? (REMOVE (CAR Word1) (CDR Word1))
                     (REMOVE (CAR Word1) Word2)))))

; coerce a character list to a string and use STRING-LESSP
; to order them,
; and READ-FROM-STRING to get back to symbols.
(DEFUN sort-alphabetically (Words)
  (MAPCAR 'READ-FROM-STRING
          (SORT (MAPCAR #'(LAMBDA (Word) (COERCE Word 'STRING)) Words)
                'STRING-LESSP)))

Quick Test:

(anagrams '(on no tab bat meat team batter berate beater))
((batter) (beater berate) (meat team) (bat tab) (no on))

hope this helps

Mark

Greg Bacon

Jan 4, 2006, 3:41:49 PM

Thanks for the suggestions -- revised code below:

(defun anagram-list (hash)
  (loop for words being the hash-values of hash
        when (>= (length words) 2) collect words))

(defun find-anagrams (path)
  (let ((anagrams (make-hash-table :test #'equal)))
    (with-open-file (s path)
      (loop for word = (read-line s nil)
            while word do
            (push word (gethash (normal-form word) anagrams))))
    (anagram-list anagrams)))

(defun normal-form (word)
  (sort (nstring-downcase (copy-seq word)) #'char-lessp))

(defun longest-group-first-then-pairwise-alpha (a b)
  (let ((lena (length a))
        (lenb (length b)))
    (or (> lena lenb)
        (and (= lena lenb)
             (loop for aa in a
                   for bb in b do
                   (when (string-lessp aa bb)
                     (return t))
                   finally (return nil))))))

(defun group-ascending (group)
  (sort group #'string-lessp))

(defun sort-longest-group-then-pairwise-alpha (anagrams)
  (sort (mapcar #'group-ascending anagrams)
        #'longest-group-first-then-pairwise-alpha))

(let ((anagrams (find-anagrams #P"words721.txt")))
  (loop for group in (sort-longest-group-then-pairwise-alpha anagrams) do
        (format t "~{~a~^ ~}~%" group)))

How can I improve it?

Greg
--
If you are making the point that C++ is nasty, and you want people to
be disgusted by it, one of the most effective things you can do is to
put some turds on the page.
-- Mark Jason Dominus

Tayssir John Gabbour

Jan 6, 2006, 3:34:25 AM

Mark Tarver wrote:
> Greg,
>
> I'm not going to write an alternative program, but
> the first thing I would bear in mind here is a
> dictum of Sussman's
>
> "If you're trying to solve a problem then the worst
> thing you can do is to solve that problem. What
> you need to do is to find the class into which the
> problem falls and to solve for the class."
>
> Your problem belongs to the general class of problems
> which use an equivalence relation to partition a set.
> So the first thing I would do is define a 2-place function
> called 'partition' which did just that. This is going to
> be a 2-place higher-order function. Here's a version.
>
> (DEFUN partition (R Set)
>   (IF (NULL Set)
>       NIL
>       (LET ((Equivalence
>              (REMOVE-IF-NOT #'(LAMBDA (Y) (FUNCALL R (CAR Set) Y))
>                             Set)))
>         (CONS Equivalence
>               (partition R (SET-DIFFERENCE Set Equivalence
>                                            :TEST 'EQUAL))))))

Good point. I just have a question on optimization.

Ok, so you keep choosing an element of the set and pulling out elements
which are equivalent. Assuming this is O(n^2) in the worst case...

When you wish to optimize this (say to use hashtables), do you just
rewrite PARTITION, making the interface more complex like this?

(defun partition-normalize (normalize-fn set &key (hash-test 'eql))
  (loop with hash = (make-hash-table :test hash-test)
        for elem in set
        do (push elem (gethash (funcall normalize-fn elem) hash))
        finally (return (hash-values hash))))

(defun hash-values (hash)
  (loop for v being the hash-values of hash collect v))

Or is there something that (I guess) functional programmers tend to do
which avoids this messiness? Maybe something using lazy evaluation?


Thanks,
Tayssir

Tayssir John Gabbour

Jan 6, 2006, 3:39:38 AM

Tayssir John Gabbour wrote:

> Mark Tarver wrote:
> > (DEFUN partition (R Set)
> >   (IF (NULL Set)
> >       NIL
> >       (LET ((Equivalence
> >              (REMOVE-IF-NOT #'(LAMBDA (Y) (FUNCALL R (CAR Set) Y))
> >                             Set)))
> >         (CONS Equivalence
> >               (partition R (SET-DIFFERENCE Set Equivalence
> >                                            :TEST 'EQUAL))))))
>
> Good point. I just have a question on optimization.
>
> Ok, so you keep choosing an element of the set and pulling out elements
> which are equivalent. Assuming this is O(n^2) in the worst case...
>
> When you wish to optimize this (say to use hashtables), do you just
> rewrite PARTITION, making the interface more complex like this?
>
> (defun partition-normalize (normalize-fn set &key (hash-test 'eql))
>   (loop with hash = (make-hash-table :test hash-test)
>         for elem in set
>         do (push elem (gethash (funcall normalize-fn elem) hash))
>         finally (return (hash-values hash))))
>
> (defun hash-values (hash)
>   (loop for v being the hash-values of hash collect v))
>
> Or is there something that (I guess) functional programmers tend to do
> which avoids this messiness? Maybe something using lazy evaluation?

Incidentally, the complete source code would be...

(defun partition-normalize (normalize-fn set &key (normalize-test 'eql))
  (loop with hash = (make-hash-table :test normalize-test)
        for elem in set
        do (push elem (gethash (funcall normalize-fn elem) hash))
        finally (return (hash-values hash))))

(defun hash-values (hash)
  (loop for v being the hash-values of hash collect v))

(defun normalize-anagram (word)
  (sort (string-downcase word) #'string<))

(partition-normalize #'normalize-anagram '("foo" "oof" "bar" "bra")
                     :normalize-test 'equal)


Tayssir

WJ

Feb 27, 2011, 6:36:43 AM

Pascal Bourguignon wrote:

Ruby:


# Get words from the file on the command-line.
# By default, "split" splits on whitespace.
words = gets(nil).split.map{|w| w.upcase}

# Group by the sorted letters.
table = words.group_by{|w| w.split("").sort.join}

# Extract the groups of words, and sort each group.
groups = table.values.map{|group| group.sort}

# Pretty printing.
require "pp"

pp groups.sort_by{|g| -g.size}[0,20]


[["CARET", "CARTE", "CATER", "CERTA", "CRATE", "CREAT", "REACT",
"TRACE"],
["ERSTE", "ESTER", "RESET", "RESTE", "STEER", "TERES", "TERSE",
"TREES"],
["APRES", "ASPER", "PARES", "PARSE", "PERAS", "SPARE", "SPEAR"],
["LEAST", "SLATE", "STAEL", "STALE", "STEAL", "TALES", "TESLA"],
["ACRES", "CARES", "CERAS", "SACRE", "SCARE", "SERAC"],
["AIDS", "DAIS", "DISA", "IDAS", "SAID", "SIDA"],
["ARIL", "LAIR", "LARI", "LIAR", "LIRA", "RAIL"],
["REINS", "RESIN", "RINSE", "RISEN", "SERIN", "SIREN"],
["ADEL", "DALE", "DEAL", "LADE", "LEAD", "LEDA"],
["AMINO", "ANIMO", "NAOMI", "NOMIA", "OMANI", "OMNIA"],
["CAPERS", "CASPER", "ESCARP", "PARSEC", "SCRAPE"],
["NEPAL", "PANEL", "PENAL", "PLANE", "PLENA"],
["ARTS", "RAST", "RATS", "STAR", "TSAR"],
["KEATS", "SKATE", "STAKE", "STEAK", "TAKES"],
["CORSET", "CORTES", "ESCORT", "SCOTER", "SECTOR"],
["CANTER", "CRETAN", "NECTAR", "RECANT", "TRANCE"],
["NAPE", "NEAP", "NEPA", "PANE", "PENA"],
["LUSTER", "RESULT", "RUSTLE", "SUTLER", "ULSTER"],
["ABEST", "BASTE", "BEAST", "BEATS", "TABES"],
["ENETS", "SEENT", "SENTE", "TEENS", "TENSE"]]

jos...@lisp.de

Feb 27, 2011, 7:52:40 AM

On 27 Feb., 12:36, "WJ" <w_a_x_...@yahoo.com> wrote:
> Pascal Bourguignon wrote:
> > gba...@hiwaay.net (Greg Bacon) writes:
>
> > > As an exercise, I wrote code to find anagrams in a one-word-per-line
> > > dictionary. See http://blogs.pragprog.com/cgi-bin/pragdave.cgi/

Lisp:

(defun group (words &aux result)
  (flet ((norm (word)
           (intern (sort (copy-seq (symbol-name word)) #'char-lessp)
                   (symbol-package word))))
    (dolist (w words)
      (let* ((nw (norm w))
             (ws (assoc nw result)))
        (if ws
            (setf (cdr ws) (merge 'list (list w) (cdr ws) #'string-lessp))
            (push (list nw w) result))))
    (mapcar #'cdr result)))

> (group '(CARTE ERSTE ESTER CARET RESET CATER))
((ERSTE ESTER RESET) (CARET CARTE CATER))

Drew Crampsie

Feb 27, 2011, 4:30:28 PM

"jos...@lisp.de" <jos...@lisp.de> writes:


Challenge 1 : Only traverse the input list once, use a hash table for
constant time lookups, avoid interning.

(defun group (list &aux
              (table (make-hash-table :test #'equal))
              result)
  (dolist (item list result)
    (let* ((key (sort (copy-seq (string item)) #'char<))
           (value (gethash key table)))
      (if (null value)
          (let ((new-value (list item)))
            (push new-value result)
            (setf (gethash key table) new-value))
          (setf (cdr value) (cons item (cdr value)))))))
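
For example (hand-traced; note that neither the groups nor the words
within them come out sorted):

(group '(carte erste ester caret reset cater))
;; => ((ERSTE RESET ESTER) (CARTE CATER CARET))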

Challenge 2 : re-use the conses from the input list.

(defun ngroup (list &aux
               (table (make-hash-table :test #'equal))
               result)
  (loop
    (let* ((next (cdr list))
           (key (sort (copy-seq (string (car list))) #'char<))
           (value (gethash key table)))
      (setf (cdr list) (cdr value))
      (if (null value)
          (setf (gethash key table) list
                result (push list result))
          (setf (cdr value) list))
      (or (setf list next) (return result)))))

Challenge 3 : Reuse the conses from the input list, modify the
input-list in place, words appear in output in the same order they
were given.

(defun in-place-group-preserving-order (list &aux
                                        (tails (make-hash-table :test #'equal))
                                        (result list))
  (flet ((key (item) (sort (copy-seq (string item)) #'char<)))
    (let* ((first (list (car list)))
           (rest (cdr list)))
      ;; Record the first group's tail under the key of the original
      ;; word before overwriting (CAR LIST) with the group cons.
      (setf (gethash (key (car list)) tails) first
            (car list) first
            (cdr list) nil)
      (loop
        (let* ((key (key (car rest)))
               (next (cdr rest))
               (tail (gethash key tails)))
          (setf (cdr rest) nil
                (gethash key tails) rest)
          (if (null tail)
              ;; New group: append a spine cons and advance to it.
              (setf (cdr list) (list rest)
                    list (cdr list))
              (setf (cdr tail) rest))
          (or (setf rest next) (return result)))))))
