converting a string to a list


ccc31807

Jul 14, 2009, 9:17:35 AM
In CL, if you have a list which you can then turn into a string like
this:
> (setf my-list '(this is a list)) ;make a list
> (setf my-string (write-to-string my-list)) ;convert the list to a string

In Perl, if I wanted to go the other way, I'd write:
> my $str = 'This is a string'; #make a string
> my @list = split ' ', $str; #convert the string to a list

How do you convert a string to a list in CL? Is there a CL equivalent
to Perl's split() function, or Java's StringTokenizer class?

Thanks, CC.

Teemu Likonen

Jul 14, 2009, 9:44:45 AM
On 2009-07-14 06:17 (-0700), ccc wrote:

>> (setf my-list '(this is a list)) ;make a list

>> my $str = 'This is a string'; #make a string


>> my $list = split $str; #convert the string to a list
>
> How do you convert a string to a list in CL? Is there a CL equivalent
> to Perl's split() function, or Java's StringTokenizer class?

I'm sure you'll get tons of smarter answers than this one, but here are the
two simple ways that I know. In CLISP there is the REGEXP:REGEXP-SPLIT
function:

(regexp:regexp-split " \\+" "This is a string")
=> ("This" "is" "a" "string")

And in cl-ppcre package there is CL-PPCRE:SPLIT:

(asdf:oos 'asdf:load-op 'cl-ppcre)

(cl-ppcre:split " +" "This is a string")
=> ("This" "is" "a" "string")

But the resulting list's items are strings, too. In your original
"my-list" variable the list's items are symbols. How to convert them I
don't quite know at the moment. :-)

Zach Beane

Jul 14, 2009, 9:57:15 AM
ccc31807 <cart...@gmail.com> writes:

You can use READ-FROM-STRING to read from the string you created with
WRITE-TO-STRING.

I use CL-PPCRE:SPLIT to split things on whitespace:

http://weitz.de/cl-ppcre/#split

Zach

Pillsy

Jul 14, 2009, 10:06:09 AM
On Jul 14, 9:44 am, Teemu Likonen <tliko...@iki.fi> wrote:
> On 2009-07-14 06:17 (-0700), ccc wrote:
[...]

> And in cl-ppcre package there is CL-PPCRE:SPLIT:

>     (asdf:oos 'asdf:load-op 'cl-ppcre)

>     (cl-ppcre:split " +" "This is a string")
>     => ("This" "is" "a" "string")

> But lists' items are strings too. In your original "my-list" variable
> list's items are symbols. How to convert them I don't quite know at the
> moment. :-)

You can use INTERN to create symbols from strings.

* (mapcar #'intern '("This" "was" "a" "string"))
(|This| |was| |a| |string|)

The vertical bars (#\|) indicate that the symbol names contain lower-case
characters; INTERN preserves case, while the Lisp reader upcases by default.

Whether or not you really do want to work with a list of symbols
instead of a list of strings is another matter entirely. For most
applications, you're probably better off working with strings
directly.

Cheers,
Pillsy

Joshua Taylor

Jul 14, 2009, 10:12:44 AM

It's not part of the standard, but SPLIT-SEQUENCE [1] might be what
you're looking for. It splits a sequence on some elements which are the
same as a given delimiter, or which satisfy some given predicate (in the
case of SPLIT-SEQUENCE-IF, and SPLIT-SEQUENCE-IF-NOT). I think this is
similar to Java's StringTokenizer (but that's just from a very quick
glance at the StringTokenizer doc).
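
To make the single-delimiter case concrete without depending on any library, here is a minimal portable sketch that splits a string on one delimiter character using only POSITION and SUBSEQ. The name SPLIT-ON-CHAR and its :REMOVE-EMPTY keyword are made up for illustration; they are not SPLIT-SEQUENCE's actual interface.

```lisp
;; SPLIT-ON-CHAR is a hypothetical name for illustration only; it is not
;; the SPLIT-SEQUENCE library's interface. Uses only standard CL.
(defun split-on-char (delimiter string &key (remove-empty t))
  "Return a list of the substrings of STRING separated by DELIMITER.
When REMOVE-EMPTY is true, empty pieces (from adjacent delimiters or a
delimiter at either end) are dropped."
  (loop for start = 0 then (1+ end)                      ; just past the last delimiter
        for end = (position delimiter string :start start) ; NIL on the last piece
        for piece = (subseq string start end)
        unless (and remove-empty (zerop (length piece)))
          collect piece
        while end))
```

For example, (split-on-char #\Space "This is a string") gives ("This" "is" "a" "string"), and passing :remove-empty nil keeps the empty fields that pipe- or comma-delimited data often contains.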

If that's not enough, e.g., if you want a delimiter to possibly be more
than just a single element in the original sequence, you might look
at SPLIT [2] in Edi Weitz's CL-PPCRE [3]. It is more like Perl's split
in that it takes a regular expression, and gives the subsequences
between subsequences matching the regexp. From its documentation:
"This function also tries hard to be Perl-compatible - thus the somewhat
peculiar behaviour."

And, of course, implementations might provide similar functionality:

Lispworks:
* LISPWORKS:SPLIT-SEQUENCE
(exported, but undocumented?)

Allegro:
* SPLIT-RE
http://www.franz.com/support/documentation/7.0/doc/operators/excl/split-re.htm

* SPLIT-REGEXP
http://www.franz.com/support/documentation/7.0/doc/operators/excl/split-regexp.htm

//JT

[1] http://www.cliki.net/SPLIT-SEQUENCE
[2] http://weitz.de/cl-ppcre/#split
[3] http://weitz.de/cl-ppcre/

Vassil Nikolov

Jul 14, 2009, 10:18:44 AM

On Tue, 14 Jul 2009 06:17:35 -0700 (PDT), ccc31807 <cart...@gmail.com> said:
> ...

> How do you convert a string to a list in CL? Is there a CL equivalent
> to Perl's split() function, or Java's StringTokenizer class?

As a partial answer, if you are quite sure the string contains just
readable S-expressions (such as symbols), you can do something along
the lines of this example:

((lambda (s)
   (with-input-from-string (s s)
     (loop for x = (read s nil s)
           while (not (eql x s))
           collect x)))
 "foo bar baz")
=> (FOO BAR BAZ)
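
The same loop can be packaged as a named function. This sketch (READ-ALL-FROM-STRING is a made-up name) also binds *READ-EVAL* to NIL, so that a #. in the input signals a READER-ERROR instead of evaluating code, which matters if the string comes from outside your program. Note that reading still interns any symbols it encounters.

```lisp
;; READ-ALL-FROM-STRING is a hypothetical helper name.
(defun read-all-from-string (string)
  "Read every S-expression in STRING and return them as a list."
  (let ((*read-eval* nil)       ; make #. signal an error instead of running code
        (eof (list :eof)))      ; a fresh cons that READ can never return
    (with-input-from-string (in string)
      (loop for form = (read in nil eof)
            until (eq form eof)
            collect form))))
```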

---Vassil.


--
"Even when the muse is posting on Usenet, Alexander Sergeevich?"

Giorgos Keramidas

Jul 14, 2009, 12:28:26 PM
On Tue, 14 Jul 2009 16:44:45 +0300, Teemu Likonen <tlik...@iki.fi> wrote:
> On 2009-07-14 06:17 (-0700), ccc wrote:
>>> (setf my-list '(this is a list)) ;make a list

>> my $str = 'This is a string'; #make a string
>> my $list = split $str; #convert the string to a list

>> How do you convert a string to a list in CL? Is there a CL equivalent
>> to Perl's split() function, or Java's StringTokenizer class?
>

> (cl-ppcre:split " +" "This is a string")
> => ("This" "is" "a" "string")
>
> But lists' items are strings too. In your original "my-list" variable
> list's items are symbols. How to convert them I don't quite know at
> the moment. :-)

You can upcase and intern them:

* (mapcar (lambda (name)
            (intern (string-upcase name)))
          (list "foo" "bar"))
(FOO BAR)

I'll risk stating the obvious: It may still be useful to work with
strings if all you want to do is text processing.

Symbols are nice if you are planning to bind *values* to them, e.g.:

* (defparameter foo nil)
FOO

* (defparameter bar nil)
BAR

* (list foo bar)
(NIL NIL)

* (mapcar #'symbol-value
          (mapcar (lambda (name)
                    (intern (string-upcase name)))
                  (list "foo" "bar")))
(NIL NIL)

* (setf foo 3)
3

* (setf bar 4)
4

* (mapcar #'symbol-value
          (mapcar (lambda (name)
                    (intern (string-upcase name)))
                  (list "foo" "bar")))
(3 4)

but working directly with strings is probably going to be a lot more
convenient if you are merely interested in parsing a line, splitting it
in a few fields and "do something" to them.

fortunatus

Jul 14, 2009, 4:56:58 PM


Here's the more direct round-trip answer, going straight back from
WRITE-TO-STRING's output. I don't know whether you meant the question
within the scope of Lisp data representation, which is what I answer
here, or as parsing strings formatted like the ones in your Perl
example, which is what everyone else answered.


[16]> (write-to-string '(a b c))
"(A B C)"
[17]>

[18]> (read-from-string "(a b c)")
(A B C) ;
7
[19]>

ccc31807

Jul 14, 2009, 5:16:23 PM

Thanks for your answer. Here's the deal.

I'm working through Winston and Horn, 3rd edition. Doing Problem 6-1, I
am using strings rather than lists to represent data, as strings are
much more natural for me. (By day, I'm a database manager and data
munger and I work with strings a lot.)

Problem 6-1 requires a search function for titles of books by literal
text. With the title as a string, I can do this using (search), but
the problem is that any substring matches, so for "Moby Dick" 'Moby'
matches, but so do 'oby', 'by' and 'y'. I looked at CL-PPCRE and am
satisfied that it will do what I want, but I was curious whether Lisp
could do the same thing as Perl in this context.
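
One way to get "whole words only" matching with plain SEARCH, without a regex library, is to keep searching until a match is found whose neighbours are not letters or digits. A minimal sketch; WHOLE-WORD-SEARCH is a hypothetical name, and treating any non-alphanumeric character as a word boundary is an assumption.

```lisp
;; WHOLE-WORD-SEARCH is a hypothetical helper. A "word boundary" is assumed
;; to be the start or end of TEXT or any non-alphanumeric character.
(defun whole-word-search (word text)
  "Return the index of the first case-insensitive occurrence of WORD in TEXT
that is not glued to other letters or digits, or NIL if there is none."
  (assert (plusp (length word)))
  (loop for start = (search word text :test #'char-equal)
          then (search word text :test #'char-equal :start2 (1+ start))
        while start
        do (let ((end (+ start (length word))))
             ;; Accept the match only if both neighbours are boundaries.
             (when (and (or (zerop start)
                            (not (alphanumericp (char text (1- start)))))
                        (or (= end (length text))
                            (not (alphanumericp (char text end)))))
               (return start)))))
```

With this, "Moby" and "dick" are found in "Moby Dick", while "oby" is not.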

To be honest, I'm a Lisp newbie and really don't know enough to ask
intelligent questions. If you can respond to this message, please do
so. If not, don't worry about it.

Thanks, CC.

Joshua Taylor

Jul 14, 2009, 6:01:42 PM
ccc31807 wrote:
> I working through Winston and Horn, 3rd edition. Doing Problem 6-1, I
> am using strings rather than lists to represent data, as strings are
> much more natural for me. (By day, I'm a database manager and data
> munger and I work with strings a lot.)
>
> Problem 6-1 requires a search function for titles of books by literal
> text. With the title as a string, I can do this using (search), but
> the problem is that ALL letters match, so for "Moby Dick" 'Moby'
> matches, as well as 'oby' as well as 'by' as well as 'y'. I looked at
> CL-PPCRE and am satisfied that it will do what I want, but I was
> curious whether Lisp could do the same thing as Perl in this context.

Now, looking at that problem, the task is to find books whose titles
contain all of a given /set/ of words, i.e., whose titles are a
/superset/ of the query words. The reason that the task is somewhat
complicated by using strings is that now the title is a sequence of the
/characters/ that make up the title rather than a sequence of /words/ in
the title. But even so, the problem asks for a function that finds all
the books whose title includes all the elements in the query, but not
necessarily in the same /order/ as in the query. Consider the examples
from the textbook:

* (find-book-by-title-words '(black orchid) books)
((TITLE (THE BLACK ORCHID))
 (AUTHOR (REX STOUT))
 (CLASSIFICATION (FICTION MYSTERY)))

* (find-book-by-title-words '(orchid black) books)
((TITLE (THE BLACK ORCHID))
 (AUTHOR (REX STOUT))
 (CLASSIFICATION (FICTION MYSTERY)))

The solution suggested by the text makes use of SUBSETP and is a one
liner using FIND with :KEY and :TEST arguments. However, SUBSETP is
defined on lists, not general sequences. As such, if you're going to do
the exercise using strings rather than lists of symbols, a closer
analogue would be to find all books whose titles contain all of a
given bag of characters. Thus, something like:

* (find-book-by-title-letters "ck" books)
((TITLE "the black orchid")   ; #\c and #\k are in "the black orchid"
 (AUTHOR (REX STOUT))
 (CLASSIFICATION (FICTION MYSTERY)))

* (find-book-by-title-letters "db" books)
((TITLE "the black orchid")   ; so are #\d and #\b
 (AUTHOR (REX STOUT))
 (CLASSIFICATION (FICTION MYSTERY)))

Now SEARCH is no longer quite what you're looking for. How can you
approach this problem?
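
One possible answer, offered only as a sketch: check each query character with FIND, mirroring SUBSETP's set semantics (so a repeated character in the query is not counted twice). The function names and the (TITLE "...") book layout follow the examples above but are otherwise assumptions; this version returns a list of all matching books rather than just one.

```lisp
;; TITLE-HAS-LETTERS-P and FIND-BOOKS-BY-TITLE-LETTERS are hypothetical names.
(defun title-has-letters-p (letters title)
  "True when every character of LETTERS occurs somewhere in TITLE,
ignoring case. Like SUBSETP, repetitions in LETTERS are not counted."
  (every (lambda (char) (find char title :test #'char-equal)) letters))

(defun find-books-by-title-letters (letters books)
  "Return the books whose (TITLE \"...\") entry contains every character of LETTERS."
  (remove-if-not (lambda (book)
                   (title-has-letters-p letters (second (assoc 'title book))))
                 books))
```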

//JT

Rob Warnock

Jul 14, 2009, 10:25:51 PM
ccc31807 <cart...@gmail.com> wrote:
+---------------

| How do you convert a string to a list in CL? Is there a CL equivalent
| to Perl's split() function, or Java's StringTokenizer class?
+---------------

Others have mentioned the SPLIT-SEQUENCE package & function
[almost identical to Perl's split()], but just so you know,
CL also has the ultimate fine-grained split builtin: ;-}

> (coerce "This is a string." 'list)
(#\T #\h #\i #\s #\Space #\i #\s #\Space #\a #\Space #\s #\t #\r #\i #\n #\g #\.)
> (subseq * 8 16)
(#\a #\Space #\s #\t #\r #\i #\n #\g)
> (coerce * 'string)
"a string"
>


-Rob

p.s. Of course, SUBSEQ works directly on strings, too:

> (subseq "This is a string." 8 16)
"a string"
>

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Vassil Nikolov

Jul 15, 2009, 1:58:58 AM

On Tue, 14 Jul 2009 14:16:23 -0700 (PDT), ccc31807 <cart...@gmail.com> said:
> ...
> I working through Winston and Horn, 3rd edition. Doing Problem 6-1, I
> am using strings rather than lists to represent data, as strings are
> much more natural for me. (By day, I'm a database manager and data
> munger and I work with strings a lot.)

I believe it is important to note that Lisp's natural data are
S-expressions (basically, symbols, numbers, and lists of
S-expressions), just like Perl's natural data are strings, (Perl)
lists and (Perl) hash(-tabl)es. If you manage to get your data to
be S-expressions, you'll have a much easier time then manipulating
them in Lisp programs. For example, as already noted in this
thread, in Lisp it is much easier---indeed, less than trivial---to
do

((lambda (keywords title)
   (subsetp keywords title))
 '(express murder) '(murder on the orient express))
=> T

than

((lambda (keywords title)
   ...)
 "express murder" "murder on the orient express")

(where filling in the elided part is left as an exercise for the
proverbial industrious reader...).

As another point, one rule of thumb about deciding between symbols
and strings is to choose symbols if a fast equality test will be
needed (so, for example, database fields whose string values are a
kind of ID may be best represented as symbols in a Lisp program).
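
A small sketch of that rule of thumb, assuming the IDs are case-insensitive (they are upcased before interning) and interning them as keywords so that comparisons and hash-table lookups can use EQ. ID->KEYWORD and COUNT-IDS are hypothetical names; keep in mind that interned symbols stay in their package for the life of the image.

```lisp
;; ID->KEYWORD and COUNT-IDS are hypothetical helpers. Assumes IDs are
;; case-insensitive, so they are upcased before interning.
(defun id->keyword (id-string)
  "Intern ID-STRING (upcased) in the KEYWORD package and return the symbol."
  (values (intern (string-upcase id-string) :keyword)))

(defun count-ids (id-strings)
  "Tally how often each ID occurs, using an EQ hash table keyed by keywords."
  (let ((counts (make-hash-table :test 'eq)))
    (dolist (id id-strings counts)
      (incf (gethash (id->keyword id) counts 0)))))
```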

Pascal J. Bourguignon

Jul 15, 2009, 2:10:50 AM
Vassil Nikolov <vnik...@pobox.com> writes:

But choosing symbols is not even needed (or especially helpful).

((lambda (keywords title)
   (subsetp keywords title :test (function string-equal))) ; case insensitive
 '("express" "murder") '("murder" "on" "the" "orient" "express"))
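
Getting from the raw strings to that list-of-strings form takes one more step. A minimal sketch, with WORDS and TITLE-CONTAINS-KEYWORDS-P as made-up names and "word" assumed to mean "a run of non-space characters":

```lisp
;; WORDS, SPACE-CHAR-P and TITLE-CONTAINS-KEYWORDS-P are hypothetical names.
(defun space-char-p (char)
  (char= char #\Space))

(defun words (string)
  "Split STRING into the substrings separated by runs of spaces."
  (let ((result '())
        (start 0))
    (loop
      ;; Skip any spaces; stop when only spaces (or nothing) remain.
      (setf start (position-if-not #'space-char-p string :start start))
      (when (null start)
        (return (nreverse result)))
      ;; The word ends at the next space or at the end of STRING.
      (let ((end (or (position-if #'space-char-p string :start start)
                     (length string))))
        (push (subseq string start end) result)
        (setf start end)))))

(defun title-contains-keywords-p (keywords title)
  "True when every word of the KEYWORDS string is a word of the TITLE string,
compared case-insensitively and in any order."
  (subsetp (words keywords) (words title) :test #'string-equal))
```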

Of course, if the OP explained what Perl's split does, it would be
easier to help him...

--
__Pascal Bourguignon__

Vassil Nikolov

Jul 15, 2009, 8:07:07 AM

On Wed, 15 Jul 2009 08:10:50 +0200, p...@informatimago.com (Pascal J. Bourguignon) said:

> Vassil Nikolov <vnik...@pobox.com> writes:
>> ...


>> ((lambda (keywords title)
>> (subsetp keywords title))
>> '(express murder) '(murder on the orient express))

>> ...

> But choosing symbols is not even needed (or specially helpful).

> ((lambda (keywords title)
> (subsetp keywords title :test (function string-equal))) ; case insensitive
> '("express" "murder") '("murder" "on" "the" "orient" "express"))

Even at a superficial glance, choosing symbols is helpful towards
being economical, by reducing clutter in the program (and improving
speed a little). As to whether it is needed, that could only be
determined, either way, in the context of a particular problem or
application.

ccc31807

Jul 15, 2009, 9:01:59 AM
On Jul 14, 6:01 pm, Joshua Taylor <tay...@cs.rpi.edu> wrote:
> Now SEARCH is no longer quite what you're looking for.  How can you
> approach this problem?

Ordinarily, I'd use a regular expression with word boundaries, or else
say that the observed behavior meets the specification and let it be.

In my job, I take files (some extremely large) and process them line
by line. The lines come in as strings, and my strategy is:
1. read in the line as a string
2. split the string (usually on whitespace, or comma, or pipe) into
some kind of list (separate scalars, an array, or hash, or a
combination)
3. manipulate the individual data values
4. join the data values into a new string
5. write out the new line as a string
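
Those five steps map onto a short loop in portable Common Lisp. This is a sketch under stated assumptions: fields are separated by #\|, and "manipulating" a field here just means trimming surrounding spaces and upcasing it. SPLIT-FIELDS, JOIN-FIELDS and PROCESS-LINES are hypothetical names.

```lisp
;; Sketch of the read / split / manipulate / join / write loop.
;; Assumptions: fields are separated by #\|, and the "manipulation" is
;; trimming spaces and upcasing. All three function names are hypothetical.
(defun split-fields (line &optional (delimiter #\|))
  "Split LINE on every DELIMITER, keeping empty fields."
  (loop for start = 0 then (1+ end)
        for end = (position delimiter line :start start)
        collect (subseq line start end)
        while end))

(defun join-fields (fields &optional (delimiter #\|))
  "Concatenate FIELDS with DELIMITER between them."
  (with-output-to-string (out)
    (loop for (field . more) on fields
          do (write-string field out)
             (when more (write-char delimiter out)))))

(defun process-lines (in out)
  "Apply the five steps to each line of stream IN, writing to stream OUT."
  (loop for line = (read-line in nil)                 ; 1. read (NIL at EOF)
        while line
        do (let* ((fields (split-fields line))        ; 2. split
                  (new (mapcar (lambda (f)            ; 3. manipulate
                                 (string-upcase (string-trim " " f)))
                               fields)))
             (write-line (join-fields new) out))))    ; 4. join, 5. write
```

To run it over real files, open the input and output with WITH-OPEN-FILE and pass the two streams to PROCESS-LINES; since it reads one line at a time, large files are not loaded into memory all at once.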

All this is very simple and easy to do in Perl, but it seems that Lisp
isn't designed to do this easily. This isn't a criticism, just an
observation.

Thanks for your pointers. I'll go back and look at Chapter 6 and
attempt the problem again.

CC

ccc31807

Jul 15, 2009, 9:06:50 AM
On Jul 15, 1:58 am, Vassil Nikolov <vniko...@pobox.com> wrote:
>   I believe it is important to note that Lisp's natural data are
>   S-expressions (basically, symbols, numbers, and lists of
>   S-expressions), just like Perl's natural data are strings, (Perl)
>   lists and (Perl) hash(-tabl)es.  If you manage to get your data to
>   be S-expressions, you'll have a much easier time then manipulating
>   them in Lisp programs.

Okay, point taken. When in Rome ...

Still, in my job data tends to come in as strings as I have noted
above. I don't have a philosophical objection to using s-expressions
rather than strings, but if my data consists of strings then I still
have the problem of converting it to lists. Shouldn't be hard to do.

Actually, now that I'm thinking about it, case is significant, and I
don't know how to convert a string to an s-expression and preserve
case.

CC

Luís Oliveira

Jul 15, 2009, 9:17:18 AM
ccc31807 <cart...@gmail.com> writes:

> Actually, now that I'm thinking about it, case is significant, and I
> don't know how to convert a string to an s-expression and preserve
> case.

Here's an example.

CL-USER> (let ((*readtable* (copy-readtable)))
           (setf (readtable-case *readtable*) :preserve)
           (read-from-string "(foo bar baz)"))
(|foo| |bar| |baz|)
13

--
Luís Oliveira
http://student.dei.uc.pt/~lmoliv/

Pascal J. Bourguignon

Jul 15, 2009, 10:19:14 AM
ccc31807 <cart...@gmail.com> writes:

Is that so hard:

#|5:|#(write-line
#|4:|#  (mapconcat (function identity)
#|3:|#             (manipulate-list-of-words
#|2:|#              (split-sequence-if (function special-character-p)
#|1:|#                                 (read-line)))
                   " "))
?

Then perhaps lisp is not made for you indeed...


PS: Not all of these functions are standard functions, but they're
either found in usual libraries, or heavily discussed here, or
trivial to implement.
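
MAPCONCAT in that snippet is borrowed from Emacs Lisp; Common Lisp has no function by that name. As an illustration of how little such a helper takes, here is one possible definition. The name and argument order mirror the Emacs Lisp function; the rest is an assumption, and it requires FUNCTION to return a string for every element.

```lisp
;; A hypothetical Common Lisp version of Emacs Lisp's MAPCONCAT.
;; Not part of the CL standard; shown only as an illustration.
(defun mapconcat (function sequence separator)
  "Call FUNCTION on each element of SEQUENCE (each call must return a string)
and join the results with SEPARATOR between them."
  (with-output-to-string (out)
    (let ((first t))
      (map nil (lambda (element)
                 (unless first
                   (write-string separator out))
                 (setf first nil)
                 (write-string (funcall function element) out))
           sequence))))
```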

--
__Pascal Bourguignon__

Pascal J. Bourguignon

Jul 15, 2009, 10:26:58 AM
ccc31807 <cart...@gmail.com> writes:

> On Jul 15, 1:58 am, Vassil Nikolov <vniko...@pobox.com> wrote:
>>   I believe it is important to note that Lisp's natural data are
>>   S-expressions (basically, symbols, numbers, and lists of
>>   S-expressions), just like Perl's natural data are strings, (Perl)
>>   lists and (Perl) hash(-tabl)es.  If you manage to get your data to
>>   be S-expressions, you'll have a much easier time then manipulating
>>   them in Lisp programs.
>
> Okay, point taken. When in Rome ...
>
> Still, in my job data tends to come in as strings as I have noted
> above. I don't a philosophical objection to using s-expressions rather
> than strings, but if my data consists of strings then I still have the
> problem of converting it to lists. Shouldn't be hard to do.

Vassil spoke to say nothing. A string is a S-expr in lisp.


An S-expr (symbolic expression) is either an atom or a list of S-exprs.
A string is an atom, therefore a string is an S-expr.

And lists of strings are perfectly good S-exprs too.


And if you like to process strings, there's nothing preventing you from
doing so. I'd have some reservations performance-wise, but otherwise, why not?


I'd prefer to write:

(defun month-name (month-number)
  (check-type month-number (integer 1 12))
  (aref #(nil "January" "February" "March"
              "April" "May" "June"
              "July" "August" "September"
              "October" "November" "December")
        month-number))

than:

(defun month-name (month-number)
  (check-type month-number (integer 1 12))
  (string-trim " "
               (subseq "January  February March    April    May      June     July     August   SeptemberOctober  November December "
                       (* (1- month-number) 9) (* month-number 9))))


but it's up to you, as long as you encapsulate things in nice
abstractions...


And don't tell us perl has no other data structure than strings.


> Actually, now that I'm thinking about it, case is significant, and I
> don't know how to convert a string to an s-expression and preserve
> case.

Well, go on studying Lisp. Eventually you'll know.

--
__Pascal Bourguignon__

Björn Lindberg

Jul 15, 2009, 11:38:42 AM
ccc31807 <cart...@gmail.com> writes:

> On Jul 15, 1:58 am, Vassil Nikolov <vniko...@pobox.com> wrote:
>>   I believe it is important to note that Lisp's natural data are
>>   S-expressions (basically, symbols, numbers, and lists of
>>   S-expressions), just like Perl's natural data are strings, (Perl)
>>   lists and (Perl) hash(-tabl)es.  If you manage to get your data to
>>   be S-expressions, you'll have a much easier time then manipulating
>>   them in Lisp programs.
>> � them in Lisp programs.
>
> Okay, point taken. When in Rome ...
>
> Still, in my job data tends to come in as strings as I have noted
> above. I don't a philosophical objection to using s-expressions rather
> than strings, but if my data consists of strings then I still have the
> problem of converting it to lists. Shouldn't be hard to do.

The internal representation of data in your program is governed by
what kinds of operations you need to perform on that data, not by the
format of the input data, nor of the output. The format of strings is
unstructured text. Thus, if you need to do any form of structured
manipulation of the data, you convert it to a suitable format
first. As an example, a compiler might take strings as input and
produce binary machine code as output, but it does not do all its
internal computations on those two data types.

This advice is language independent, in that strings are unstructured
in any language.

> Actually, now that I'm thinking about it, case is significant, and I
> don't know how to convert a string to an s-expression and preserve
> case.

Many times you can 'cheat', and use the Lisp reader. It is possible to
make it read case sensitively, and with reader macros make it read any
characters specially. If the input format is not at all Lisp though,
it may be simpler to do what you do in other languages, and read in
text character by character, or line by line.


Björn Lindberg

ccc31807

Jul 15, 2009, 11:45:08 AM
On Jul 15, 10:26 am, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> And don't tell us perl has no other data structure than strings.

You don't understand. I didn't say that Perl has no data structures
other than strings. What I said was that my job entails working
primarily with strings.

I can export data from a database (e.g., as the return value of an SQL
query) in a number of forms, a text file, a CSV file, an XML file,
even as a pointer or a reference to a memory location. This data can
be read as binary or as ASCII, but as a practical matter, I treat it
as ASCII, whether I open it using something like vi or notepad, or
Excel, or import it into another database (such as Postgres, MySQL,
SQL Server, or even Access).

I don't know of a database that either exports or imports data as
s-expressions. If you know of one, please tell me.

In any case, I don't seek to pervert a language developed to deal with
problems of one sort to deal with problems of another sort. Languages
are simply tools that we use to solve problems. I'm not looking for a
replacement for a tool I already have to solve problems that I already
know how to solve, but for tools that I don't have to solve problems
that I don't know how to solve. In doing so, I'm working from the more
familiar to the less familiar.

CC

Pillsy

Jul 15, 2009, 12:15:40 PM
On Jul 15, 11:45 am, ccc31807 <carte...@gmail.com> wrote:

> On Jul 15, 10:26 am, p...@informatimago.com (Pascal J. Bourguignon)
> wrote:

> > And don't tell us perl has no other data structure than strings.

> You don't understand. I didn't say that Perl has no data structures
> other than strings. What I said was that my job entails working
> primarily with strings.

I'm hesitant to speak for Pascal, but I suspect he's pointing out that
working with strings usually involves using other data structures in
concert with strings, be they associative arrays, lists, or something
else.

Also, while I think your hesitance to "pervert" Common Lisp too much
during the learning stages is a good one---it's all too easy to waste a
lot of time as a newbie trying to reshape it into something more
familiar---once you get the hang of it, its greatest strength is how
easy it is to pervert. With a little work, you can turn your Lisp
implementation into a very convenient tool for line-oriented string
mangling.

Cheers,
Pillsy
[...]

Zach Beane

Jul 15, 2009, 12:28:11 PM
Pillsy <pill...@gmail.com> writes:

> Also, while I think your hesitance to "pervert" Common Lisp too much
> during the learning stages is a good one---it's all to easy to waste a
> lot of time as a newbie trying to reshape it into something more
> familiar---once you get the hang of it, its greatest strength is how
> easy it is to pervert. With a little work, you can turn your Lisp
> implementation into a very convenient tool for line-oriented string
> mangling.

That reminds me of a passage from Erik Naggum that I thought was quite
interesting:

If the goal is to make something "work", you grab the tool that is
already almost there and just whip something together to get all
there, and Common Lisp is not "almost there" for a large set of common
tasks today. (Those who want Common Lisp to be "almost there" with a
huge lot of stupid tasks go off to whine about libraries and small
syntax issues and go create Arc or whatever, instead of seeing that
being "almost there" is not an asset at all, because the more "almost
there" you get, the fewer things are within reach at the same
distance, and those things that are much farther away than others are
curiously considered "unsuitable" for the language once it has
succeeded in getting close enough for comfort for other tasks.)

If the goal is to make something of lasting value, which does only
what it is supposed to do and does not fail randomly, but has tightly
controlled and predictable failure modes with graceful recovery, then
most other languages and tools are so lacking in support for those
problems they were not specifically created to handle well, that
_they_ are unsuited for top quality software and can only achieve this
at great expense. This is where I think Common Lisp really excels.

http://groups.google.com/group/comp.lang.lisp/msg/5f18dd6cdc6b39f0 has
the entire message.

Zach

ccc31807

Jul 15, 2009, 1:10:15 PM
On Jul 15, 11:38 am, bj...@runa.se (Björn Lindberg) wrote:
> The internal representation of data in your program is governed by
> what kinds of operations you need to perform on that data, not on the
> format of the input data, nor of the output.

Yes, and no. In one view, the internal representation of data is
simply bits of higher voltage or lower voltage, ones or zeros. If you
mean data structures, then obviously you might want to treat small
integers differently from instances of complex objects.

> The format of strings is unstructured text.

Again, it depends. For example, I don't view either of the following
two strings as 'unstructured text' although a machine might view them
as unstructured.
"first","middle","last","address","city","state","zip"
first|middle|last|address|city|state|zip

> Thus, if you need to do any form of structured
> manipulations of the data, you convert it to a suitable format
> first. As an example, a compiler might take strings as inputs, and
> produce binary machine code as output, but it does not do all
> computations internal computations on those two data types.

I agree with this, but one of the nice things about higher level
languages is that you can treat the compiler as a black box, without
regard to what the compiler might do internally. I input a string, the
box outputs a string; I input an s-expression, the box outputs an
s-expression. However the box treats the input internally is no concern
of mine, even though I might imagine that it treats it simply as bits
of either ones or zeros.

> This advice is language independant, in that strings are unstructured
> in any language.

I don't know what you mean by 'unstructured'. The following quote
delimited 'string' may or may not be structured, depending on your
POV:

"<html>
<head>
<title>Structure of Strings</title>
</head>
<body>
<h1>Strings: Structured or Unstructured?</h1>
</body>
</html>"


> Many times you can 'cheat', and use the Lisp reader. It is possible to
> make it read case sensitively, and with reader macros make it read any
> characters specially. If the input format is not at all Lisp though,
> it may be simpler to do what you do in other languages, and read in
> text character by character, or line by line.

Yes. Different tools for different problems. It helps for a learner to
relate what he is learning with his environment, which is what I'm
trying to do.

CC

Pillsy

Jul 15, 2009, 3:14:34 PM
On Jul 15, 1:10 pm, ccc31807 <carte...@gmail.com> wrote:
> On Jul 15, 11:38 am, bj...@runa.se (Björn Lindberg) wrote:
>
> > The internal representation of data in your program is governed by
> > what kinds of operations you need to perform on that data, not on the
> > format of the input data, nor of the output.
[...]

> If you
> mean data structures, then obviously you might want to treat small
> integers differently from instances of complex objects.

It's more than that, though. At some point, you're almost certainly
going to want to use the input data to construct a set of data
structures---be they numbers, lists, hash tables, instances of objects,
whatever---that actually reflect the nature of the problem you're
trying to solve.

> > The format of strings is unstructured text.

> Again, it depends. For example, I don't view either of the following
> two strings as 'unstructured text' although a machine might view them
> as unstructured.

Well, yeah, but the issue is that to a Lisp implementation, a string
is just a vector of characters. If that's all the structure you need
(sometimes it is--say you wanted something like the 'tr' command from
Unix and, IIRC, Perl), you're set, but 9 times out of 10, you work
with strings like these:

> "first","middle","last","address","city","state","zip"
> first|middle|last|address|city|state|zip

In cases like this, you need to tell Lisp about the structure, just
like you need to tell Perl about the structure using the split()
function. Out of the box, Common Lisp doesn't provide as many
functions as Perl does for describing how that translation works.
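
As a sketch of "telling Lisp about the structure" for the pipe-delimited line above: split on #\| and pair each value with a field name, giving an alist you can query with ASSOC. The seven field names, their order, and the names *FIELD-NAMES* and PARSE-RECORD are assumptions for illustration.

```lisp
;; Assumptions: the record is pipe-delimited with exactly these seven
;; fields in this order; *FIELD-NAMES* and PARSE-RECORD are hypothetical.
(defparameter *field-names*
  '(:first :middle :last :address :city :state :zip))

(defun parse-record (line)
  "Turn a pipe-delimited LINE into an alist of (field-name . value)."
  (let ((fields (loop for start = 0 then (1+ end)
                      for end = (position #\| line :start start)
                      collect (subseq line start end)
                      while end)))
    (unless (= (length fields) (length *field-names*))
      (error "Expected ~D fields but found ~D in ~S"
             (length *field-names*) (length fields) line))
    (mapcar #'cons *field-names* fields)))
```

After that, (cdr (assoc :city record)) retrieves a field by name rather than by position in the string.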

There's READ-FROM-STRING, which tells it that the string is a textual
representation of a sexp[1], and there's PARSE-INTEGER, which does
pretty much just what it says on the tin, and you can index into the
string or treat it like a list of characters. Beyond that, you can use
libraries to do things like regular expressions or XML parsing or the
like.
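
PARSE-INTEGER signals a PARSE-ERROR when the string is not exactly one integer (surrounding whitespace is allowed), and it also accepts :START, :END, :RADIX and :JUNK-ALLOWED. A small sketch; SAFE-PARSE-INTEGER is a hypothetical wrapper, not a standard function.

```lisp
;; PARSE-INTEGER is standard CL. SAFE-PARSE-INTEGER is a hypothetical
;; wrapper that returns NIL instead of signalling PARSE-ERROR.
(defun safe-parse-integer (string &key (radix 10))
  "Return the integer written in STRING (surrounding whitespace allowed),
or NIL if STRING is not exactly one integer."
  (handler-case (values (parse-integer string :radix radix))
    (parse-error () nil)))

;; Other standard options worth knowing:
;;   (parse-integer "62701-1234" :junk-allowed t) => 62701, 5
;;   (parse-integer "x7ff" :start 1 :radix 16)    => 2047, 4
```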

Or you can write your own functions and use those.
[...]


> > This advice is language independant, in that strings are unstructured
> > in any language.

> I don't know what you mean by 'unstructured'. The following quote
> delimited 'string' may or may not be structured, depending on your
> POV:

> "<html>
>   <head>
>    <title>Structure of Strings</title>
>   </head>
>   <body>
>    <h1>Strings: Structured or Unstructured?</h1>
>   </body>
>  </html>"

The issue isn't so much what your POV is, but how you turn your POV
into a program that allows the machine to share your POV.

Cheers,
Pillsy

Robert Uhl

Jul 15, 2009, 5:53:28 PM
ccc31807 <cart...@gmail.com>
>
>>   I believe it is important to note that Lisp's natural data are
>>   S-expressions (basically, symbols, numbers, and lists of
>>   S-expressions), just like Perl's natural data are strings, (Perl)
>>   lists and (Perl) hash(-tabl)es.  If you manage to get your data to
>>   be S-expressions, you'll have a much easier time then manipulating
>>   them in Lisp programs.
>
> Okay, point taken. When in Rome ...

Yup. In C it's common for data to be binary; in shell and Perl it's
common for it to be strings (which could contain encoded binary data);
in Lisp it's common for it to be symbolic expressions (which could
contain binary and string data).

> Still, in my job data tends to come in as strings as I have noted
> above. I don't a philosophical objection to using s-expressions rather
> than strings, but if my data consists of strings then I still have the
> problem of converting it to lists. Shouldn't be hard to do.

Yup, and once you've done so it can be very pleasant to manipulate data
that way.

> Actually, now that I'm thinking about it, case is significant, and I
> don't know how to convert a string to an s-expression and preserve
> case.

Remember that INTERN is case-preserving, and so is using || to quote a
symbol name.
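
To make the case-preservation point concrete: INTERN uses the string exactly as given, whereas the standard reader upcases unescaped names, so "Moby" and "moby" end up as different symbols. STRING->SYMBOL is a hypothetical wrapper name.

```lisp
;; STRING->SYMBOL is a hypothetical helper; INTERN itself does no case folding.
(defun string->symbol (string &optional (package *package*))
  "Intern STRING exactly as given in PACKAGE and return only the symbol."
  (values (intern string package)))
```

Here (string->symbol "Moby") returns the same symbol as '|Moby|, while (read-from-string "moby") returns MOBY.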

Also, your s-expressions could contain strings where it makes sense to
do so. Or even other data types.

--
Marmite is a black tarry yeast extract used to encourage New Zealand children
to grow large and strong. The usual method is to feed it to them on toast,
resulting in them growing very quickly, so that they might become big and
strong enough to stop their parents from doing this. --Rupert Boleyn

Robert Uhl

Jul 15, 2009, 6:02:33 PM
ccc31807 <cart...@gmail.com> writes:
>
> In my job, I take files (some extremely large) and process them line
> by line. The lines come in as strings, and my strategy is:
> 1. read in the line as a string

READ-LINE

> 2. split the string (usually on whitespace, or comma, or pipe) into
> some kind of list (separate scalars, an array, or hash, or
> combination)

SPLIT-SEQUENCE

> 3. manipulate the individual data values

Lisp

> 4. join the data values into a new string

CONCATENATE

> 5. write out the new line as a string

WRITE-LINE

> All this is very simple and easy to do in Perl, but it seems that Lisp
> isn't designed to do this easily. This isn't a criticism, just an
> observation.

Lisp doesn't come with SPLIT-SEQUENCE out of the box, but fortunately
someone else has written it for you:-)
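Put together, the five steps might look like the sketch below. SPLIT-SEQUENCE is a separate library, so to stay self-contained this uses a small hand-written splitter instead; split-on, join-with and process-lines are hypothetical names, and upcasing each field merely stands in for step 3. The join uses WITH-OUTPUT-TO-STRING rather than CONCATENATE because the delimiter has to go between the fields.

```lisp
(defun split-on (delimiter string)
  "Split STRING at every DELIMITER character. Empty fields are kept,
so \"a||b\" gives (\"a\" \"\" \"b\")."
  (loop :for start = 0 :then (1+ end)
        :for end = (position delimiter string :start start)
        :collect (subseq string start end)
        :while end))

(defun join-with (delimiter strings)
  "Concatenate STRINGS with the DELIMITER character between them."
  (with-output-to-string (out)
    (loop :for (field . more) :on strings
          :do (write-string field out)
              (when more (write-char delimiter out)))))

(defun process-lines (in out &key (delimiter #\|) (transform #'string-upcase))
  "Steps 1-5: read each line of IN, split it on DELIMITER, apply
TRANSFORM to every field, rejoin, and write the new line to OUT."
  (loop :for line = (read-line in nil nil)
        :while line
        :do (write-line (join-with delimiter
                                   (mapcar transform (split-on delimiter line)))
                        out)))
```

For a file, wrap it as (with-open-file (in "data.txt") (process-lines in *standard-output*)), with a file name of your own.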

--
Robert A. Uhl
No, officer, it was strictly a *decorative* beartrap. --Chris Knight

Vassil Nikolov

Jul 16, 2009, 12:50:14 AM

On Wed, 15 Jul 2009 16:26:58 +0200, p...@informatimago.com (Pascal J. Bourguignon) said:

> ccc31807 <cart...@gmail.com> writes:
>> On Jul 15, 1:58 am, Vassil Nikolov <vniko...@pobox.com> wrote:

>>> ...
>>> S-expressions (basically, symbols, numbers, and lists of
>>> S-expressions) ...
>> ...

> Vassil spoke to say nothing.

:- )

> A string is a S-expr in lisp.

I thought that by noting in parentheses "basically, symbols,
numbers, and lists of S-expressions" it would be clear what I
referred to by "S-expressions" in that context, but apparently I was
too naive. I shall have to remember to say "classic S-expressions"
or "original S-expressions" next time...

By the way, arrays (including strings), and then structures,
etc. are only technically atoms, owing to an accident of history,
whose fixing had a too low benefit/cost ratio.

Vassil Nikolov

Jul 16, 2009, 1:32:13 AM

On Wed, 15 Jul 2009 06:01:59 -0700 (PDT), ccc31807 <cart...@gmail.com> said:
> ...
> In my job, I take files (some extremely large) and process them line
> by line. The lines come in as strings, and my strategy is:
> 1. read in the line as a string
> 2. split the string (usually on whitespace, or comma, or pipe) into
> some kind of list (separate scalars, an array, or hash, or
> combination)
> 3. manipulate the individual data values
> 4. join the data values into a new string
> 5. write out the new line as a string

In my opinion and experience, one candidate approach that is
certainly worth considering is to consume the data by setting up a
custom read table and then making calls to READ.

Purely for the sake of a simple illustration, let's say that (i) the
data in the file consists of IDs, first and last names, and a
number, looking like this:

AB0012|Alice|Brown|17.65
AB0034|Charles|Brown|43.21
AB0056|Ellen|Dillon|43.21
AB0078|Frank|Dillon|98.76
...

and (ii) what we want to do is build an index allowing fast lookup
by first or last name and then by ID, i.e. two mappings such as

Alice -> AB0012
Brown -> AB0012 AB0034
...

and

AB0012 -> Alice Brown 17.65
...

To that end, set up a read table such that a vertical bar is read as
white space (using SET-SYNTAX-FROM-CHAR of #\| and #\Space) and a
new line is read as a right parenthesis [*]. Then calling
READ-DELIMITED-LIST of #\Newline repeatedly will yield, in order,

(AB0012 ALICE BROWN 17.65)
(AB0034 CHARLES BROWN 43.21)
...
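A minimal sketch of that read-table setup in standard Common Lisp. The function names are made up; the readtable starts as a copy of the standard one, and, following footnote [*], the quote is also made a constituent so that O'Hara reads as one symbol. It assumes every record, including the last, ends with a newline, since reaching end of file inside READ-DELIMITED-LIST is an error.

```lisp
(defun make-record-readtable ()
  "Copy of the standard readtable where | separates fields like a space,
a newline ends a record like a close paren, and ' is a constituent."
  (let ((rt (copy-readtable nil)))
    (set-syntax-from-char #\| #\Space rt)     ; field separator = whitespace
    (set-syntax-from-char #\Newline #\) rt)   ; end of line = end of list
    (set-syntax-from-char #\' #\A rt)         ; O'Hara stays one token
    rt))

(defun read-records (stream)
  "Collect one list per newline-terminated line of STREAM."
  (let ((*readtable* (make-record-readtable))
        (*read-eval* nil))
    (loop :while (peek-char nil stream nil nil)
          :collect (read-delimited-list #\Newline stream))))
```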

and then constructing the lookup tables for the mappings is pretty
straightforward, if not trivial. If printing the names capitalized
(with :CASE :CAPITALIZE when calling WRITE, or by binding *PRINT-CASE*
to :CAPITALIZE) is good enough for our purposes, we would be in
business in about half an hour or so.
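The lookup tables could be sketched with two hash tables, assuming records shaped like (AB0012 ALICE BROWN 17.65). The names build-indexes and print-entry are hypothetical; binding *PRINT-CASE* to :CAPITALIZE makes ~A print the symbol ALICE as Alice.

```lisp
(defun build-indexes (records)
  "Return two EQ hash tables: name -> list of IDs, and
ID -> (first-name last-name amount)."
  (let ((by-name (make-hash-table :test #'eq))
        (by-id (make-hash-table :test #'eq)))
    ;; Walk the records backwards so PUSHNEW leaves the IDs in file order.
    (dolist (record (reverse records) (values by-name by-id))
      (destructuring-bind (id first-name last-name amount) record
        (pushnew id (gethash first-name by-name))
        (pushnew id (gethash last-name by-name))
        (setf (gethash id by-id) (list first-name last-name amount))))))

(defun print-entry (id by-id &optional (stream *standard-output*))
  "Print the names (capitalized) and amount stored for ID.
Signals an error if ID is not in BY-ID."
  (destructuring-bind (first-name last-name amount) (gethash id by-id)
    (let ((*print-case* :capitalize))
      (format stream "~A ~A ~A~%" first-name last-name amount))))
```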

A more realistic task would be more involved, of course, but the
above is far from the limits of this approach, too. For example, if
some of the fields must be read as strings, not symbols, then the
vertical bar can be defined as a macro character that arranges the
consumption of the string contents and the construction of the
string object, and then we can call INTERN (or PARSE-INTEGER, or
READ) on it or not depending on the position of the field.
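A sketch of that refinement, again assuming one newline-terminated record per line. The vertical bar becomes a terminating macro character whose function collects the following field as a string; the first field, which has no bar in front of it, is still read as a symbol. Each field is then converted, or not, by position: here the names stay strings and only the amount goes through READ. All function names are hypothetical.

```lisp
(defun read-bar-field (stream char)
  "Reader macro for |: return the text up to the next | or newline."
  (declare (ignore char))
  (with-output-to-string (out)
    (loop :for c = (peek-char nil stream nil nil)
          :until (or (null c) (char= c #\|) (char= c #\Newline))
          :do (write-char (read-char stream) out))))

(defun make-string-field-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\| #'read-bar-field nil rt) ; nil = terminating
    (set-syntax-from-char #\Newline #\) rt)           ; newline ends a record
    rt))

(defun convert-fields (record)
  "Convert each field according to its position (hypothetical layout)."
  (destructuring-bind (id first-name last-name amount) record
    (list id                          ; already a symbol
          first-name                  ; kept as a string, case intact
          last-name                   ; likewise
          (let ((*read-eval* nil))    ; the amount text, read as a number
            (values (read-from-string amount))))))

(defun read-string-records (stream)
  "Read STREAM into records of the form (ID \"first\" \"last\" amount)."
  (let ((raw (let ((*readtable* (make-string-field-readtable))
                   (*read-eval* nil))
               (loop :while (peek-char nil stream nil nil)
                     :collect (read-delimited-list #\Newline stream)))))
    (mapcar #'convert-fields raw)))
```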

Or have the data produced in CSV format and use a read table ^W^W
library [+] for reading that.

_________
[*] we'll probably want some more tweaks, e.g. to make a quote read
as a constituent character (again with SET-SYNTAX-FROM-CHAR
from, say, #\A) for the sake of names such as O'Hara
[+] search?q=Common+Lisp+library+CSV (say)

Björn Lindberg

Jul 17, 2009, 4:23:36 AM
ccc31807 <cart...@gmail.com> writes:

> On Jul 15, 11:38 am, bj...@runa.se (Björn Lindberg) wrote:
>> The internal representation of data in your program is governed by
>> what kinds of operations you need to perform on that data, not on the
>> format of the input data, nor of the output.
>
> Yes, and no. In one view, the internal representation of data is
> simply bits of higher voltage or lower voltage, ones or zeros. If you
> mean data structures, then obviously you might want to treat small
> integers differently from instances of complex objects.

Ask yourself which one of these two views is most relevant to how you
write your program. Then ask yourself why you brought up the other
view at all.

>> Thus, if you need to do any form of structured
>> manipulations of the data, you convert it to a suitable format
>> first. As an example, a compiler might take strings as inputs, and
>> produce binary machine code as output, but it does not do all its
>> internal computations on those two data types.
>
> I agree with this, but one of the nice things about higher level
> languages is that you can treat the compiler as a black box, without
> regard to what the compiler might do internally. I input a string, the
> box outputs a string; I input an s-expression, the box outputs an s-
> expression. However the box treats the input internally is no concern
> of mine, even though I might imagine that it treats it simply as bits
> of either ones or zeros.

The comparison is between the compiler as a program and *your*
program. Regarding the compiler as a black box in this context does
not help understanding.

>> This advice is language independent, in that strings are unstructured
>> in any language.
>
> I don't know what you mean by 'unstructured'.

In most languages, in particular in Common Lisp, the structure of a
string is a sequence of characters. Regardless of which specific
characters are in the string.

> The following quote
> delimited 'string' may or may not be structured, depending on your
> POV:
>
> "<html>
> <head>
> <title>Structure of Strings</title>
> </head>
> <body>
> <h1>Strings: Structured or Unstructured?</h1>
> </body>
> </html>"

No, the structure of that string is a sequence of characters. Programs
exist to read in that string and turn it into a structured
representation, which might be a hierarchy of class instances, or
lists of lists of strings or atoms. This is the point.

>> Many times you can 'cheat', and use the Lisp reader. It is possible to
>> make it read case sensitively, and with reader macros make it read any
>> characters specially. If the input format is not at all Lisp though,
>> it may be simpler to do what you do in other languages, and read in
>> text character by character, or line by line.
>
> Yes. Different tools for different problems. It helps for a learner to
> relate what he is learning with his environment, which is what I'm
> trying to do.

You mentioned that you deal with database output. Some SQL database
libraries for Common Lisp will return the result of a query as a list
of lists of the fields returned, and do data type conversion to basic
Lisp types:

(("John" "Doe" 28)
("Carter" "Ccc" 31807)
...)
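If rows do arrive that way, turning them back into delimited lines takes one FORMAT call per row: ~{...~} iterates over the fields and ~^ exits before the separator that would follow the last one. A minimal sketch; rows->lines is a hypothetical name.

```lisp
(defun rows->lines (rows &optional (stream *standard-output*))
  "Write each row, a list of fields, to STREAM as one |-separated line."
  (dolist (row rows)
    ;; ~A prints each field without quotes; ~^ stops before a trailing |.
    (format stream "~{~A~^|~}~%" row)))

;; (rows->lines '(("John" "Doe" 28) ("Carter" "Ccc" 31807)))
```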


Björn Lindberg

Pascal J. Bourguignon

Jul 17, 2009, 5:22:20 AM
bj...@runa.se (Björn Lindberg) writes:

> ccc31807 <cart...@gmail.com> writes:
>
>> On Jul 15, 11:38 am, bj...@runa.se (Björn Lindberg) wrote:
>>> This advice is language independent, in that strings are unstructured
>>> in any language.
>>
>> I don't know what you mean by 'unstructured'.
>
> In most languages, in particular in Common Lisp, the structure of a
> string is a sequence of characters. Regardless of which specific
> characters are in the string.

And in addition, if you only have strings, you may apply structure on
them as demonstrated in my other answer,
Message-ID: <7c7hyao...@pbourguignon.anevia.com>

--
__Pascal Bourguignon__

w_a_x_man

Jul 29, 2009, 3:13:31 PM
On Jul 14, 11:28 am, Giorgos Keramidas <keram...@ceid.upatras.gr>
wrote:

> You can upcase and intern them:
>
>   * (mapcar (lambda (name)
>               (intern (string-upcase name)))
>             (list "foo" "bar"))
>   (FOO BAR)

irb(main):006:0> ['foo','bar'].map{|s| s.upcase.to_sym}
=> [:FOO, :BAR]

--
Common Lisp is a significantly ugly language. --- Dick Gabriel
The good news is, it's not Lisp that sucks, but Common Lisp.
--- Paul Graham
Common LISP is the PL/I of Lisps. --- Jeffrey M. Jacobs

w_a_x_man

Jul 29, 2009, 3:50:15 PM
On Jul 14, 5:01 pm, Joshua Taylor <tay...@cs.rpi.edu> wrote:

> Now, looking at that problem, the task is to find books whose titles
> contain all of a given /set/ of words, i.e., whose titles are a
> /superset/ of the query words.  The reason that the task is somewhat
> complicated by using strings is that now the title is a sequence of the
> /characters/ that make up the title rather than a sequence of /words/ in
> the title.  But even so, the problem asks for a function that finds all
> the books whose title includes all the elements in the query, but not
> necessarily in the same /order/ as in the query.

$titles =
"
Figures of Stone
The Case of Charles Dexter Ward
The Burning Court
Traitor to the Living
The Stone God Awakens
The Shadow over Insmouth
".strip.split( /\s*\n/ ).map{|s| s.split }


def find words
$titles.select{|twords|
words.all?{|word| twords.include?( word ) } }.
map{|a| a.join " " }
end

p find( ["Stone"] )
p find( ["Living", "Traitor"] )

=== output ===

["Figures of Stone", "The Stone God Awakens"]
["Traitor to the Living"]
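For comparison, a portable Common Lisp sketch of the same search that keeps the titles as strings, which was the original poster's setup. It still uses SEARCH, but only accepts a hit bounded by spaces or by the ends of the title, so "by" no longer matches inside "Moby". The function names are made up for this example, and words are assumed to be space-separated.

```lisp
(defun contains-word-p (word title)
  "True if the non-empty string WORD occurs in TITLE as a whole word."
  (and (plusp (length word))
       (loop :for start = (search word title)
               :then (search word title :start2 (1+ start))
             :while start
             :thereis (let ((end (+ start (length word))))
                        ;; The hit must start and end on a word boundary.
                        (and (or (zerop start)
                                 (char= (char title (1- start)) #\Space))
                             (or (= end (length title))
                                 (char= (char title end) #\Space)))))))

(defun find-titles (words titles)
  "All TITLES containing every string in WORDS as a whole word, in any order."
  (remove-if-not (lambda (title)
                   (every (lambda (word) (contains-word-p word title)) words))
                 titles))
```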

Pascal J. Bourguignon

Jul 29, 2009, 6:30:53 PM
w_a_x_man <w_a_...@yahoo.com> writes:

> On Jul 14, 5:01 pm, Joshua Taylor <tay...@cs.rpi.edu> wrote:
>
>> Now, looking at that problem, the task is to find books whose titles
>> contain all of a given /set/ of words, i.e., whose titles are a
>> /superset/ of the query words.  The reason that the task is somewhat
>> complicated by using strings is that now the title is a sequence of the
>> /characters/ that make up the title rather than a sequence of /words/ in
>> the title.  But even so, the problem asks for a function that finds all
>> the books whose title includes all the elements in the query, but not
>> necessarily in the same /order/ as in the query.
>
> $titles =

> [...]

The fact is that the traffic at clr is mostly dull... I understand
that one comes here to seek inspiration.

--
__Pascal Bourguignon__

Giorgos Keramidas

Jul 29, 2009, 6:23:45 PM
On Wed, 29 Jul 2009 12:13:31 -0700 (PDT), w_a_x_man <w_a_...@yahoo.com> wrote:
> On Jul 14, 11:28 am, Giorgos Keramidas <keram...@ceid.upatras.gr>
> wrote:
>
>> You can upcase and intern them:
>>
>>   * (mapcar (lambda (name)
>>               (intern (string-upcase name)))
>>             (list "foo" "bar"))
>>   (FOO BAR)
>
> irb(main):006:0> ['foo','bar'].map{|s| s.upcase.to_sym}
> => [:FOO, :BAR]

Yeah, that was more readable than the Lisp code!

We could also start writing bzip2 compressed code manually. That would
reduce the disk space we need for all that source code and save a huge
amount of pennies.

Get real :P

David Greene

Jul 30, 2009, 2:15:12 PM
"ccc31807" <cart...@gmail.com> wrote in message
news:389503e4-eb6a-48c2...@h30g2000vbr.googlegroups.com...
> In CL, if you have a list which you can then turn into a string like
> this:
>> (setf my-list '(this is a list)) ;make a list
>> (setf my-string (write-to-string my-list)) ;convert the list to a string
>
> In Perl, if I wanted to go the other way, I'd write:
>> my $str = 'This is a string'; #make a string
>> my $list = split $str; #convert the string to a list

>
> How do you convert a string to a list in CL? Is there a CL equivalent
> to Perl's split() function, or Java's StringTokenizer class?
>
> Thanks, CC.

I'm a lisp newbie... and it may well be that I misinterpreted your question
but,

(setf x (coerce "hello" 'list))

yields this list

(#\h #\e #\l #\l #\o)

With all the hubbub in the answers you are getting, I'm probably wrong.


Paul Donnelly

Jul 30, 2009, 10:28:57 PM
"David Greene" <Da...@NoWhere.com> writes:

I think he was going for something more like:

(split "hello world") => ("hello" "world")
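Standard Common Lisp has no such SPLIT, but one that splits on runs of whitespace is short to write. A sketch only: the names split-on-whitespace and whitespace-char-p, and the choice of space, tab and newline as whitespace, are assumptions. Because it skips whole runs of whitespace, it never returns empty strings.

```lisp
(defun whitespace-char-p (char)
  "Hypothetical predicate: true for space, tab or newline."
  (member char '(#\Space #\Tab #\Newline)))

(defun split-on-whitespace (string)
  "Return the list of words in STRING, split on runs of whitespace."
  (let ((words '())
        (start 0))
    (loop
      ;; Skip to the first character of the next word, if any.
      (setf start (position-if-not #'whitespace-char-p string :start start))
      (unless start
        (return (nreverse words)))
      ;; The word runs up to the next whitespace or the end of STRING.
      (let ((end (or (position-if #'whitespace-char-p string :start start)
                     (length string))))
        (push (subseq string start end) words)
        (setf start end)))))

;; (split-on-whitespace "hello world")
```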

Bata

Aug 1, 2009, 10:32:11 AM
David Greene wrote:
> "ccc31807" <cart...@gmail.com> wrote in message
> news:389503e4-eb6a-48c2...@h30g2000vbr.googlegroups.com...
>> In CL, if you have a list which you can then turn into a string like
>> this:
>>> (setf my-list '(this is a list)) ;make a list
>>> (setf my-string (write-to-string my-list)) ;convert the list to a string
>> In Perl, if I wanted to go the other way, I'd write:
>>> my $str = 'This is a string'; #make a string
>>> my $list = split $str; #convert the string to a list
>> How do you convert a string to a list in CL? Is there a CL equivalent
>> to Perl's split() function, or Java's StringTokenizer class?
>>
>> Thanks, CC.
Use (cl-ppcre:split " " string-name) to get a list of the different
words in the string.

Check out the cl-ppcre package's functionality, it's quite amazing.

w_a_x_man

Aug 1, 2009, 8:30:20 PM
On Aug 1, 9:32 am, Bata <batabo...@yahoo.ca> wrote:
> David Greene wrote:
> > "ccc31807" <carte...@gmail.com> wrote in message

Ruby:

string_name.split

Pascal J. Bourguignon

Aug 1, 2009, 11:55:06 PM
w_a_x_man <w_a_...@yahoo.com> writes:

Don't you realize how ugly Ruby syntax is?

Here is in 150 lines of lisp, a simplified lisp reader that is able to
read all the lisp syntax needed to write it.

Try to parse Ruby syntax in Ruby and see how useless a language it is.

-----(simple-reader.lisp)------------------------------------------------------
;;;; -*- mode:lisp;coding:utf-8 -*-
;;;;**************************************************************************
;;;;FILE: simple-reader.lisp
;;;;LANGUAGE: Common-Lisp
;;;;SYSTEM: Common-Lisp
;;;;USER-INTERFACE: NONE
;;;;DESCRIPTION
;;;;
;;;; Simple Lisp Reader.
;;;; This reader implements a subset of the Common Lisp reader,
;;;; but it should be extensible enough to be able to read most
;;;; of Common Lisp syntax.
;;;; Not supported: the preserving whitespace flag, the recursive flag
;;;; (therefore no references), character traits (escapes).
;;;; Otherwise, reader macros and dispatching macros can be written
;;;; to read most of CL syntax.
;;;; Only integers, keywords and symbol tokens are parsed (a more
;;;; sophisticated parse-token function can be configured).
;;;;
;;;;AUTHORS
;;;; <PJB> Pascal J. Bourguignon <p...@informatimago.com>
;;;;MODIFICATIONS
;;;; 2009-08-02 <PJB> Created.
;;;;BUGS
;;;;LEGAL
;;;; GPL
;;;;
;;;; Copyright Pascal J. Bourguignon 2009 - 2009
;;;;
;;;; This program is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU General Public License
;;;; as published by the Free Software Foundation; either version
;;;; 2 of the License, or (at your option) any later version.
;;;;
;;;; This program is distributed in the hope that it will be
;;;; useful, but WITHOUT ANY WARRANTY; without even the implied
;;;; warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
;;;; PURPOSE. See the GNU General Public License for more details.
;;;;
;;;; You should have received a copy of the GNU General Public
;;;; License along with this program; if not, write to the Free
;;;; Software Foundation, Inc., 59 Temple Place, Suite 330,
;;;; Boston, MA 02111-1307 USA
;;;;**************************************************************************

(defstruct character-description
non-terminating-p
reader-macro
dispatching-macro-characters)

(defstruct (simple-readtable (:constructor %make-simple-readtable))
(default-character-description (make-character-description))
(macro-characters (make-hash-table))
(parse-token (function identity))
(whitespaces #(#\space #\tab #\newline #\linefeed #\return #\page #\vt)))

(defun simple-get-macro-character (character &optional (readtable *simple-readtable*))
(let ((description (or (gethash character (simple-readtable-macro-characters readtable))
(simple-readtable-default-character-description readtable))))
(values (character-description-reader-macro description)
(character-description-non-terminating-p description))))

(defun simple-set-macro-character (character function &optional non-terminating-p
(readtable *simple-readtable*))
(setf (gethash character (simple-readtable-macro-characters readtable))
(make-character-description :non-terminating-p non-terminating-p
:reader-macro function))
't)

(defun simple-get-dispatch-macro-character (character subchar
&optional (readtable *simple-readtable*))
(setf subchar (char-upcase subchar))
(let ((description (gethash character (simple-readtable-macro-characters readtable))))
(unless (and description
(character-description-dispatching-macro-characters description))
(error "#\\~C is not a dispatching macro character" character))
(gethash subchar (character-description-dispatching-macro-characters description))))

(defun simple-set-dispatch-macro-character (character subchar function
&optional (readtable *simple-readtable*))
(setf subchar (char-upcase subchar))
(let ((description (gethash character (simple-readtable-macro-characters readtable))))
(when (or (null description)
(null (character-description-dispatching-macro-characters description)))
(setf (gethash character (simple-readtable-macro-characters readtable))
(make-character-description :non-terminating-p t
:reader-macro (function simple-reader-dispatching-macro)
:dispatching-macro-characters (make-hash-table)))))
(let ((description (gethash character (simple-readtable-macro-characters readtable))))
(setf (gethash subchar (character-description-dispatching-macro-characters description))
function))
't)

(defun simple-reader-dispatching-macro (character stream)
(let* ((subchar (read-char stream))
(macro (simple-get-dispatch-macro-character character subchar)))
(assert macro () "~C~C is not a dispatching macro" character subchar)
(funcall macro character subchar stream)))

(defun simple-read-vector-dmacro (char subchar stream)
(declare (ignore char))
(let ((contents (simple-read-list-macro subchar stream)))
(coerce contents 'vector)))

(defvar *character-names* (list (cons "SPACE" (code-char 32))
(cons "NEWLINE" (code-char 10))
(cons "RETURN" (code-char 13))
(cons "PAGE" (code-char 12))
(cons "VT" (code-char 11))
(cons "LINEFEED" (code-char 10))
(cons "TAB" (code-char 9)))
"An a-list mapping character names to characters.")

(defun simple-read-character-dmacro (char subchar stream)
(declare (ignore char subchar))
(let ((object (read-char stream)))
(if (alpha-char-p (peek-char nil stream nil #\space))
(loop
:with buffer = (make-array 8 :element-type 'character :adjustable t :fill-pointer 1
:initial-element object)
:do (vector-push-extend (read-char stream) buffer)
:while (alpha-char-p (peek-char nil stream nil #\space))
:finally (return (or (cdr (assoc (string-upcase buffer) *character-names*
:test (function string=)))
(aref buffer 0))))
object)))

(defun simple-parse-token (buffer)
;; We only deal with integers, keywords and symbols.
(or (ignore-errors (parse-integer buffer :junk-allowed nil))
(if (char= #\: (aref buffer 0))
(intern (string-upcase (subseq buffer (position #\: buffer :test (function char/=))))
"KEYWORD")
;; We don't deal with other packages in this simple parse-token.
(intern (string-upcase buffer)))))

(defun simple-read-quote-macro (character stream)
(declare (ignore character))
(list 'quote (simple-read stream)))

(defun simple-read-list-macro (character stream)
(declare (ignore character))
(loop
:until (char= #\) (peek-char t stream))
:collect (simple-read stream)
:finally (read-char stream)))

(defun simple-read-string-macro (character stream)
(handler-case
(loop
:with buffer = (make-array 8 :element-type 'character :adjustable t :fill-pointer 0)
:for ch = (read-char stream)
:until (char= character ch)
:do (vector-push-extend (if (char= #\\ ch)
(read-char stream)
ch)
buffer)
:finally (return (copy-seq buffer)))))

(defun simple-read-comment-macro (character stream)
(declare (ignore character))
(read-line stream)
(values))

(defun make-simple-readtable ()
(let ((readtable
(%make-simple-readtable
:default-character-description (make-character-description :non-terminating-p t)
:parse-token (function simple-parse-token))))
(simple-set-macro-character #\( (function simple-read-list-macro) nil readtable)
(simple-set-macro-character #\) nil nil readtable)
(simple-set-macro-character #\' (function simple-read-quote-macro) nil readtable)
(simple-set-macro-character #\" (function simple-read-string-macro) nil readtable)
(simple-set-macro-character #\; (function simple-read-comment-macro) nil readtable)
(simple-set-dispatch-macro-character #\# #\\ (function simple-read-character-dmacro) readtable)
(simple-set-dispatch-macro-character #\# #\( (function simple-read-vector-dmacro) readtable)
readtable))

(defparameter *simple-readtable* (make-simple-readtable))

(defun simple-read (&optional (stream *standard-input*) (eof-error-p t) eof-value)
(peek-char t stream nil)
(let ((char (read-char stream nil nil)))
(cond
(char
(multiple-value-bind (macro non-terminating-p) (simple-get-macro-character char)
(if macro
(let ((object (multiple-value-list (funcall macro char stream))))
(if object (first object) (simple-read stream eof-error-p eof-value)))
(loop
:with buffer = (make-array 8 :element-type 'character :adjustable t :fill-pointer 0)
:for ch = (peek-char nil stream nil #\space)
:initially (vector-push-extend char buffer)
:until (or (position ch (simple-readtable-whitespaces *simple-readtable*))
(not (nth-value 1 (simple-get-macro-character ch))))
:do (vector-push-extend (read-char stream eof-error-p eof-value) buffer)
:finally (return (funcall (simple-readtable-parse-token *simple-readtable*)
buffer))))))
(eof-error-p (error 'end-of-file :stream stream))
(t eof-value))))

--------------------------------------------------------------------------------

(with-open-file (input "simple-reader.lisp")
(loop
:for sexp = (simple-read input nil input)
:until (eq sexp input)
:do (pprint sexp) (terpri)))


--
__Pascal Bourguignon__

fft1976

Aug 2, 2009, 1:26:03 AM
On Aug 1, 8:55 pm, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> w_a_x_man <w_a_x_...@yahoo.com> writes:
> > On Aug 1, 9:32 am, Bata <batabo...@yahoo.ca> wrote:
> >> Use (cl-ppcre:split " " string-name) to get a list of the different
> >> words in the string.
>
> >> Check out the cl-ppcre package's functionality, its quite amazing.
>
> > Ruby:
>
> > string_name.split
>
> Don't you realize how ugly Ruby syntax is?
>
> Here is in 150 lines of lisp, a simplified lisp reader that is able to
> read all the lisp syntax needed to write it.
>
> Try to parse Ruby syntax in Ruby and see how useless a language it is.

I much prefer CL to Ruby, but this argument is useless. How many lines
of code do you need to read Brainfuck in Brainfuck?

(If you wanted to show the inferiority of Ruby to Common Lisp, the
shootout makes a much more compelling argument)

ACL

Aug 2, 2009, 1:33:03 AM

CL:
(split string_name)

lol.

I used to get kind of pissed off that you spammed c.l.l. with things
that are so completely inappropriate, but in reality it is pathetic.

fft1976

Aug 3, 2009, 12:03:04 AM

By the way, here is in 1 line of BF, a complete BF reader that is able
to read all the BF syntax needed to write it:

,+[-.,+]

Here's how to try it:

$ sudo apt-get install bf
$ cat > reader.bf
,+[-.,+]
$ bf reader.bf < reader.bf

Your 150 lines don't look very impressive now, do they?

Ruby < Lisp <<< BF!


Pascal J. Bourguignon

Aug 3, 2009, 4:19:51 AM
fft1976 <fft...@gmail.com> writes:
> By the way, here is in 1 line of BF, a complete BF reader that is able
> to read all the BF syntax needed to write it:
>
> ,+[-.,+]
>
> Here's how to try it:
>
> $ sudo apt-get install bf
> $ cat > reader.bf
> ,+[-.,+]
> $ bf reader.bf < reader.bf
>
> Your 150 lines don't look very impressive now, do they?
>
> Ruby < Lisp <<< BF!

I specified a syntactic reader. Not just a reader. READ-SEQUENCE, or
a loop on READ-CHAR is trivial both in Ruby and in Lisp.

Building a data structure isomorphic to the syntax of the language is
less trivial. First you will have to think about how to build an
abstract data structure in BF. Have fun!

--
__Pascal Bourguignon__

fft1976

Aug 3, 2009, 10:51:36 PM
On Aug 3, 1:19 am, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

> fft1976 <fft1...@gmail.com> writes:
> > By the way, here is in 1 line of BF, a complete BF reader that is able
> > to read all the BF syntax needed to write it:
>
> > ,+[-.,+]
>
> > Here's how to try it:
>
> > $ sudo apt-get install bf
> > $ cat > reader.bf
> > ,+[-.,+]
> > $ bf reader.bf < reader.bf
>
> > Your 150 lines don't look very impressive now, do they?
>
> > Ruby < Lisp <<< BF!
>
> I specified a syntactic reader.  Not just a reader.

It is a syntactic reader. BF's syntax is just a sequence of
characters. If you throw in illegal characters, the behavior is
"undefined". Lisp's syntax is more complicated: it's a tree of
identifiers (in its idealized form; of course, Common Lisp had to fuck
it up). Ruby's and Python's syntaxes are even more complicated.

The above was to illustrate the wrongness of your argument that the
length of a self-parser determines the usefulness of the language.
Hell, I know that BF can be a little *too* awesome.

By the way, Python's syntax is much better than Ruby's. Dollar signs
in front of variables? WTF were the designers smoking? That's like
Perl! Haven't you learned your lesson?

Python's syntax might even be better than Lisp's, but it's certainly
harder to parse.

Carl Banks

Aug 3, 2009, 11:02:26 PM
> harder to parse.


Go away, troll.

[This is cross-posted; I recommend that no one else follow up.]


Carl Banks

fft1976

Aug 3, 2009, 11:06:58 PM

Lispers were having fun badmouthing other languages for no good
reason:

"""
Don't you realize how ugly Ruby syntax is?

Here is in 150 lines of lisp, a simplified lisp reader that is able to
read all the lisp syntax needed to write it.

Try to parse Ruby syntax in Ruby and see how useless a language it
is.
"""

http://groups.google.com/group/comp.lang.lisp/msg/52dde974d504ad54

Of course you don't like it when I point out just how wrong you are.

alex23

Aug 4, 2009, 2:01:21 AM
On Aug 4, 1:06 pm, fft1976 <fft1...@gmail.com> wrote:
> Of course you don't like it when I point out just how wrong you are.

No, we don't like it when you try to drag comp.lang.python into
whatever the hell it is you think you're doing.

Pascal J. Bourguignon

Aug 4, 2009, 1:32:17 PM
fft1976 <fft...@gmail.com> writes:

We were comparing Ruby and Lisp. BF has nothing to do here.

--
__Pascal Bourguignon__

WJ

May 26, 2011, 4:23:52 AM
Joshua Taylor wrote:

> ccc31807 wrote:
> > I'm working through Winston and Horn, 3rd edition. Doing Problem 6-1, I
> > am using strings rather than lists to represent data, as strings are
> > much more natural for me. (By day, I'm a database manager and data
> > munger and I work with strings a lot.)
> >
> > Problem 6-1 requires a search function for titles of books by literal
> > text. With the title as a string, I can do this using (search), but
> > the problem is that ALL letters match, so for "Moby Dick" 'Moby'
> > matches, as well as 'oby' as well as 'by' as well as 'y'. I looked at
> > CL-PPCRE and am satisfied that it will do what I want, but I was
> > curious whether Lisp could do the same thing as Perl in this context.

>
> Now, looking at that problem, the task is to find books whose titles
> contain all of a given set of words, i.e., whose titles are a superset of
> the query words. The reason that the task is somewhat complicated by using
> strings is that now the title is a sequence of the characters that make up
> the title rather than a sequence of words in the title. But even so, the
> problem asks for a function that finds all the books whose title includes
> all the elements in the query, but not necessarily in the same order as in
> the query. Consider the examples from the textbook:
>
> * (find-book-by-title-words '(black orchid) books)
> ((TITLE (THE BLACK ORCHID))
> (AUTHOR (REX STOUT))
> (CLASSIFICATION (FICTION MYSTERY)))
>
> * (find-book-by-title-words '(orchid black) books)
> ((TITLE (THE BLACK ORCHID))
> (AUTHOR (REX STOUT))
> (CLASSIFICATION (FICTION MYSTERY)))

Arc:

(= titles (map tokens (tokens "Figures of Stone
The Case of Charles Dexter Ward
The Burning Court
Traitor to the Living
The Stone God Awakens
The Shadow over Insmouth"
                              #\newline)))

(def find-titles (words)
(map [string (intersperse " " _)]
(keep [is _ (union is _ words)] titles)))

(find-titles '("Stone"))
==> ("Figures of Stone" "The Stone God Awakens")

(find-titles '("Living" "Traitor"))
==> ("Traitor to the Living")
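And in Common Lisp itself, with the titles kept as lists of symbols as in the textbook examples quoted above, the order-independent match is just SUBSETP. A sketch assuming each book is an association list with a (TITLE (...)) entry; book-title is a hypothetical helper.

```lisp
(defun book-title (book)
  "The title of BOOK, a list of symbols, taken from its (TITLE (...)) entry."
  (second (assoc 'title book)))

(defun find-book-by-title-words (words books)
  "Return the first book whose title contains every symbol in WORDS,
in any order, or NIL if there is none."
  (find-if (lambda (book) (subsetp words (book-title book)))
           books))

;; (find-book-by-title-words '(orchid black)
;;   '(((title (the black orchid)) (author (rex stout))
;;      (classification (fiction mystery)))))
```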
