Newbie Help Please: Reading into a list from a file

Robert Monfera

Apr 28, 1999, 3:00:00 AM

Hi there,

Kent M Pitman wrote:
...
> Solution depends a lot on how the words are separated.

Speaking of file formats, tab-delimited text is a very common one (where
spaces are part of fields), and there are a bunch of others. What is
the common practice here? It's very easy to quickly put together some
code, I am just wondering if people prefer to do this, or use some public
interface libraries that maybe cover multiple formats such as .csv, .dbf
or .wk1 in the spirit of reuse.

Regards
Robert

Kent M Pitman

Apr 29, 1999, 3:00:00 AM

sd...@sdgs.com (Randy) writes:

> Hi all, I'm having some trouble reading from a file.
> The text file is in the form:
>
> Cat.
> Bird.
> Dog.
>
> I need to do the equivalent of (setq animals '(Cat. Bird. Dog.)) but I
> need to read the elements of the list (however many there are in the
> file) from the text file.

I'm assuming this is not a homework problem. (No one I know ever assigns
anything useful like file I/O for homework. Sigh.)

Solution depends a lot on how the words are separated.

READ reads lisp expressions, and "Cat.", etc. are technically
lisp expressions. To retain case but still use READ,
you have to use an appropriate readtable.
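
A minimal sketch of the readtable approach, assuming the symbols may be interned in the current package. The function name is illustrative and not from the thread:

```lisp
(defun read-forms-preserving-case (stream)
  "Read every form from STREAM with symbol case preserved."
  (let ((*readtable* (copy-readtable nil)) ; fresh copy of the standard readtable
        (*read-eval* nil))                 ; never evaluate #. forms found in data
    (setf (readtable-case *readtable*) :preserve)
    ;; The stream object itself serves as a unique end-of-file marker.
    (loop for form = (read stream nil stream)
          until (eq form stream)
          collect form)))
```

With the three-line file from the question, this should yield the symbols |Cat.|, |Bird.| and |Dog.| rather than the upcased CAT., BIRD. and DOG.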

READ-LINE will read lines of text. The result is a string, which I
would think would be better than a symbol. I can't really seriously
believe you want symbols with dots in their names, but it is a possible
thing. See the function INTERN if you want to convert a string to
a symbol.

(defun read-the-file (filename)
  (with-open-file (stream filename)
    (loop for line = (read-line stream nil nil)
          while line
          collect line)))

will return ("Cat." "Bird." "Dog.")
If instead you use (read stream nil nil), you'll get
(CAT. BIRD. DOG.)
If you use (intern (read-line stream nil nil)) you'll get
(|Cat.| |Bird.| |Dog.|)
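
One caution about the INTERN variant: READ-LINE returns NIL at end of file and INTERN expects a string, so the conversion is best applied after the end-of-file check. A minimal sketch, taking an already open stream (the function name is illustrative, not from the thread):

```lisp
(defun read-lines-as-symbols (stream)
  "Return a list with one symbol per line of STREAM, case preserved."
  (loop for line = (read-line stream nil nil)
        while line                  ; stop before INTERN ever sees NIL
        collect (intern line)))     ; interns into the current package
```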
You could also write your own reader to deal with custom separator chars and
return value type. For example:

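(defun whitespace? (ch)
  (or (eql ch #\Space)
      (eql ch #\Tab)
      (eql ch #\Newline)))

;; Skips whitespace and returns the first non-whitespace character,
;; or NIL at end of file. Note that the returned character is consumed.
(defun peek-char-after-whitespace (stream)
  (loop for ch = (read-char stream nil nil)
        while ch
        when (not (whitespace? ch))
          do (return ch)))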

(defun read-word (stream)
  (let ((ch (peek-char-after-whitespace stream)))
    (when ch
      (intern (with-output-to-string (str)
                (write-char ch str)
                (loop for ch = (read-char stream nil nil)
                      while (and ch (not (whitespace? ch)))
                      do (write-char ch str)))))))

Then if you use (read-word stream) instead of the (read-line stream nil nil)
you will end up able to have Cat. and Bird. and Dog. all on one line with
only whitespace between. You also have better control over what happens
if you do "Cat, Dog, etc." since "," is a character that Lisp doesn't want
to see in places that English likes it to be.
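
The point about custom separators can be sketched as follows. This is a self-contained variant that returns strings and treats "," as a separator alongside whitespace; the names SEPARATOR-P, READ-TOKEN and READ-ALL-TOKENS are hypothetical, not from the thread:

```lisp
(defun separator-p (ch)
  "True if CH separates tokens: whitespace or a comma (an assumed choice)."
  (member ch '(#\Space #\Tab #\Newline #\,)))

(defun read-token (stream)
  "Skip separators, then return the next token as a string, or NIL at EOF."
  (let ((start-ch (loop for ch = (read-char stream nil nil)
                        while (and ch (separator-p ch))
                        finally (return ch))))
    (when start-ch
      (with-output-to-string (out)
        (write-char start-ch out)
        ;; The separator that ends the token is consumed and discarded.
        (loop for ch = (read-char stream nil nil)
              while (and ch (not (separator-p ch)))
              do (write-char ch out))))))

(defun read-all-tokens (stream)
  "Collect tokens from STREAM until end of file."
  (loop for token = (read-token stream)
        while token
        collect token))
```

Because the separator set lives in one predicate, switching to tab-only or semicolon-delimited input is a one-line change.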

I only did very cursory testing on the above, so it's possible I goofed
somewhere, but it should be close. For doc on how the various
operators involved work, see the Common Lisp HyperSpec at
http://www.harlequin.com/education/books/HyperSpec/FrontMatter/index.html

Kent M Pitman

Apr 29, 1999, 3:00:00 AM

Robert Monfera <mon...@fisec.com> writes:

> Hi there,
>
> Kent M Pitman wrote:
> ...

> > Solution depends a lot on how the words are separated.
>

> Speaking of file formats, tab-delimited text is a very common one (where
> spaces are part of fields), and there are a bunch of others. What is
> the common practice here? It's very easy to quickly put together some
> code, I am just wondering if people prefer do this, or use some public
> interface libraries that maybe cover multiple formats such as .csv, .dbf
> or .wk1 in the spirit of reuse.

I don't personally know of a library that does this, but there may
be one. You could poke around at the ALU's interim web site.
http://www.elwoodcorp.com/alu/

The thing is, though, it's so completely trivial to write that many
people probably don't include a library just because finding the name
of the library to use could take about as long as writing the 10 lines
of code. I don't mean to attach a value judgment to that; I'm all for
having shared libraries. But as a practical matter, people do resist
writing them when the amount of work they save is relatively small.
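
To give a sense of scale, here is a rough sketch of such a hand-written tab-delimited reader. It assumes one record per line and no quoting or escaping of tabs; the function names are illustrative only:

```lisp
(defun split-on-tab (line)
  "Split LINE at tab characters; empty fields come back as empty strings."
  (loop with start = 0
        for tab = (position #\Tab line :start start)
        collect (subseq line start tab)   ; TAB is NIL for the last field
        while tab
        do (setf start (1+ tab))))

(defun read-tab-delimited (stream)
  "Return a list of records, each a list of field strings."
  (loop for line = (read-line stream nil nil)
        while line
        collect (split-on-tab line)))
```

Spaces stay inside fields, which matches the tab-delimited convention described in the question.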

Tim Bradshaw

Apr 29, 1999, 3:00:00 AM

* Robert Monfera wrote:

> Speaking of file formats, tab-delimited text is a very common one (where
> spaces are part of fields), and there are a bunch of others. What is
> the common practice here? It's very easy to quickly put together some
> code, I am just wondering if people prefer do this, or use some public
> interface libraries that maybe cover multiple formats such as .csv, .dbf
> or .wk1 in the spirit of reuse.

This is heresy of the worst kind, but when I have to do this I use
the normal string-bashing tools -- some combination of awk, sed, perl
and other normal Unix stuff -- to read the format and spit out
something Lisp can read easily. That lets me do the interesting bit
in Lisp and the boring bit in tools better suited to boring problems.

I'm reassured by the fact that people I know who do really serious
data-mashing stuff in C *also* use this technique (perl for input
processing basically).

--tim

Juanma Barranquero

Apr 29, 1999, 3:00:00 AM

On Thu, 29 Apr 1999 02:59:11 GMT, Kent M Pitman <pit...@world.std.com>
wrote:

>For doc on how the various operators involved work, see the Common
>Lisp HyperSpec at

>www.harlequin.com/education/books/HyperSpec/FrontMatter/index.html

Is someone still maintaining it, correcting typos, etc.?

/L/e/k/t/u

Bernhard Pfahringer

Apr 29, 1999, 3:00:00 AM

In article <sfw3e1k...@world.std.com>,

Kent M Pitman <pit...@world.std.com> wrote:

>Robert Monfera <mon...@fisec.com> writes:
>
>> Hi there,
>>
>> Kent M Pitman wrote:
>> ...
>> > Solution depends a lot on how the words are separated.
>>

>> Speaking of file formats, tab-delimited text is a very common one (where
>> spaces are part of fields), and there are a bunch of others. What is
>> the common practice here? It's very easy to quickly put together some
>> code, I am just wondering if people prefer do this, or use some public
>> interface libraries that maybe cover multiple formats such as .csv, .dbf
>> or .wk1 in the spirit of reuse.
>

>I don't personally know of a library that does this, but there may
>be one. You could poke around at the ALU's interim web site.
> http://www.elwoodcorp.com/alu/
>
>The thing is, though, it's so completely trivial to write that many
>people probably don't include a library just because finding the name
>of library name to use could take about as long as writing the 10 lines
>of code. I don't mean to attach a value judgment to that; I'm all for
>having shared libraries. But as a practical matter, people do resist
>writing them when the amount of work they save is relatively small.
>

You might find 'split-sequence' useful. The implementation given below was
co-evolved in this newsgroup half a year ago:

;;; full-fledged version ala position
(defun split-sequence (delimiter seq
                       &key
                         (empty-marker nil keep-empty-subseqs)
                         (from-end nil)
                         (start 0)
                         (end nil)
                         (test nil test-supplied)
                         (test-not nil test-not-supplied)
                         (key nil key-supplied)
                       &aux
                         (len (length seq)))
  "Return list of subsequences in SEQ delimited by DELIMITER.
If an EMPTY-MARKER is supplied, empty subsequences will be
represented by EMPTY-MARKER, otherwise they will be discarded.
All other keywords work analogously to POSITION."

  (unless end (setq end len))

  (when from-end
    (setf seq (reverse seq))
    (psetf start (- len end)
           end (- len start)))

  (loop with other-keys = (nconc (when test-supplied (list :test test))
                                 (when test-not-supplied (list :test-not test-not))
                                 (when key-supplied (list :key key)))
        for left = start then (+ right 1)
        for right = (min (or (apply #'position delimiter seq :start left other-keys)
                             len)
                         end)
        if (< left right)
          collect (subseq seq left right)
        else when keep-empty-subseqs collect empty-marker
        ;; = rather than EQ: EQ is not guaranteed to work on integers.
        until (= right end)))

Splitting tab-delimited strings then is just:

USER(13): (split-sequence #\tab (coerce '(#\a #\tab #\space #\b) 'string))
("a" " b")

You can even abuse the :test keyword to deal with the original example:

USER(14): (split-sequence '(#\space #\tab #\newline) "Cat.
Bird.
Dog.
"
:test #'(lambda (l x)(member x l)))

("Cat." "Bird." "Dog.")

That way one could read in the complete file at once into a string (using READ-SEQUENCE)
and do all the parsing in Lisp.
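
A minimal sketch of the read-it-all-at-once step, assuming the file fits comfortably in memory. FILE-TO-STRING is a hypothetical name, not part of any library mentioned here:

```lisp
(defun file-to-string (filename)
  "Return the entire contents of FILENAME as one string."
  (with-open-file (stream filename)
    ;; FILE-LENGTH may overestimate the character count (multi-byte
    ;; encodings, line-ending translation), so trim to what was read.
    (let* ((buffer (make-string (file-length stream)))
           (count (read-sequence buffer stream)))
      (subseq buffer 0 count))))
```

The resulting string can then be handed to a splitting function in a single call.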

cheers, Bernhard
--
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for http://www.ai.univie.ac.at/~bernhard/
Artificial Intelligence bern...@ai.univie.ac.at

sta...@hacked.ncsu.edu

Apr 29, 1999, 3:00:00 AM

Tim Bradshaw <t...@tfeb.org> writes:

> * Robert Monfera wrote:
> > Speaking of file formats, tab-delimited text is a very common one (where
> > spaces are part of fields), and there are a bunch of others. What is
> > the common practice here? It's very easy to quickly put together some
> > code, I am just wondering if people prefer do this, or use some public
> > interface libraries that maybe cover multiple formats such as .csv, .dbf
> > or .wk1 in the spirit of reuse.
>

> This is heresy of the worst kind, but when I have to do this I use
> the normal string-bashing tools -- some combination of awk, sed, perl
> and other normal Unix stuff -- to read the format and spit out
> something Lisp can read easily. That lets me do the interesting bit
> in Lisp and the boring bit in tools better suited to boring problems.

For a while I was exchanging numerical data files a lot between Clasp
(a Lisp stat package) and other applications, and I settled on the
fairly useful hack of putting a list of numbers on every line, with
tabs between all of the numbers *and* between the open paren and the
first number and the last number and the close paren:

( 1 2 3 )

This let my Lisp program read things in normally, and just created a
couple of garbage columns in other stat packages I was using.
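
Reading such a file back could look like the sketch below, which assumes every line holds exactly one parenthesized list of numbers. READ-NUMBER-ROWS is an illustrative name:

```lisp
(defun read-number-rows (stream)
  "Read one list per line from STREAM; tabs count as ordinary whitespace."
  (let ((*read-eval* nil))              ; data files should never run code
    (loop for row = (read stream nil :eof)
          until (eq row :eof)
          collect row)))
```

Since the standard reader treats tabs as whitespace, no custom parsing is needed on the Lisp side.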

--
Rob St. Amant

Kent M Pitman

Apr 29, 1999, 3:00:00 AM

barra...@laley-actualidad.es (Juanma Barranquero) writes:

> On Thu, 29 Apr 1999 02:59:11 GMT, Kent M Pitman <pit...@world.std.com>
> wrote:
>
> >For doc on how the various operators involved work, see the Common
> >Lisp HyperSpec at
> >www.harlequin.com/education/books/HyperSpec/FrontMatter/index.html
>
> Is someone still maintaining it, correcting typos, etc.?

This is a popular question. The answer is a good deal more complicated
than you probably expected. Here goes...

I am going to answer for what I know, but you should keep in mind that
I don't speak for Harlequin, who claim the name Common Lisp HyperSpec
as a trademark, and who own copyright in the hypertext markup. (The
underlying text copyright ownership is an issue I'll speak to people
about privately if they approach me about it, but I try not to comment
on it in public.) If your question is one of corporate policy of the
document owner, you must ask Harlequin. Information by me below should
be regarded as purely anecdotal, historical, trivia, and the like:

Even when I was at Harlequin, no one was "maintaining it and
correcting typos" in the sense that you probably mean. That is, the
typos are largely in the underlying ANSI CL spec, not in the hypertext
layer of the document. (I was and am redirecting typos reported about
CLHS as implicit requests for J13 to do something, but that's a
separate matter.) It was/is important to the integrity of the
document that the hypertext be precisely what is in the ANSI CL
hardcopy. Once you fix typos, a divergence arises, and some such
divergences could create material disputes over meaning. I and others
wanted to avoid that where possible. True typos are things you can read
past; if they are "typos that matter" one must be very wary of fixing
them quietly. And historical documents are historical documents; one
doesn't update spellings in the Declaration of Independence (or whatever
your country's equivalent of that might be :-).

ANSI CL is still maintained through the ANSI process (NCITS committee
J13, formerly known as X3J13). I and others will continue to be doing
that, but that's a long-arc timeline between updates. A J13 meeting
is coming up, though.

Back to CLHS, as I said, its status is something you could approach
Harlequin to ask about, since that particular hypertextification item
is copyrighted by them. There was some talk of having me continue to
maintain it, but it was left in limbo for various reasons I'm going to
try not to go into here. [Bottom line: if they want me to do it, they
need to contact me and talk to me about the terms under which that
might be done. They should not think they are waiting for me to
contact them. If I were to decide to do something new, it would
probably be to start over from the public TeX sources and write
all-new code to do the conversion so that the result was mine to
control and I didn't have to risk later having to again ask someone
else's permission for the right to update something that came from the
sweat of my own brow, as it were. I'm not necessarily likely to mount
such an effort, especially absent funding to do so, but that would be
what I would be inclined to do if I did get the urge, I guess is what
I'm saying.]

At any rate, the virtue of CL qua language is its stability, so the
fact that documents about it don't change regularly is not an
automatic thing to panic about.

Little known CLHS versioning trivia:

Last I checked, the main version of CLHS that Harlequin distributes
is version 4. Versions 1 and 2 were internal only; you never saw
them unless you worked at Harlequin. Version 3 was the initial
rollout; most people probably have that. You can find the version
identifier in the HTML source code of every page. I recommend that
you do NOT race to replace v3 with v4; the *only* change is a
one-word legally required change in a trademark claim to claim
"Liquid Common Lisp" instead of "Lucid Common Lisp". It's not worth
downloading a whole new copy for that.

There is a version 5 in existence, though. It is different in
substance in several ways: it contains 8.3 dos-style filenames, so
probably works better on the Mac (there being 2 32-character-long
filenames in Version 3 which exceed the 31-character Mac limit).
Version 5 differs also in that it has some minor corrections to the
HTML markup, and majorly better indexing of the format ops and
sharpsign read macros. (The CLHS index is not part of the underlying
X3J13 document, so is something I could update without deviating
from the ANSI CL spec.) Version 5 also does not have the dorky little Java
widget on the Symbol Index page that never worked right for me back
when version 3 first issued (earlier versions of Netscape, and all that)
and that finally got me fed up with Java enough to remove it
in version 5. ("Write once, debug everywhere." I got tired of doing
so.) In house, some fans of that widget complained, but their complaints
fell on my deaf ears. Java might by now be stable enough to put it back,
but I never got around to doing that before I, uh, "left" Harlequin.
Anyway, if you liked that Java widget as your customary interface,
version 5 might seem like a bit of a downgrade. I'd always meant to
make a v6 to fix that... Oh well.

[Free advice to Harlequin for what it's worth:
Because so many people have by now probably bookmarked individual
pages within CLHS (against my examples, btw; I have always stubbornly
resisted posting individual pointers to pages, preferring instead
to cite the main page and give English navigation instructions to
the detail page in order to preserve the possibility of changing
the internal URLs without invalidating a zillion DejaNews items),
it would not be a good plan, in my personal opinion, for Harlequin
to wholesale replace v3 with v5 on their web site without ALSO
either (1) making a shadow directory containing HTML stubs for
each of the old pages, redirecting people to each of the corresponding
new pages, or (2) perhaps easier to do: telling the Harlequin web
site server to specially redirect all references to books/HyperSpec to
books/CLHS/Front/index.htm, which is the name of the cover page in the
DOS/8.3 filenaming scheme that v5 uses. Absent such a compatibility
plan, I'd recommend staying with v4 on the web site, but maybe that's
just me.]

Incidentally, don't panic that v5 DOS/8.3 names are shorter--I
went to enormous trouble to make them also be "predictable" in case
there are people out there who like to think they know the algorithm
for page naming and type it in raw; the 8.3 filenames are also fairly
"predictable", after a fashion. That is, the algorithm, though
different, is intended to be learnable. Coming up with an invertible
and human-readable algorithm for saying the chapter names to have 21.1
not get confused with 2.1.1 and still fit in 8 characters was fun.
A sample is: CLHS/Body/21_aaaa.htm, which is 21.1.1.1.1
The use of alphabetics accommodates some section numbers that roll
above 9 but fortunately don't get above 26.

Oh, and in answer to the big question some of you were probably
wondering if I'd get to: To my knowledge, the only way you can get
version 5, by the way, is to get a LispWorks. Though the free
Personal Edition has it, so it's not like you have to pay dollars. It
is not, to my knowledge, available as a separate item at their web
site--but then, I haven't looked recently.

And, on balance, the pressure for CLHS to be THE source of hypertext
lisp doc is less these days because Franz has an approximate
equivalent of the hyperspec that it associates with its product as
well. (I think one reason you don't hear as much about it is that
they didn't give it a jazzy name--or a name at all that I can
discern.) But it seems to have essentially the same underlying
reference text. My impression is that it might have been produced
from the last "draft" of the CL specification instead of the final
version, but if so that's only a legal matter (which I'm going to try
not to go into here because it's a rat's nest), not a technical one,
since the technicalese in the last draft and the final version was
identical.

One thing all this version stuff should tell you is that there's a
tension in the world between "the need to fix typos" and "the need to
upgrade". If typos were being fixed all the time, people would want
to download copies all the time. And that would mean there would be a
zillion subtly different versions all over the place. While at
Harlequin, when I had a say in such things, I generally resisted
making much noise about different versions because it seemed like a
lot of effort for people to download a new version for remarkably
little benefit. At some point, a new version will be needed, but I
think for now the main issue is the care and feeding of the standard,
not the care and feeding of its webification. And that's in the hands
of a committee, not some single individual. But "web versioning"
is still very much a great "unsolved problem". Coordinating updates
to something depended on world-wide is tricky; ANSI has long
made a whole business out of it.

Kent M Pitman

Apr 29, 1999, 3:00:00 AM

bern...@hummel.ai.univie.ac.at (Bernhard Pfahringer) writes:

> You might find 'split-sequence' useful.

Certainly a useful function to have.

> That way one could read in the complete file at once into a string
> (using READ-SEQUENCE) and do all the parsing in Lisp.

For bounded-size files. A serious virtue of the other approach is
that it doesn't require you to redundantly buffer the whole file's contents
in memory. This exercise in parsing clearly requires a minimum of
state on an ongoing basis, and while the solution you propose has
that kind of APL feel of piping two powerful operators together to
get a nice result, it's not the best way to teach a newbie how to
make good engineering choices in a lot of practical settings.
Even if the file size starts small, it might grow, and then people
start to wonder what's taking up so much space. If the wrong person
looks in to fixing it, not knowing there are alternatives, it can earn
Lisp a bad name for appearing to "not have a good way to do things",
and what was a hack for pleasant convenience can turn into a reason
that someone at a certain shop thinks Lisp is never appropriate
for serious use.

Things like split-sequence should be used where there is strong
confidence that the dataset size is bounded. The mere mention of "file"
makes me nervous in that regard. Most text editors make it painful
enough to parse individual long lines that I'm pretty comfortable about
split-sequence being used to split a "line" or a "token", but not a "file".
Even though at an abstract level there is an unbroken continuum between
tokens, lines, and files, and you can think of files as "mere tokens"
conceptually, the practical fact is that there are subtle psychological
shifts we make as we move from one datastructure to another, and I think
when most people say "file", they mean "might have arbitrary length"
and when most people say "line" they mean "probably has bounded length,
usually less than 256." I feel pretty comfortable allocating
(make-array 256 :element-type 'character :adjustable t :fill-pointer 0)
for line buffers, for example, without worrying these will grow under
normal use, and without worrying I have to re-adjust them back down in
size periodically if they do grow. I feel a lot less sure of file buffers.
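
A sketch of how such a reusable line buffer might be filled, assuming newline-terminated text. MAKE-LINE-BUFFER and READ-LINE-INTO are hypothetical helper names:

```lisp
(defun make-line-buffer ()
  "Allocate a growable line buffer that starts out empty."
  (make-array 256 :element-type 'character :adjustable t :fill-pointer 0))

(defun read-line-into (buffer stream)
  "Refill BUFFER with the next line of STREAM.
Returns BUFFER, or NIL if end of file is hit before any character is read."
  (setf (fill-pointer buffer) 0)        ; reuse the storage, discard old text
  (loop for ch = (read-char stream nil nil)
        do (cond ((null ch)
                  (return (if (plusp (fill-pointer buffer)) buffer nil)))
                 ((char= ch #\Newline)
                  (return buffer))
                 (t
                  ;; Grows the array only when a line exceeds its capacity.
                  (vector-push-extend ch buffer)))))
```

Unlike READ-LINE, this allocates no fresh string per line, and a line longer than 256 characters simply enlarges the buffer.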

None of this really contradicts anything you said. I just worry for
newbies (since that was what the subject line said was involved) who
might be looking on and thinking this was the green light to not learn
about conventional I/O tools, which are there and should be used
sometimes.

And all just my personal opinion, of course. Other perspectives welcome.

David B. Lamkins

Apr 29, 1999, 3:00:00 AM

In article <sfw3e1k...@world.std.com> , Kent M Pitman
<pit...@world.std.com> wrote:

> Robert Monfera <mon...@fisec.com> writes:
>
>> Hi there,
>>
>> Kent M Pitman wrote:
>> ...
>> > Solution depends a lot on how the words are separated.
>>

>> Speaking of file formats, tab-delimited text is a very common one (where
>> spaces are part of fields), and there are a bunch of others. What is
>> the common practice here? It's very easy to quickly put together some
>> code, I am just wondering if people prefer do this, or use some public
>> interface libraries that maybe cover multiple formats such as .csv, .dbf
>> or .wk1 in the spirit of reuse.
>

> I don't personally know of a library that does this, but there may
> be one. You could poke around at the ALU's interim web site.
> http://www.elwoodcorp.com/alu/

For future reference (the original poster didn't ask about .csv formats), I
wrote a .csv reader/writer some years ago. It's on my web site at
<http://www.teleport.com/~dlamkins/ftp-catalog.html#csv-streams>.

--
David B. Lamkins <http://www.teleport.com/~dlamkins/>

There are many ways to abbreviate something, but only one way not to.

Vassil Nikolov

Apr 30, 1999, 3:00:00 AM

In article <sfwg15j...@world.std.com>,

Kent M Pitman <pit...@world.std.com> wrote:

(...)
> and when most people say "line" they mean "probably has bounded length,

> usually less than 256." I feel pretty comfortable allocating
> (make-array 256 :element-type 'character :adjustable t :fill-pointer 0)
> for line buffers, for example, without worrying these will grow under
> normal use, and without worrying I have to re-adjust them back down in
> size periodically if they do grow.

I just want to make sure everybody notices the `mission-critical'
arguments there, and those are ``:ADJUSTABLE T'' without which this
code can cause much suffering. Dynamically sizing a line buffer is
a must, otherwise 256 is *not* enough---even 1024 is not enough, as
one version of vi that has this limit has shown me very eloquently.
(Especially with all that software that passes for text editors that
considers a paragraph to be a line---or should I say a line to be
a paragraph. Unfortunately I do not belong to those happy few that
can just reject such texts when they arrive by e-mail, for example.)

(...)

> And all just my personal opinion, of course. Other perspectives welcome.

I quite agree that `token' means `no problem buffering, cheap to
throw around (e.g. copy),' `line' means `can be buffered with care
and attention, may be expensive to throw around,' and `file' means
`if you expect to be able to buffer it you deserve to lose when the
time comes.'

--
Vassil Nikolov <vnik...@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)


Erik Naggum

Apr 30, 1999, 3:00:00 AM

* Kent M Pitman <pit...@world.std.com>

I'd've used (peek-char t stream nil nil) for this. have I read the
specification too well, again? :)

#:Erik

Vassil Nikolov

Apr 30, 1999, 3:00:00 AM

In article <31344736...@naggum.no>,

One difference whose importance depends on the particular problem
is that with (PEEK-CHAR T ...) one does not have control over what
exactly white space is.

Besides, since Kent Pitman's function above consumes the first
non-white-space character, maybe peek-something is not a very
appropriate name for it (unless READ-CHAR is replaced by PEEK-CHAR
or a call to UNREAD-CHAR is added).
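
A non-consuming variant along those lines might look like this sketch, which keeps a caller-defined notion of whitespace. Both function names are illustrative:

```lisp
(defun whitespace-char-p (ch)
  "The application's own definition of whitespace (an assumed set)."
  (member ch '(#\Space #\Tab #\Newline)))

(defun peek-past-whitespace (stream)
  "Consume leading whitespace, then return the next character WITHOUT
consuming it, or NIL at end of file."
  (loop for ch = (peek-char nil stream nil nil)
        while (and ch (whitespace-char-p ch))
        do (read-char stream)           ; discard the whitespace character
        finally (return ch)))
```

A following READ-CHAR then returns the same character that was peeked.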

Kent M Pitman

Apr 30, 1999, 3:00:00 AM

Erik Naggum <er...@naggum.no> writes:

> * Kent M Pitman <pit...@world.std.com>
> | (defun peek-char-after-whitespace (stream)
> |   (loop for ch = (read-char stream nil nil)
> |         while ch
> |         when (not (whitespace? ch))
> |           do (return ch)))
>
> I'd've used (peek-char t stream nil nil) for this. have I read the
> specification too well, again? :)

I thought about this, and perhaps should have mentioned it. I'll
leave it to you to decide. Basically, as a style thing, I only use
(peek-char t stream nil nil) when I'm skipping "Lisp program text",
not "user data". The reasons are these (some being "better" reasons
than others):

* Using it means you have to be happy with the whitespace[2]
definition of whitespace, even if you're content to restrict
yourself purely to whitespace.

* Using it means you don't think about the fact that you might want
to use other separators than whitespace (like ",") which might
be useful in application situations.

* Using it didn't give me a chance to illustrate character-level I/O.

* A long time ago, in Maclisp, TYIPEEK (which had an equivalent argument)
had the following behavior which CL does not have, but I have lingering
fear of (quoting from the Revised Maclisp Manual, a.k.a. Pitmanual):

If bits [the equivalent of the CL peek-type argument]
is just T, TYIPEEK will skip over characters of input until
the beginning of an S-expression is reached. Splicing macro
characters, such as ``;'' comments, are not considered to
begin an object. If one is encountered, its associated
function is called as usual (so that the text of the comment
can be gobbled up or whatever) and TYIPEEK continues scanning
characters.

Unrelated trivia (no longer quoting): The way ``splicing macros''
worked is that they were readmacros whose results were "appended"
to the input stream (sort of like ,@ in backquote). ; was a splicing
macro that returned the empty list. My recollection is that it was
in principle possible for splicing macros to return *several*
things instead of zero things, but the T argument to TYIPEEK made
a mess of things when this happened for reasons you can probably imagine.
We fixed this misfeature in CL, explicitly requiring that splicing
readmacros return only one or zero values.

* Even if you are happy with the way PEEK-CHAR does things, your code is
still at risk that the language standard will change and the "space" (so
to speak) of things (PEEK-CHAR T ...) skips will change. This is true
of any function, of course, but I regard it as more true of functions
that add arbitrary and marginal functionality such as PEEK-CHAR does here.
Frankly, I'd be pleased as could be (other than the compatibility
nightmare it would cause to make it happen, so I would never vote for
it) if this peek-type argument just disappeared and PEEK-CHAR had
the same argument signature as READ-CHAR. I get hit by this all the
time in new code.

Kent M Pitman

Apr 30, 1999, 3:00:00 AM

Vassil Nikolov <vnik...@poboxes.com> writes:

> Besides, since Kent Pitman's function above consumes the first
> non-white-space character, maybe peek-something is not a very
> appropriate name

Absolutely right. Now that I think about it, I guess I usually call this
one READ-CHAR-AFTER-WHITESPACE.

Erik Naggum

May 1, 1999, 3:00:00 AM

* Vassil Nikolov <vnik...@poboxes.com>

| One difference whose importance depends on the particular problem is that
| with (PEEK-CHAR T ...) one does not have control over what exactly white
| space is.

really? I control the meaning of whitespace by modifying the readtable
in an application. e.g., I have made all control characters into
whitespace _except_ newline, which is an important delimiter in my data
stream. in what way does this not work? </tongue-in-cheek>

#:Erik

Kent M Pitman

May 2, 1999, 3:00:00 AM

Erik Naggum <er...@naggum.no> writes:

The above text is either mine or very like something I said. It means
you can't change the meaning of whitespace independent of what READ
wants. [Except by *readtable*, of course. A big hammer. Code starts
to look not much simpler than writing your own parser, which IMO is
more perspicuous. Don't get me wrong--I've done some pretty crazy
things with readtables in my time, including my own versions of the
hack I think it was Vaughan Pratt (author of CGOL, the Maclisp-based
infix syntax for lisp) used originally where you make every character
have the same readtable entry and make that readtable entry launch a
custom parser, so you can call (READ) to read fortran programs and the
like. READ can be perverted to do some interesting things. But, I
guess as a function of my advancing age, and the inevitable
spoilsport-like attitude that eventually takes over us oldsters making
us no fun to talk to any more, I've come a bit more to the conclusion
that some things that are POSSIBLE are nevertheless still not the best
way to do things.]

Erik Naggum

May 2, 1999, 3:00:00 AM

* Kent M Pitman <pit...@world.std.com>

| The above text is either mine or vey like something I said. It means you
| can't change the meaning of whitespace independent of what READ wants.

that's ok, because I use this _with_ READ, to read ordinary Common Lisp
forms, except that a bunch of features have been disabled, whitespacitude
has been relaxed, and the newline is a terminating macro character.

building my own reader may have been as much work, but starting to build
my own reader would have been a lot more work, and it would have been a
lot of duplicative effort, anyway. as I have gained experience from
usage, I have come to exclude various stuff from that readtable, but I
would still do it the same way all over again, because I really don't
have the time to write the low-level stuff in a reader. the C complement
to my protocol is mostly reader-related, and all it does is attach a type
character to the front of a string, and all objects are represented as
strings, re-parsed upon demand. call it a cop-out, but there's a lot of
hairy stuff that READ does that is too detailed and low-level to work out
anew, without effectively designing your own syntax, and _that's_ just
plain evil.

| But, I guess as a function of my advancing age, and the inevitable
| spoilsport-like attitude that eventually takes over us oldsters making us
| no fun to talk to any more, I've come a bit more to the conclusion that
| some things that are POSSIBLE are nevertheless still not the best way to
| do things.

oh, stop it! we youngsters like you just the way you are, Kent.

#:Erik

Vassil Nikolov

May 3, 1999

In article <31345715...@naggum.no>,
Erik Naggum <er...@naggum.no> wrote:
(...)

> really? I control the meaning of whitespace by modifying the readtable
> in an application. e.g., I have made all control characters into
> whitespace _except_ newline, which is an important delimiter in my data
> stream. in what way does this not work? </tongue-in-cheek>

(I admit that I didn't recognise the above as Kent Pitman's text,
but now the mystery of that SGML tag has been resolved.)

I just want to add a few things to what has already been posted.

Perhaps it would be nice if the fact that the meaning of
whitespace[2] depends on the current readtable and not on the
standard readtable was made a little more explicit in the spec
(indeed, *READTABLE* is listed in the `Affected By' section of
PEEK-CHAR's description, but perhaps something could be mentioned
either in the glossary or in Section 2.1.4.7 (Whitespace Characters)
too).

Also, note the idiom for setting the whitespace attribute of a
readtable entry: (SET-SYNTAX-FROM-CHAR c #\Space). (It's an
idiom to me because SET-SYNTAX-FROM-CHAR copies everything, not
just the whitespace attribute.)
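
A short sketch of that idiom, using a private copy of the standard readtable so the global one is left alone. The function name READ-WITH-COMMA-AS-SPACE and the choice of the comma are just for illustration.

```lisp
(defun read-with-comma-as-space (string)
  "READ all forms from STRING, treating #\\, as whitespace."
  (let ((*readtable* (copy-readtable nil))) ; copy of the standard readtable
    ;; Copies the whole syntax type of #\Space onto #\, in *READTABLE*,
    ;; which also removes the comma's usual macro-character definition.
    (set-syntax-from-char #\, #\Space)
    (with-input-from-string (in string)
      (loop for form = (read in nil in)     ; the stream doubles as EOF marker
            until (eq form in)
            collect form))))
```

With this, (read-with-comma-as-space "1,2,3") returns (1 2 3).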

Also, perhaps a standard function (WHITESPACEP &OPTIONAL readtable)
that tests this attribute could be useful (I know a kludge that
does this, but it is rather inelegant).

Erik Naggum

May 3, 1999

* Erik Naggum <er...@naggum.no>

| really? I control the meaning of whitespace by modifying the readtable
| in an application. e.g., I have made all control characters into
| whitespace _except_ newline, which is an important delimiter in my data
| stream. in what way does this not work? </tongue-in-cheek>

* Vassil Nikolov <vnik...@poboxes.com>

| (I admit that I didn't recognise the above as Kent Pitman's text, but now
| the mystery of that SGML tag has been resolved.)

huh? there appear to be some attribution problems here. I hope I'm not
morphing into Kent Pitman, or vice versa -- I like us separate.

| Perhaps it would be nice if the fact that the meaning of whitespace[2]
| depends on the current readtable and not on the standard readtable was
| made a little more explicit in the spec (indeed, *READTABLE* is listed in
| the `Affected By' section of PEEK-CHAR's description, but perhaps
| something could be mentioned either in the glossary or in Section 2.1.4.7
| (Whitespace Characters) too).

I find this to be sufficiently basic to the language design that it would
actually be confusing to add it in particular places -- one would have to
wonder why it was added.

| Also, perhaps a standard function (WHITESPACEP &OPTIONAL readtable) that
| tests this attribute could be useful (I know a kludge that does this, but
| it is rather inelegant).

your implementation may sport just such functions, or they might be
macros or accessors inlined so strongly that they are not retained in the
dumped image.

#:Erik

Juanma Barranquero

May 4, 1999

On Thu, 29 Apr 1999 14:47:17 GMT, Kent M Pitman <pit...@world.std.com>
wrote:

>barra...@laley-actualidad.es (Juanma Barranquero) writes:

>That is, the typos are largely in the underlying ANSI CL spec, not
>in the hypertext layer of the document. (I was and am redirecting
>typos reported about CLHS as implicit requests for J13 to do
>something, but that's a separate matter.) It was/is important to
>the integrity of the document that the hypertext be precisely what
>is in the ANSI CL hardcopy. Once you fix typos, a divergence
>arises, and some such divergences could create material disputes
>over meaning. I and others wanted to avoid that where possible.

While I understand that, and I supposed as much, I still feel it would
be nice to correct obvious typos, perhaps adding a note with the full
original text (and a *big* disclaimer stating that, should a problem
with interpretation arise, the reader would be better advised to go
read the ANSI CL spec, of course :)

But in fact I didn't mean typos, but things like the alignment errors
in the Permuted Symbol Index (where things like "pathname-name" are
aligned around the first appearance of the index letter, not the first
letter of the corresponding word), especially notorious in the "P".

>And historical documents are historical documents; one doesn't update
>spellings in the Declaration of Independence (or whatever
>your country's equivalent of that might be :-).

The "Constitución de Cádiz de 1812" would be a fine example. OTOH,
people (scholars, I mean) *do* correct typos in the Quixote,
Shakespeare texts, etc. :) But I'm just joking, I understand the
rationale pretty well.

>I'm not necessarily likely to mount such an effort, especially
>absent funding to do so, but that would be what I would be
>inclined to do if I did get the urge, I guess is what I'm saying.]

Well, if you feel that urge sometime (and sitting down for a while
doesn't make it pass :) it wouldn't be difficult to find a few of
us happy to help, I'd say.

>At any rate, the virtue of CL qua language is its stability, so the
>fact that documents about it don't change regularly is not an
>automatic thing to panic about.

Yes, of course.

[Going backwards in time...]

>This is a popular question. The answer is a good deal more
>complicated than you probably expected. Here goes...

Thanks a lot for taking the time to answer so thoroughly.

/L/e/k/t/u

Nick Levine

May 5, 1999

> But in fact I didn't mean typos, but things like the alignment errors
> in the Permuted Symbol Index (where things like "pathname-name" are
> aligned around the first appearance of the index letter, not the first
> letter of the corresponding word), especially notorious in the "P".
>

Thanks for the tip-off. I imagine this could be easily fixed in a future
revision.

While you are welcome to post such issues here, mail to
clhs...@harlequin.com would be more direct.

- nick

Vassil Nikolov

May 6, 1999

In article <31347569...@naggum.no>,

Erik Naggum <er...@naggum.no> wrote:
> * Erik Naggum <er...@naggum.no>
> | really? I control the meaning of whitespace by modifying the readtable
> | in an application. e.g., I have made all control characters into
> | whitespace _except_ newline, which is an important delimiter in my data
> | stream. in what way does this not work? </tongue-in-cheek>
>
> * Vassil Nikolov <vnik...@poboxes.com>
> | (I admit that I didn't recognise the above as Kent Pitman's text, but now
> | the mystery of that SGML tag has been resolved.)
>
> huh? there appear to be some attribution problems here. I hope I'm not
> morphing into Kent Pitman, or vice versa -- I like us separate.

:-) (Let me assure you that I perceive both of you as very distinct from
one another.)

What I had in mind was the last line of the excerpt that follows:

From: Kent M Pitman <pit...@world.std.com>
Subject: Re: Newbie Help Please: Reading into a list from a file
Date: 1999/05/02
Message-ID: <sfw7lqs...@world.std.com>
Newsgroups: comp.lang.lisp
References: <37266b90...@news3.newscene.com>
<31344736...@naggum.no>
<7gcnek$294$1...@nnrp1.dejanews.com>
<31345715...@naggum.no>

Erik Naggum <er...@naggum.no> writes:
(...)

> really? I control the meaning of whitespace by modifying the readtable
> in an application. e.g., I have made all control characters into
> whitespace _except_ newline, which is an important delimiter in my data
> stream. in what way does this not work? </tongue-in-cheek>

The above text is either mine or very like something I said.

(end of excerpt).

>
> | Perhaps it would be nice if the fact that the meaning of whitespace[2]
> | depends on the current readtable and not on the standard readtable was
> | made a little more explicit in the spec

(...)

>
> I find this to be sufficiently basic to the language design that it would
> actually be confusing to add it in particular places -- one would have to
> wonder why it was added.

Perhaps you are right, and this is a matter for a commentary on the
standard.

> | Also, perhaps a standard function (WHITESPACEP &OPTIONAL readtable) that
> | tests this attribute could be useful (I know a kludge that does this, but
> | it is rather inelegant).
>
> your implementation may sport just such functions, or they might be
> macros or accessors inlined so strongly that they are not retained in the
> dumped image.

If implementations provide such a function, why not include it in the
standard?

(If they don't provide an appropriate readtable accessor, then the
user can do---as far as my ingenuity goes---only something along
the following line: convert the character into a string and call
READ-FROM-STRING. If the eof-value is returned, then the character
is whitespace in the current readtable, otherwise an object would be
returned or an error would be signalled. Pretty kludgy to me.)
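
A minimal sketch of that kludge follows. WHITESPACEP is not a standard function, and the GET-MACRO-CHARACTER check is an addition to the description above: without it, a character such as #\; would look like whitespace, because its macro function consumes the input and returns no values, so the reader then runs into end of file.

```lisp
(defun whitespacep (char &optional (readtable *readtable*))
  "Return true if CHAR appears to have whitespace syntax in READTABLE.
Side effect: may intern a one-character symbol in *PACKAGE*."
  (let ((*readtable* readtable)
        (eof (list nil)))               ; unique end-of-file marker
    (and
     ;; Macro characters are never whitespace.
     (null (get-macro-character char readtable))
     ;; Whitespace is skipped, so READ hits end of file and returns EOF.
     ;; Errors (e.g. a lone escape character) yield NIL via IGNORE-ERRORS.
     (eq eof
         (ignore-errors
           (read-from-string (string char) nil eof))))))
```

For example, (whitespacep #\Space) is true and (whitespacep #\a) is false in the standard readtable. The call to READ-FROM-STRING may intern a symbol as a side effect, which is part of what makes this a kludge.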