struggling with READ-CHAR and LOOP (accumulating digits)

Xenophon Fenderson the Carbon(d)ated

Oct 29, 2000, 10:29:16 PM

The following bit of code is supposed to read decimal digits until it
encounters a non-decimal digit, converting the digits read into a
positive integer and returning that number, or returning NIL if no
digits were read:

(let (p)
  (loop
    (let* ((c (read-char))          ;read next char
           (n (digit-char-p c)))    ;convert to digit in base 10
      (if (null n)                  ;if C is not a digit,
          (return (unread-char c))  ;push C back onto the stream and exit the loop
          (if (null p)
              (setq p n)            ;initialize P to N
              (setq p (+ (* p 10) n)))))) ;shift P left and add N (e.g. P=1, N=2 :: new P=12)
  p)

I'm not sure if it works. Corman Lisp returns NIL upon entry of the
above form, instead of waiting for some input. Clisp complains about
an unbound variable L and signals an error.

What am I doing wrong?

--
"Remember - if all you have is an axe, every problem looks like hours of fun."
-- Frossie in the monastery

Kent M Pitman

Oct 30, 2000, 2:34:55 AM

xeno...@irtnog.org (Xenophon Fenderson the Carbon(d)ated) writes:

> The following bit of code is supposed to read decimal digits until it
> encounters a non-decimal digit, converting the digits read into a
> positive integer and returning that number, or returning NIL if no
> digits were read: [...]
>
> I'm not sure if it works. Corman Lisp returns NIL upon entry of the
> above form, instead of waiting for some input. Clisp complains about
> an unbound variable L and signals an error.
>
> What am I doing wrong?

All just guesses since I don't have the implementations and didn't run
your code, but...

Probably you are being confused by testing it interactively on a stream
where you're not sure how the crlf handling works. Maybe you have to
put the input on the same line as the last close paren for it to work in
those systems; or better, define a function FOO and do
  (foo)1111
In other implementations, you might do
  (foo)
  1111

The standard does not specify how input activation works in an
interactive listener that is both reading code and acting as the user's
input stream. Whichever convention applies, though, you should get the
same behavior if you call the function more than once. e.g.,

(defun whitespace-p (char)
  (or (eql char #\Space) (not (graphic-char-p char))))

(defun gobble-whitespace (&optional (stream *standard-input*))
  (loop for c = (read-char stream nil nil)
        until (or (not c) ;eof
                  (not (whitespace-p c)))))

and then either

(progn (gobble-whitespace)
       (loop (let ((num (foo)))
               (print num))))

or maybe

(loop (gobble-whitespace)
      (let ((num (foo)))
        (print num)))

The second might be needed not so much due to implementation variance
as convenience to an interactive typist, who might accidentally type an
extra space and want to be forgiven.
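
A caveat on GOBBLE-WHITESPACE as written above: the loop stops only after READ-CHAR has already consumed the first non-whitespace character, so a following (foo) would miss its first digit. Below is a minimal sketch of a variant that pushes that character back. The name SKIP-WHITESPACE is hypothetical, and WHITESPACE-P is repeated only to keep the block self-contained.

```lisp
;; Same predicate as above: a space, or anything that is not a graphic character.
(defun whitespace-p (char)
  (or (eql char #\Space) (not (graphic-char-p char))))

;; Hypothetical variant of GOBBLE-WHITESPACE: consume whitespace, but leave
;; the first non-whitespace character on STREAM for the next reader.
(defun skip-whitespace (&optional (stream *standard-input*))
  (loop for c = (read-char stream nil nil)       ; C is NIL at end of file
        while (and c (whitespace-p c))
        finally (when c (unread-char c stream)))) ; push back the last char read
```

UNREAD-CHAR is only applied to the character most recently read, which is all the standard allows.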

Btw, I wouldn't do (return (unread-char ...)). That's bad style. Makes
it look like the character is being returned. Change

(let (p)
  (loop ...
   )
  p)

to

(let ((p 0))
  (loop
    ... (return p) ...))

In this way, you can also get rid of the ugly test for (null p)
(which, by the way, some people would say should be (not p), since
p is not holding a list but a boolean). But in any case, if you have
the right degenerate value for p, which is 0, then the (setq p (+ ...))
will work even for the base case.

Making this change will also make it clearer to someone just skimming
the code what the return value is, rather than using a RETURN whose
value is not actually meant as the return value. Punning the RETURN
as you're doing just to avoid a PROGN (or a COND) is bad form.
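
Put together, the suggested shape might look roughly like the sketch below. The function name READ-DECIMAL-DIGITS, the optional STREAM argument, and the end-of-file handling are assumptions added for illustration, not part of the original code. Note that starting P at 0 means the function returns 0, not NIL, when no digits are read.

```lisp
;; Hypothetical name and STREAM parameter; the original read from
;; *STANDARD-INPUT* inside an anonymous LET form.
(defun read-decimal-digits (&optional (stream *standard-input*))
  "Read decimal digits from STREAM and return the integer they denote.
Returns 0 (not NIL) if no digit was read."
  (let ((p 0))
    (loop
      (let* ((c (read-char stream nil nil))     ; NIL at end of file
             (n (and c (digit-char-p c))))      ; digit weight, or NIL
        (cond ((not n)
               (when c (unread-char c stream))  ; push the non-digit back
               (return p))                      ; the value returned is plainly P
              (t
               (setq p (+ (* p 10) n))))))))    ; works for the base case too
```

If the NIL-on-no-digits behavior matters, a caller would need a separate flag or a check on the next character before calling this.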

Erik Naggum

Oct 30, 2000, 4:04:20 AM

* Kent M Pitman <pit...@world.std.com>

I use (peek-char t <stream>) for this. Several older Lispers have
been surprised that this functionality is in there, but it has been
since CLtL1.
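
A minimal sketch of that usage, wrapped in a function whose name is made up for illustration. With a peek-type of T, PEEK-CHAR skips over whitespace characters and then returns the next character without consuming it; the two NIL arguments make it return NIL at end of file instead of signaling an error.

```lisp
;; Hypothetical wrapper name. PEEK-CHAR's arguments are, in order:
;; peek-type, input-stream, eof-error-p, eof-value.
(defun skip-reader-whitespace (&optional (stream *standard-input*))
  "Skip leading whitespace on STREAM and return the next character
without consuming it, or NIL at end of file."
  (peek-char t stream nil nil))
```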

#:Erik
--
Does anyone remember where I parked Air Force One?
-- George W. Bush

Kent M Pitman

Oct 30, 2000, 11:29:30 AM

Erik Naggum <er...@naggum.net> writes:

> * Kent M Pitman <pit...@world.std.com>
> | (defun whitespace-p (char)
> |   (or (eql char #\Space) (not (graphic-char-p char))))
> |
> | (defun gobble-whitespace (&optional (stream *standard-input*))
> |   (loop for c = (read-char stream nil nil)
> |         until (or (not c) ;eof
> |                   (not (whitespace-p c)))))
>
> I use (peek-char t <stream>) for this. Several older Lispers have
> been surprised that this functionality is in there, but it has been
> since CLtL1.

As a style thing, I don't. I regard PEEK-CHAR to be definitionally at the
mercy of the language designers, and I never use it for anything that is
not synchronized to likewise be at the mercy of the language designers.

This is a rare "peek" into a particularly strange form of paranoia that
strikes me and a few other people who taught me to program, but here's
the extraordinarily unlikely scenario that illustrates a difference in our
positions:

Suppose the language designers observed that the character "|" was underused
and decided to make it be whitespace so that people could use it as a visible
but textually-ignored form of whitespace in programs. They would analyze
the whole "language" to see if other things needed to be changed consistently.
graphic-char-p would not be changed because its definition is not dependent
on Lisp-meaning, but rather on fontology, so my program wouldn't be broken.
But your program would be broken because you have violated the abstraction
and used peek-char in a case it was never intended for. Right now, what you
say is "true", but in the terminology of Saul Kripke, it would not be
"necessarily true". (The difference being that "there are 9 planets" might
be true, even though "there are 8 planets" would be a fine truth as well, but
"2 plus 2 is 4" is "necessarily true" in that having it be false would have
strong ramifications on the fabric of the universe.)

Understand: The language designers aren't likely to change anything at all
at this point, but they could.

My feeling is analogous to what you see in people who recommend not
parsing comma-lists by doing
  (read-from-string (format nil "(~A)" (substitute #\Space #\, comma-list)))
It might be that one knows that their data is a list of numbers such that
this will mostly work, but the point is that if the numbers aren't Lisp
numbers and are someone else's numbers coincidentally defined to have
similar (but perhaps not identical) syntax, this will fail not because the
general (123 456) notation won't work but because, perhaps, there is a
lurking "infinity" or "3+4i" or "(3.0,4.5)" in there [which would have been
badly rewritten to "(3.0 4.5)" in the substitute, btw].
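
For contrast, here is a rough sketch of parsing such a list without going through the Lisp reader at all. The function name PARSE-COMMA-LIST is hypothetical, and it assumes the data really is a comma-separated list of decimal integers; anything else, such as "3+4i", makes PARSE-INTEGER signal an error instead of being silently reinterpreted.

```lisp
;; Hypothetical helper: split STRING on commas and parse each field with
;; PARSE-INTEGER, which tolerates surrounding whitespace but nothing else.
(defun parse-comma-list (string)
  "Return the list of integers in STRING, a comma-separated list of
decimal integers. Signals an error on any field that is not an integer."
  (loop with start = 0
        for comma = (position #\, string :start start)
        collect (parse-integer string :start start :end comma)
        while comma                       ; no more commas: last field done
        do (setq start (1+ comma))))      ; next field starts after the comma
```

The point of the design is that the accepted syntax is stated by the program itself and does not depend on whatever the reader happens to accept.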

It's all about the very nature of abstraction. Abstraction is about
separating the true from the necessarily true. The C language is
about blurring those differences, and relying on accidental
implementational alignments in order to gain a few percent in speed or
space. Lisp is about avoiding that. Depending on your feelings about
Gabriel's "worse is better", one of those approaches is "right"...

But for whatever reason, that's why I don't use the T argument to PEEK-CHAR
unless I'm implementing Lisp reader functionality. Every time I use any
character-related functionality in CL, my very first thought is "am I talking
about the tools for manipulating reader functionality or the tools for
manipulating fonts/letters/etc. whose properties are maintained separately".
I try to keep these universes very separate, since different political
regimes control them and I expect those regimes never to coordinate. As long
as I work in only one realm or the other, I can weakly hope for those regimes
to behave in internally consistent ways.

Most people would call me silly or anal for worrying that CL is ever
going to change on this point, but there ya go... I beat them to it.

Erik Naggum

Oct 30, 2000, 3:50:25 PM

* Kent M Pitman <pit...@world.std.com>

| As a style thing, I don't. I regard PEEK-CHAR to be definitionally at the
| mercy of the language designers, and I never use it for anything that is
| not synchronized to likewise be at the mercy of the language designers.

Well, I consider the definition of "whitespace" to be a language
issue, so I agree with your reasoning, but the conclusion is the
opposite because you apparently believe that what's whitespace is an
environmental issue, in which case it is definitely _wrong_ to write
your own code that decides what is what. In other words, it is
unlikely that graphic-char-p will be any less at the mercy of the
language designers than what's considered whitespace.

In particular, I can modify what is considered whitespace through
the reader tables, but I cannot easily modify what is considered a
graphic character through any tables. (I think this is really,
really bad, however. graphic-char-p should have a setf method.)
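
As a sketch of what changing whitespace through a readtable can look like, here is the "|" scenario from earlier in the thread. The function name is made up for illustration; COPY-READTABLE and SET-SYNTAX-FROM-CHAR are standard. Whether PEEK-CHAR with a peek-type of T honors such a change is worth checking in a given implementation, so the sketch relies only on the reader proper.

```lisp
;; Hypothetical helper: build a readtable in which "|" has the same
;; syntax as a space. The standard readtable itself is left untouched.
(defun make-bar-whitespace-readtable ()
  (let ((rt (copy-readtable nil)))          ; fresh copy of the standard readtable
    (set-syntax-from-char #\| #\Space rt)   ; to-char, from-char, to-readtable
    rt))

;; Usage: with that readtable current, "|" is skipped like a space.
;; (let ((*readtable* (make-bar-whitespace-readtable)))
;;   (read-from-string "|42|"))   ; reads the integer 42
```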

I would counter that the disparity between the environment in which
the data was written and the one in which it is read are more likely
to differ in what is considered whitespace if they are each left to
decide what it is rather than to defer the decision to an authority
like the language.

| graphic-char-p would not be changed because its definition is not
| dependent on Lisp-meaning, but rather on fontology, so my program
| wouldn't be broken.

Yes, it would, if the "fontology" changed without your knowing
between the writing and the reading of that self-same stream of
characters. The only way to resolve this issue is to let the data
itself contain a description of what it considers whitespace, in a
form that is defined by a language authority. I took a short-cut on
that route and decided to defer to the language definition directly.

| But your program would be broken because you have violated the
| abstraction and used peek-char in a case it was never intended for.
| Right now, what you say is "true", but in the terminology of Saul
| Kripke, it would not be "necessarily true".

But you're even worse off with graphic-char-p.

| Understand: The language designers aren't likely to change anything
| at all at this point, but they could.

Well, what can I say? Character sets and encoding is one of my
specialties, and I distrust the programming population's ability to
think clearly about the meaning of "character" as opposed to "byte",
with good reason, I might add: Few fundamental areas of computer
science have been screwed worse than the most basic: The meaning of
our information. Implicit representation of the century was a minor
thing compared to the implicit representation of the character sets.

| My feeling is analogous to what you see in people who recommend not
| parsing comma-lists by doing
|   (read-from-string (format nil "(~A)" (substitute #\Space #\, comma-list)))

But that's clearly stupid! :) _Perl_ people do that kind of thing.

| It's all about the very nature of abstraction.

I fully agree. I want an abstraction of "whitespace" that is
consistent with the language I use. If the input does not agree,
then we sit down and define the input format, but this thing about
the whitespace arose because the _Lisp_ environment decided what to
consume and what not to consume from the input stream. Clearly that
is _not_ something that comes from the input language, anymore.

| Abstraction is about separating the true from the necessarily true.

It is necessarily true that if the Lisp environment decided not to
consume what it considers whitespace, then there is whitespace left
whose definition is at the discretion of the Lisp environment, and
you would be in error to assume that you knew better, assuming that
the Lisp environment had not consumed something that was non-space,
non-graphic-char-p.

| The C language is about blurring those differences, and relying on
| accidental implementational alignments in order to gain a few
| percent in speed or space. Lisp is about avoiding that.

Yes! That's why I rely on the Lisp environment to know which
characters _it_ did not consume.

| But for whatever reason, that's why I don't use the T argument to
| PEEK-CHAR unless I'm implementing Lisp reader functionality.

I consider this a case of Lisp reader functionality, because we're
reading input with the Lisp reader. That's how the whole need for
this initial whitespace-gobbling came up, remember? We're not
talking about an input stream under total programmer control, here,
but one which might contain some "whitespace" after the expression
that caused the function to be called on the same input stream that
the Lisp reader was, or was not, gobbling from.

| Every time I use any character-related functionality in CL, my very
| first thought is "am I talking about the tools for manipulating
| reader functionality or the tools for manipulating
| fonts/letters/etc. whose properties are maintained separately".

Yep, me too. In this case, the Lisp reader decided not to gobble
the whitespace that was left in the input buffer between the end of
the expression and the user input.

| I try to keep these universes very separate, since different
| political regimes control them and I expect those regimes never to
| coordinate.

Agreed. That's why it would be wrong to second-guess the whitespace
gobbler in the Lisp reader to consume your particular understanding
of what constitutes whitespace in the input _after_ the Lisp reader
has been satisfied.

| As long as I work in only one realm or the other, I can weakly hope
| for those regimes to behave in internally consistent ways.

But in this case, you work in _both_ realms.

| Most people would call me silly or anal for worrying that CL is ever
| going to change on this point, but there ya go... I beat them to it.

:)

Kent M Pitman

Oct 30, 2000, 5:02:52 PM

Erik Naggum <er...@naggum.net> writes:

> ... I agree with your reasoning, but the conclusion is the
> opposite ...

I often say about "wisdom" that the cool thing about it is that it isn't
about the answer but about the reasoning process, and that two wise people
with the same basic knowledge can reach opposite "wise" answers. This may
be too trivial a situation to really get "wisdom" involved, but I think it
illustrates the point to some extent.

> In particular, I can modify what is considered whitespace through
> the reader tables, but I cannot easily modify what is considered a
> graphic character through any tables. (I think this is really,
> really bad, however. graphic-char-p should have a setf method.)

Interesting point. If you started from a clean readtable and prefilled it
with the characters you thought you knew were set each way, I might believe
it more. An analogous issue comes up with "optional arguments" where I have
some queasiness about (open "foo") rather than (open "foo" :direction :input).
Part of me thinks that if you know you want direction input, you should say it.
But I comfort myself with the "knowledge" (and I use the term lightly) that
the language designers will take this into account before ever changing
things, so probably it will never change. Still, you can see people getting
pretty miffed if it did, and "the fact that you COULD change it isn't a
substitute for having preset it."

So I'd consider you unambiguously to have the moral high ground if you
prefilled your readtable to known settings and I consider us to be equally
hanging by a thread otherwise. But that's just me. I respect your position
on this as a legitimate difference in reasonable positions. And I admit I
find your position on this both surprising and interesting. I'll give it
more thought, though I don't promise to change my mind...
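
One way to read "prefilled your readtable to known settings" is to bind *READTABLE* to a fresh copy of the standard readtable around the call, so that later changes to the current readtable cannot affect what gets skipped. This is only a sketch under that reading: the function name is hypothetical, and it assumes the implementation's PEEK-CHAR consults the current readtable when the peek-type is T.

```lisp
;; Hypothetical name. (COPY-READTABLE NIL) returns a new copy of the
;; standard readtable, independent of whatever *READTABLE* currently holds.
(defun skip-standard-whitespace (&optional (stream *standard-input*))
  "Skip whitespace on STREAM as defined by standard syntax and return
the next character without consuming it, or NIL at end of file."
  (let ((*readtable* (copy-readtable nil)))
    (peek-char t stream nil nil)))
```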

The rest of your post was interesting to read as well, but no further
comment is required since it all follows as "derived conclusions" from
this basic difference in point of view.

Rob Warnock

Oct 31, 2000, 6:17:18 AM

Erik Naggum <er...@naggum.net> wrote:
+---------------

| Well, what can I say? Character sets and encoding is one of my
| specialties, and I distrust the programming population's ability to
| think clearly about the meaning of "character" as opposed to "byte",
| with good reason, I might add: Few fundamental areas of computer
| science have been screwed worse than the most basic: The meaning of
| our information.

+---------------

Then you'll probably want to stay away from "sci.crypt" for a while
if you value your blood pressure. There've been a bunch of idiots
over there saying things like, "The term `octet' is silly -- a byte
is *8* bits, period!", and completely denying any relevance of the
long tradition of machines with other than 8-bit bytes. [As a quondam
PDP-8 & PDP-10 hacker myself, I found this *particularly* galling...]

-Rob

-----
Rob Warnock, 31-2-510 rp...@sgi.com
Network Engineering http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
1600 Amphitheatre Pkwy. PP-ASEL-IA
Mountain View, CA 94043