Why character syntax-copying can't work

Tim Bradshaw

Sep 3, 2002, 10:57:23 AM
[I'm deliberately not following up to the thread this `should' be in
in the hope that I can sneak under his radar and communicate with
people with brains.]

Although there have been a number of correct arguments as to why
merely copying the syntax of parens does not work (namely: the
standard says that it does not), I haven't seen a really good
description of why this kind of trick *can't* work, and also why
making it work is not a desirable extension to the language[1]. The
reason is actually quite simple.

In order to correctly read a `bracketed form'[2] given only the
opening character, you must know *which* character matches the
opening character. The Common Lisp readtable does not contain
this information, and therefore it is not possible to do this with
character-syntax copying.

To see why this must be true, imagine I want to cause the system to
read forms opening with a character a, and closing with another
character b. I might try to do this as follows:

(set-syntax-from-char a #\()
(set-syntax-from-char b #\))

Now consider what happens when the reader sees a in an unquoted
context. The reader function is called with two arguments: the stream
being read, and the character a. It needs to somehow read a form
delimited by b. But how can it know that it should read b and not,
say #\)? It can't: there is no table of which pairs match anywhere in
the system. Therefore *this trick cannot work*: there is simply no
mechanism in the language which will allow correct matching to be
enforced.
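
A small sketch of the point above, assuming a conforming Common Lisp. After the copy, all the readtable holds for the new opening character is a reader macro function and a terminating/non-terminating flag, and that function is called with only the stream and the character. The helper name `copied-syntax-info` is made up for illustration and is not from the post.

```lisp
;; Hypothetical helper: copy the syntax of #\( and #\) onto another pair
;; in a scratch readtable and report what the readtable now records for
;; the new opening character.
(defun copied-syntax-info (open close)
  (let ((rt (copy-readtable nil)))        ; fresh copy of the standard readtable
    (set-syntax-from-char open #\( rt rt)
    (set-syntax-from-char close #\) rt rt)
    ;; GET-MACRO-CHARACTER returns the reader macro function and the
    ;; non-terminating flag. Neither value names CLOSE as the character
    ;; that should end a form opened by OPEN.
    (multiple-value-bind (fn non-terminating-p) (get-macro-character open rt)
      (list :macro-function-p (not (null fn))
            :non-terminating-p non-terminating-p))))

(copied-syntax-info #\[ #\])
```

What actually happens when such a readtable is used to read `[1 2 3]` depends on the implementation, which is the behaviour the rest of this post discusses.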

To allow this trick to work as an extension to the standard it would
be enough to cause the paren reader not to look for #\), but instead
simply to look for a character with the same syntax. This is very
undesirable, because there is *still* no mechanism of ensuring
matching - *any* opening delimiter will match *any* closing delimiter.
This seems to be what LW does currently, but it's fairly clear that
it's not something that you would actually want in general.

There are two ways out of this bind. One is to change things such
that there is a table of matching characters, to which the user could
add pairs. This would be consulted by the open-paren reader to
enforce matching. This is kind of akin to what emacs does in its
syntax tables. The other is to provide a function which will read a
form delimited by a given char, and to install a suitable call to this
as the reader function for the opening char. This is not quite
READ-DELIMITED-LIST because in general this function needs to deal
with consing-dot. The second approach seems better to me, especially
as in the great majority of cases READ-DELIMITED-LIST is already
adequate.
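
The second approach might look roughly like this when READ-DELIMITED-LIST is adequate. This is a sketch only: `read-bracketed` and `make-bracket-readtable` are illustrative names, the choice of `[` and `]` is an assumption, and consing dot is not handled.

```lisp
;; Reader function for the opening character. The closing character is
;; written into the function itself, which is exactly the information
;; the readtable does not carry.
(defun read-bracketed (stream char)
  (declare (ignore char))
  (read-delimited-list #\] stream t))

(defun make-bracket-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\[ #'read-bracketed nil rt)
    ;; Give #\] the macro function of #\) so that it terminates tokens
    ;; and a stray #\] is treated like a stray #\).
    (set-macro-character #\] (get-macro-character #\) nil) nil rt)
    rt))

;; Proper lists read as expected; "[1 . 2]" is outside what this handles.
(let ((*readtable* (make-bracket-readtable)))
  (read-from-string "[1 2 [3 4]]"))
```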

--tim

Footnotes:
[1] although I think at least one person has been hinting at this.

[2] I made this term up, I don't know if there is a term in the spec
which corresponds to it.

ilias

Sep 3, 2002, 12:31:21 PM
Tim Bradshaw wrote:
> [I'm deliberately not following up to the thread this `should' be in
> in the hope that I can sneak under his radar and communicate with
> people with brains.]

you have written a nice story, and everyone can see which topic it
relates to.

I connect a line of argument with references to the CLHS.

If you like, please reply directly to the topic, so that readers can
follow and relate your comments more easily to my statements. I would
appreciate this, and (i think) some other people would, too.

Tim Bradshaw

Sep 3, 2002, 1:01:42 PM
* I wrote:

> Although there have been a number of correct arguments as to why
> merely copying the syntax of parens does not work (namely: the
> standard says that it does not), I haven't seen a really good
> description of why this kind of trick *can't* work, and also why
> making it work is not a desirable extension to the language[1]. The
> reason is actually quite simple.

Just in case this ever comes up again, I've put most of it up at
http://www.tfeb.org/lisp/obscurities#RDL. It explains why it can't
work, and why READ-DELIMITED-LIST is not the full answer, either.

--tim

Tim Bradshaw

Sep 3, 2002, 1:04:47 PM
* I wrote:

> Although there have been a number of correct arguments as to why
> merely copying the syntax of parens does not work (namely: the
> standard says that it does not), I haven't seen a really good
> description of why this kind of trick *can't* work, and also why
> making it work is not a desirable extension to the language[1]. The
> reason is actually quite simple.

Just in case this ever comes up again, I've put most of it up at
http://www.tfeb.org/lisp/obscurities.html#RDL. It explains why it can't

Erann Gat

Sep 3, 2002, 1:07:48 PM
In article <ey3heh7...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

> [I'm deliberately not following up to the thread this `should' be in
> in the hope that I can sneak under his radar and communicate with
> people with brains.]

Sorry to disappoint you.

> Although there have been a number of correct arguments as to why
> merely copying the syntax of parens does not work (namely: the
> standard says that it does not), I haven't seen a really good
> description of why this kind of trick *can't* work, and also why
> making it work is not a desirable extension to the language[1]. The
> reason is actually quite simple.
>
> In order to correctly read a `bracketed form'[2] given only the
> opening character, you must know *which* character matches the
> opening character. The Common Lisp readtable does not contain
> this information, and therefore it is not possible to do this with
> character-syntax copying.

I think this argument is flawed. Just because the readtable doesn't
contain this information doesn't mean that a reader function can't access
this information stored someplace else. In particular, the fact that the
character being read is passed as an argument to the reader macro function
makes this quite straightforward:

(set-macro-character ch (lambda (stream char)
  (read-delimited-list (case char (#\( #\)) (#\[ #\]) (#\{ #\})) stream)))

I haven't actually tried this, but I'm pretty sure something like this
will work.

E.

Erann Gat

Sep 3, 2002, 2:23:22 PM

Sorry, Tim, but you're wrong. Not only can it be done, but it turns out
to be very simple:

(defun bracket-reader (stream char)
(read-delimited-list (case char (#\( #\)) (#\[ #\]) (#\{ #\})) stream))

(set-macro-character #\( #'bracket-reader)
(set-syntax-from-char #\[ #\( *readtable* *readtable*)
(set-syntax-from-char #\] #\) *readtable* *readtable*)
(set-syntax-from-char #\{ #\( *readtable* *readtable*)
(set-syntax-from-char #\} #\) *readtable* *readtable*)


; MCL 4.3.1

? '(1 2 3)
(1 2 3)
? '{1 2 3}
(1 2 3)
? '[1 2 3]
(1 2 3)

; Mismatched brackets do the Right Thing too:

? '[1 2 3)
; Warning: Ignoring extra ")" on #<LISTENER "Listener" #x15B0824E> .
; While executing: #<Anonymous Function #x1582721E>

? '(1 2 3]
; Warning: Ignoring extra "]" on #<LISTENER "Listener" #x15B0EE16> .
; While executing: #<Anonymous Function #x1582721E>


E.

Tim Bradshaw

Sep 3, 2002, 3:44:58 PM
* Erann Gat wrote:

> Sorry to disappoint you.

I didn't mean you.

> I think this argument is flawed. Just because the readtable doesn't
> contain this information doesn't mean that a reader function can't
> access this information stored someplace else. In particular, the
> fact that the character being read is passed as an argument to the
> reader macro function makes this quite straightforward:

Yes, this can work of course. What I meant was that there is no
God-given way of knowing the matching character, and therefore the
system-provided function for #\( must either have hard-wired in that
it needs to look for #\), or it must look for a character whose syntax
is the same as #\). The first option (which I think is the better of
the two) will cause copying syntax to just not work, while the second
will cause it to be too permissive as to what matches.

Given a READ-DELIMITED-FORM, one could obviously implement a general
reader as:

(defvar *delimiter-alist*
  '((#\( . #\))
    ...))

(defun read-form (stream char)
  (let ((match (cdr (assoc char *delimiter-alist*))))
    (unless match (error ...))
    (read-delimited-form match stream t)))

and then READ-FORM is suitable for installing on opening delimiters.

(of course an implementation would be free to provide a table-driven
form reader like this as an extension, so by `system-provided' above I
really mean `as defined by the standard' or something).
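
For readers who want something that runs today, here is a hedged variant of the table-driven reader. It assumes READ-DELIMITED-LIST as a stand-in for the hypothetical READ-DELIMITED-FORM, so it does not cope with consing dot. The names `*delimiter-pairs*`, `read-paired-form` and `make-paired-readtable` are invented for this sketch.

```lisp
(defvar *delimiter-pairs* '((#\[ . #\]) (#\{ . #\}))
  "Alist of (open . close) characters; contents are illustrative.")

(defun read-paired-form (stream char)
  ;; Look the closing character up in the user-extensible table.
  (let ((match (cdr (assoc char *delimiter-pairs*))))
    (unless match
      (error "No closing delimiter registered for ~S" char))
    ;; Stand-in for READ-DELIMITED-FORM: proper lists only.
    (read-delimited-list match stream t)))

(defun make-paired-readtable ()
  (let ((rt (copy-readtable nil)))
    (loop for (open . close) in *delimiter-pairs*
          do (set-macro-character open #'read-paired-form nil rt)
             ;; Closing characters get the macro function of #\) so that
             ;; they terminate tokens.
             (set-macro-character close (get-macro-character #\) nil) nil rt))
    rt))

(let ((*readtable* (make-paired-readtable)))
  (read-from-string "{a [b c] d}"))
```

How a mismatched pair such as `{a ]` is reported depends on what the implementation's close-paren function does, so no particular outcome is claimed for that case.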

--tim

Tim Bradshaw

Sep 3, 2002, 3:50:50 PM
* Erann Gat wrote:
> Sorry, Tim, but you're wrong. Not only can it be done, but it turns
> out to be very simple:

PLEASE READ WHAT I WROTE.

Which was:

Although there have been a number of correct arguments as to why
merely copying the syntax of parens does not work (namely: the
standard says that it does not), I haven't seen a really good
description of why this kind of trick *can't* work, and also why
making it work is not a desirable extension to the language.

your solution (a) does not `merely copy the syntax of parens' (which
is the main point) and (b) does not, in fact, implement the full
semantics of the paren reader anyway, because it cannot cope with
consing dot (this is acceptable in many cases, of course).

--tim

Fred Gilham

Sep 3, 2002, 3:58:02 PM

g...@jpl.nasa.gov (Erann Gat) wrote:
> Sorry, Tim, but you're wrong. Not only can it be done, but it turns out
> to be very simple:

I thought Tim said to use read-delimited-list, and that even that
wasn't a complete answer.

Your solution suffers from the `dot' problem:

T
* '(1 2 3)

(1 2 3)
* '{1 2 3}

(1 2 3)
* '[1 2 3]

(1 2 3)


So far so good.


* '{1 . 2}


Reader error at 19764 on #<Two-Way Stream, Input = #<Synonym Stream to SYSTEM:*STDIN*>, Output = #<Synonym Stream to SYSTEM:*STDOUT*>>:
Dot context error.

Restarts:
0: [ABORT] Return to Top-Level.

Debug (type H for help)

(COMMON-LISP::%READER-ERROR
#<Two-Way Stream, Input = #<Synonym Stream to SYSTEM:*STDIN*>, Output = #<Synonym Stream to SYSTEM:*STDOUT*>>
"Dot context error.")
0]
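
The same check can be made without touching the readtable. This is a sketch: `try-read-delimited` is an illustrative name, and `#\)` is used as the delimiter so that the standard readtable works unchanged. The result for the dotted input is implementation-dependent, since an implementation may accept the dot as an extension.

```lisp
(defun try-read-delimited (string)
  ;; Read STRING up to a closing paren with READ-DELIMITED-LIST and
  ;; return :ERROR instead of entering the debugger if the reader objects.
  (with-input-from-string (in string)
    (handler-case (read-delimited-list #\) in)
      (error () :error))))

(try-read-delimited "1 2 3)")   ; a proper list
(try-read-delimited "1 . 2)")   ; :ERROR where a lone dot is rejected
```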


--
Fred Gilham gil...@csl.sri.com
"Come to me, all who labor and are heavy laden, and I will give you
rest. Take my yoke upon you, and learn from me, for I am gentle and
lowly in heart, and you will find rest for your souls. For my yoke
is easy, and my burden is light." --Jesus of Nazareth

Erann Gat

Sep 3, 2002, 5:12:19 PM
In article <u7d6rux...@snapdragon.csl.sri.com>, Fred Gilham
<gil...@snapdragon.csl.sri.com> wrote:

> Your solution suffers from the `dot' problem:

Yes. A solution for that is left as an exercise for the reader (pun
intended). (IMO dotted lists are a Bad Idea anyway.)

E.

Erann Gat

Sep 3, 2002, 5:16:51 PM
In article <ey3vg5m...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

> * Erann Gat wrote:
> > Sorry, Tim, but you're wrong. Not only can it be done, but it turns
> > out to be very simple:
>
> PLEASE READ WHAT I WROTE.
>
> Which was:
>
> Although there have been a number of correct arguments as to why
> merely copying the syntax of parens does not work (namely: the
> standard says that it does not), I haven't seen a really good
> description of why this kind of trick *can't* work, and also why
> making it work is not a desirable extension to the language.
>
> your solution (a) does not `merely copy the syntax of parens' (which
> is the main point)

Then I'd say your point is pointless. There are lots of things you can't
do if you constrain yourself not to write any code. So?

> and (b) does not, in fact, implement the full
> semantics of the paren reader anyway, because it cannot cope with
> consing dot (this is acceptable in many cases, of course).

Not to mention easy to fix should that be deemed a desirable feature.

E.

Erik Naggum

Sep 3, 2002, 6:31:33 PM
* Erann Gat

| (IMO dotted lists are a Bad Idea anyway.)

How do you propose to represent a list ending in a non-nil atom?

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

Erann Gat

Sep 3, 2002, 7:27:47 PM
In article <32400810...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:

> * Erann Gat
> | (IMO dotted lists are a Bad Idea anyway.)
>
> How do you propose to represent a list ending in a non-nil atom?

When I say "dotted lists are a bad idea" I'm referring to the data
structure, not the syntax. In other words, I think using a linked list
ending in a non-nil atom is (almost always) bad software engineering
practice. If you have a legitimate reason to use a linked list ending in
a non-nil atom, and you need a surface-syntax representation of that data
structure, then using dot notation is fine. But, rather like "eval", I
believe that more often than not the existence of a dot indicates a
problem in design.

E.

Erik Naggum

Sep 3, 2002, 8:13:41 PM
* Erann Gat

| When I say "dotted lists are a bad idea" I'm referring to the data structure,
| not the syntax. [...] But, rather like "eval", I believe that more often than
| not the existence of a dot indicates a problem in design.

Splendid! So association lists are misdesigned, too.

Christopher Browne

Sep 3, 2002, 8:15:19 PM

.. But if a list representation scheme can't cope with this, _it's
broken_.

It may be poor style, but if someone breaks the reader so badly that
it can't read this, I'm not going to point at the poor style as being
at fault. It's the idiot that broke the reader that is at fault.
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://cbbrowne.com/info/emacs.html
Rules of the Evil Overlord #41. "Once my power is secure, I will
destroy all those pesky time-travel devices."
<http://www.eviloverlord.com/>

Paul F. Dietz

Sep 3, 2002, 8:15:22 PM
Erann Gat wrote:

> When I say "dotted lists are a bad idea" I'm referring to the data
> structure, not the syntax. In other words, I think using a linked list
> ending in a non-nil atom is (almost always) bad software engineering
> practice.

How nice. If one's proposed solution doesn't work, one just says the
part of the standard on which it screws up is a bad idea.

Remind me not to buy a compiler from you.

Paul

Erann Gat

Sep 3, 2002, 10:42:17 PM
In article <32400872...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:

> * Erann Gat
> | When I say "dotted lists are a bad idea" I'm referring to the data structure,
> | not the syntax. [...] But, rather like "eval", I believe that more often than
> | not the existence of a dot indicates a problem in design.
>
> Splendid! So association lists are misdesigned, too.

We've had this discussion before if you recall
(http://groups.google.com/groups?selm=gat-2002021351120001%40eglaptop.jpl.nasa.gov).
I see no point in repeating it.

E.

Erann Gat

Sep 3, 2002, 10:48:22 PM
In article <3D7552A3...@dls.net>, "Paul F. Dietz" <di...@dls.net> wrote:

> Erann Gat wrote:
>
> > When I say "dotted lists are a bad idea" I'm referring to the data
> > structure, not the syntax. In other words, I think using a linked list
> > ending in a non-nil atom is (almost always) bad software engineering
> > practice.
>
> How nice. If one's proposed solution doesn't work, one just says the
> part of the standard on which it screws up is a bad idea.

In what sense does my solution not work? Do you really need me to spell
out for you exactly how to write a read-delimited-list function so that it
handles dot notation?

E.

Tim Bradshaw

Sep 4, 2002, 5:09:29 AM
* Erann Gat wrote:

> Then I'd say your point is pointless. There are lots of things you
> can't do if you constrain yourself not to write any code. So?

You may not have noticed (which is curious, since you have posted an
article in the thread concerned) but there has been an extensive
`discussion' in the last few days where someone is claiming that it is
possible to do exactly what I am demonstrating is not possible.

It is easy to produce an argument from authority that what he wants to
do can not be done - essentially `the standard says explicitly that
this will not work'. What I have tried (and obviously failed, in some
cases at least) to do instead is to provide a *rationale* for why it
is not done - namely that the language does not provide enough
information to enable a predefined delimiter reader to know what its
matching delimiter should be.

I find these kinds of explanations to be useful. They certainly help
me to understand why things are the way they are, and what would be
required to change things to make them be different, and I hope they
help others to understand things better too: sometimes just saying
`it's this way because it is' is not as useful an answer as `it's this
way because if it was some other way lots of other things would need
to be different too'.

But maybe I'm wrong, perhaps we should all be more authoritarian and
simply quote the law (standard) at people blindly. I don't know.

> Not to mention easy to fix should that be deemed a desirable feature.

I actually found a general READ-DELIMITED-FORM quite hard to write. I
think it is an omission from the standard language, as although it is
clearly user-implementable I suspect that few people (and I don't
think I am in that few) could get it completely correct. I'd rather
have vendors implement it for me, possibly as an extension to
READ-DELIMITED-LIST (and if any of them have done so, please say!).

--tim

Ingvar Mattsson

Sep 4, 2002, 8:14:54 AM
g...@jpl.nasa.gov (Erann Gat) writes:

Now, given your piece of code...

(set-syntax-from-char #\< #\()
(set-syntax-from-char #\> #\))

It only works for the characters explicitly handled in your code and
will not work for other delimiters.

It will (however) work in most cases I (myself) would use it, so it's
"good enough" (just like my memoising lambda, which only handles a *very*
plain argument list: no keywords, no &rest, no nothing, basically).

//Ingvar
--
"I'm in 386 enchanted mode."

Erann Gat

Sep 4, 2002, 1:40:48 PM
In article <ey3heh6...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

> * Erann Gat wrote:
>
> > Then I'd say your point is pointless. There are lots of things you
> > can't do if you constrain yourself not to write any code. So?
>
> You may not have noticed (which is curious, since you have posted an
> article in the thread concerned) but there has been an extensive
> `discussion' in the last few days where someone is claiming that it is
> possible to do exactly what I am demonstrating is not possible.

Yes, I noticed that discussion :-) but my interpretation of the claim
being made was different. Ilias did not say that it *did* work (in fact
he explicitly said that it didn't) but that it *ought* to work according
to his (incorrect) interpretation of the standard.

> It is easy to produce an argument from authority that what he wants to
> do can not be done - essentially `the standard says explicitly that
> this will not work'. What I have tried (and obviously failed, in some
> cases at least) to do instead is to provide a *rationale* for why it
> is not done - namely that the language does not provide enough
> information to enable a predefined delimiter reader to know what its
> matching delimiter should be.

Then I don't buy your rationale, since it's clearly trivial to provide the
missing information.

> I find these kinds of explanations to be useful. They certainly help
> me to understand why things are the way they are, and what would be
> required to change things to make them be different, and I hope they
> help others to understand things better too: sometimes just saying
> `it's this way because it is' is not as useful an answer as `it's this
> way because if it was some other way lots of other things would need
> to be different too'.

But it's not true that "lots of other things would need to be different."
A few little things would have to be different, and in fact the standard
allows users to easily change things so that they *are* different in
exactly the way they need to be to make things work more intuitively.

> > Not to mention easy to fix should that be deemed a desirable feature.
>
> I actually found a general READ-DELIMITED-FORM quite hard to write. I
> think it is an omission from the standard language, as although it is
> clearly user-implementable I suspect that few people (and I don't
> think I am in that few) could get it completely correct. I'd rather
> have vendors implement it for me, possibly as an extension to
> READ-DELIMITED-LIST (and if any of them have done so, please say!).

You could always just snarf an implementation from one of the many
open-source Lisps that are out there. Every one of them has a function
embedded in the reader that does the right thing.

A quick-and-dirty approach is to install a temporary readtable in which
the dot is not a macro character, read the list, and then post-process it
if it contains the symbol |.|.

E.

Pierpaolo BERNARDI

Sep 4, 2002, 5:59:19 PM

"Erann Gat" <g...@jpl.nasa.gov> wrote in message news:gat-040902...@k-137-79-50-101.jpl.nasa.gov...

Ugh. Way too much dirty. Almost perlish.

What about '(a \. b) ?

P.

ilias

Sep 4, 2002, 7:16:38 PM

\ = dirt, too.

not pure syntax. ugly.

i'm just reading but not analyzing today.

but i've the feeling that you are all hunting ghosts (that is: trying to
solve a problem which does not exist).

i'll look at this tomorrow.

time to sleep.

Tim Bradshaw

Sep 5, 2002, 4:18:40 AM
* Erann Gat wrote:

> Then I don't buy your rationale, since it's clearly trivial to
> provide the missing information.

Well, sorry you don't. I guess I'll just have to hope other people
appreciate it.


> You could always just snarf an implementation from one of the many
> open-source Lisps that are out there. Every one of them has a function
> embedded in the reader that does the right thing.

It would be nice if they could be just plugged in, but the guts of
READ aren't generally that simple in my experience.

> A quick-and-dirty approach is to install a temporary readtable in which
> the dot is not a macro character, read the list, and then post-process it
> if it contains the symbol |.|.

Doesn't work. (a) it's terribly expensive since you need to copy the
readtable on every call to RDF, (b) it just doesn't work because when
you call READ recursively you don't know what you are reading:

(read-delimited-form #\)
 (make-string-input-stream "#@(this is some . thing))"))

so it is never safe to do anything with what #\. does in the
readtable. Like I said, I think it's fairly fiddly to get right.

--tim

Erann Gat

Sep 5, 2002, 3:56:48 PM
In article <ey33cso...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

> * Erann Gat wrote:
>
> > Then I don't buy your rationale, since it's clearly trivial to
> > provide the missing information.
>
> Well, sorry you don't. I guess I'll just have to hope other people
> appreciate it.
>
>
> > You could always just snarf an implementation from one of the many
> > open-source Lisps that are out there. Every one of them has a function
> > embedded in the reader that does the right thing.
>
> It would be nice if they could be just plugged in, but the guts of
> READ aren't generally that simple in my experience.
>
> > A quick-and-dirty approach is to install a temporary readtable in which
> > the dot is not a macro character, read the list, and then post-process it
> > if it contains the symbol |.|.
>
> Doesn't work. (a) it's terribly expensive since you need to copy the
> readtable on every call to RDF,

Not unless you're defining new read macros on the fly. If your read table
is static (which one would expect) you only need to copy it once.

> (b) it just doesn't work because when
> you call READ recursively you don't know what you are reading:

That is a valid point.

Just to short-circuit this whole discussion, here's a delimited list reader
that handles dots properly. It took me about fifteen minutes to write.

(defun read-delim (stream char)
  "Works like read-delimited list except that it handles dots properly
Also has a hook for defining closing braces so it can work with [] and {}"
  (let ( (c (peek-char t stream)) )
    (cond ( (eql c #\.)
            (read-char)
            (let ( (c1 (peek-char nil stream)) )
              (if (or (whitespacep c1) (not (nth-value 1
                                              (get-macro-character c1))))
                  (prog1
                      (read stream)
                    (if (eql (peek-char t stream) (matching-brace char))
                        (read-char)
                        (error "Syntax error")))
                  (progn
                    (unread-char c stream)
                    (cons (read stream t nil t) (read-delim stream char))))) )
          ( (eql c (matching-brace char))
            (read-char stream)
            nil )
          (t (cons (read stream t nil t) (read-delim stream char))))))

(defun matching-brace (c)
  (case c (#\( #\)) (#\[ #\]) (#\{ #\})))


BTW, it turns out that read-delimited-list in MCL already handles dots properly.

E.

Tim Bradshaw

Sep 6, 2002, 8:46:36 AM
* Erann Gat wrote:

> Not unless you're defining new read macros on the fly. If your read table
> is static (which one would expect) you only need to copy it once.

Yes, I guess that depends on whether you are writing library code or
use-once code. For something like this I'd be more interested in
library code, rather than something which will blow up in my face if I
do something unexpected.

> (defun read-delim (stream char)
> "Works like read-delimited list except that it handles dots properly
> Also has a hook for defining closing braces so it can work with [] and {}"
> (let ( (c (peek-char t stream)) )
> (cond ( (eql c #\.)
> (read-char)
> (let ( (c1 (peek-char nil stream)) )
> (if (or (whitespacep c1) (not (nth-value 1
> (get-macro-character c1))))
> (prog1
> (read stream)
> (if (eql (peek-char t stream) (matching-brace char))
> (read-char)
> (error "Syntax error")))
> (progn
> (unread-char c stream)
> (cons (read stream t nil t) (read-delim stream char))))) )
> ( (eql c (matching-brace char))
> (read-char stream)
> nil )
> (t (cons (read stream t nil t) (read-delim stream char))))))

> (defun matching-brace (c)
> (case c (#\( #\)) (#\[ #\]) (#\{ #\})))


Oh dear. There are too many bugs in this for me, sorry. Even after
fixing the obvious typos: doesn't handle recursive reads right,
unreads a character after peeking beyond it.

I don't think I have anything more to say about this subject.

--tim

Erann Gat

Sep 6, 2002, 1:28:09 PM
In article <ey3lm6f...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

> Oh dear. There are too many bugs in this for me, sorry.

On usenet you get what you pay for.

> doesn't handle recursive reads right,

I'm not sure what you mean by this. It seems to work fine on [1 . [2 . 3]]

> unreads a character after peeking beyond it.

Yeah, my bad. Easily fixed.

(defun read-delim (stream char)
  "Works like read-delimited list except that it handles dots properly
Also has a hook for defining closing braces so it can work with [] and {}"
  (let ( (c (peek-char t stream)) )
    (cond ( (eql c #\.)
            (read-char stream)
            (let ( (c1 (peek-char nil stream)) )
              (if (or (whitespacep c1) (terminating-macro-char-p c1))
                  (prog1
                      (read stream)
                    (if (eql (peek-char t stream) (matching-brace char))
                        (read-char stream)
                        (error "Syntax error")))
                  (cons (read (make-concatenated-stream
                               (make-string-input-stream ".") stream)
                              t nil t)
                        (read-delim stream char)))) )
          ( (eql c (matching-brace char))
            (read-char stream)
            nil )
          (t (cons (read stream t nil t) (read-delim stream char))))))

(defun terminating-macro-char-p (c)
  (multiple-value-bind (macro-p non-terminating-p) (get-macro-character c)
    (and macro-p (not non-terminating-p))))

(defun matching-brace (c)
  (case c (#\( #\)) (#\[ #\]) (#\{ #\})))

? (read-from-string "[1 .[2 . .2]]")
(1 2 . 0.2)


(I had to use read-from-string because in MCL the listener is not an
input-stream, so the make-concatenated-stream trick doesn't work. This
seems like a bug in MCL.)
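
A note for anyone trying the code above elsewhere: `whitespacep` is not defined by the Common Lisp standard, so it presumably came from the poster's environment. A rough portable stand-in might look like the following. The name `simple-whitespace-p` is hypothetical, and the character list assumes the standard readtable, since the standard offers no portable way to ask a readtable whether a character has whitespace syntax.

```lisp
;; Assumption: these are the characters with whitespace syntax in the
;; standard readtable. A readtable that has been modified may differ.
(defun simple-whitespace-p (char)
  (and (member char '(#\Space #\Tab #\Newline #\Linefeed
                      #\Page #\Return #\Rubout))
       t))

(simple-whitespace-p #\Space)
(simple-whitespace-p #\a)
```

Substituting it for `whitespacep` is a guess at what the original function does, not a statement about MCL.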

E.

Tim Bradshaw

Sep 8, 2002, 2:46:06 PM
* Erann Gat wrote:

> On usenet you get what you pay for.

Yes, sorry, I haven't been asking anyone to produce such a RDF thing.
In fact if I'm asking anyone to do anything I'm trying to suggest that
it might be a good thing for vendors to produce.

>> doesn't handle recursive reads right,

> I'm not sure what you mean by this. It seems to work fine on [1
> . [2 . 3]]

I think you need to have recursivep arguments to various functions. I
think that even if you do that you technically must lose because you
can't get (read-delimited-form #\) "foo #1=bar #1#)") to work right,
because there is no user access to the reader context used to do
this. (This latter never matters in practice, because the function is
(almost) always called from somewhere where the context is set up.)
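
As a reminder of what that reader context does, here is a minimal illustration, assuming nothing beyond the standard reader. Labels set up with `#n=` are only meaningful within one outermost call to READ, which is why the nested calls need to be recursive ones. `shared-label-p` is an illustrative name.

```lisp
(defun shared-label-p ()
  ;; Both elements come from the same label, so they are the very same
  ;; object, not merely EQUAL lists.
  (let ((form (read-from-string "(#1=(a b) #1#)")))
    (eq (first form) (second form))))

(shared-label-p)
```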

>> unreads a character after peeking beyond it.

> Yeah, my bad. Easily fixed.

I think your fix is basically right, although I got all obsessive
about closing all the streams.

A much more significant problem, I think, is that you don't detect
*and call* readmacros. It took me some time (and some hints from
Erik) to realise that you need to do this. The kind of problem you
get if you don't is if you are looking at something like:

"#+(or) 3 . foo ..."

Then you *can't* call READ, because really you are looking at
". foo ..." - you have to spot that #\# is a readmacro, call it, and
find that it returns no values in this case.

All of this is a way of saying that in order to do this you have to
implement something that is the actual reader algorithm, I think.
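
For comparison, the built-in list reader already copes with this case, which is the behaviour a user-written READ-DELIMITED-FORM would have to reproduce. A short sketch using only standard syntax: the feature expression `(or)` is never true, so `#+(or) 3` contributes no object at all.

```lisp
;; The skipped form vanishes, so the dot is still seen in list context.
(read-from-string "(1 #+(or) 3 . foo)")

;; Without a dot the effect is simply that 3 never appears.
(read-from-string "(#+(or) 3 foo)")
```

A hand-written reader that peeks at `#`, decides it is not a dot, and calls READ will instead meet the lone dot outside list context.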

--tim

Erann Gat

Sep 8, 2002, 3:58:06 PM
In article <ey3elc4...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

> * Erann Gat wrote:
>
> > On usenet you get what you pay for.
>
> Yes, sorry, I haven't been asking anyone to produce such a RDF thing.

RDF?

I'm not writing this code because you asked me to, I'm writing this code
to support a position with respect to the topic of this thread.

> >> doesn't handle recursive reads right,
>
> > I'm not sure what you mean by this. It seems to work fine on [1
> > . [2 . 3]]
>
> I think you need to have recursivep arguments to various functions.

Yeah, could be. So? Do you want me to work out every last detail for you?

> A much more significant problem, I think, is that you don't detect
> *and call* readmacros. It took me some time (and some hints from
> Erik) to realise that you need to do this. The kind of problem you
> get if you don't is if you are looking at something like:
>
> "#+(or) 3 . foo ..."
>
> Then you *can't* call READ, because really you are looking at
> ". foo ..." - you have to spot that #\# is a readmacro, call it, and
> find that it returns no values in this case.

Right. But you've just described exactly how to fix this problem, which
should be just one additional clause in the top-level cond, maybe 3-4
lines of code, no? (Granted, the sharpsign-equals syntax does seem to be
problematic. I'm not sure what will happen there.)

> All of this is a way of saying that in order to do this you have to
> implement something that is the actual reader algorithm, I think.

No, I don't think that's true. Certainly you haven't convincingly shown
it to be true. Just because you haven't thought of a way to do it (that
is, write a completely correct read-delim without rewriting the whole
reader) doesn't prove it's impossible. (Take another look at the topic of
this thread to see why this is relevant.)

In any case, the fact that read-delimited-list isn't required to
handle dots does seem to me like a bug in the spec.

BTW, I will reiterate my earlier position: if you really find yourself
writing code where you require completely "correct" behavior of this sort
IMO you're almost certainly doing something wrong.

E.

ilias

Sep 8, 2002, 5:23:46 PM
Erann Gat wrote:
> In article <ey3elc4...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:
>
>>* Erann Gat wrote:
>>
>>>On usenet you get what you pay for.
>>
>>Yes, sorry, I haven't been asking anyone to produce such a RDF thing.
>
> RDF?

read-delimited-form


Tim Bradshaw

Sep 8, 2002, 5:18:29 PM
* Erann Gat wrote:

> RDF?

READ-DELIMITED-FORM


> Yeah, could be. So? Do you want me to work out every last detail
> for you?

No: I didn't ask you to write the thing. However, to counter the
claim that it's hard to do correctly you do need to get all the
details right. It's the details that make it hard: anyone (well, any
slightly competent Lisp programmer) can do a job that works in the
easy cases.

> No, I don't think that's true. Certainly you haven't convincingly
> shown it to be true. Just because you haven't thought of a way to
> do it (that is, write a completely correct read-delim without
> rewriting the whole reader) doesn't prove it's impossible. (Take
> another look at the topic of this thread to see why this is
> relevant.)

I have what I think is a correct RDF already[1], and it works by
implementing pretty closely the reader's algorithm for finding the
start of a token and deciding what to do with it (which is what I
meant by `implement the reader's algorithm': sorry to be confusing).

> BTW, I will reiterate my earlier position: if you really find
> yourself writing code where you require completely "correct"
> behavior of this sort IMO you're almost certainly doing something
> wrong.

I always require my library functions to behave correctly. About the
last thing I want is for some unforeseen problem to come up on the
millionth form the thing has to read, especially as such a problem
might mean it reads *something other than what I meant* and doesn't
tell me it did that. This tends to mean that things take me a while
to write, but they do tend to work right once they've been done.
This, alas, isn't a recipe for getting rich in today's world, but I
get a certain satisfaction from a job well done nonetheless.

--tim

Footnotes:
[1] I'm not publishing it because it's kind of clunky and
embarrassing.

ilias

Sep 8, 2002, 10:54:03 PM
Erann Gat wrote:

> In any case, the fact that read-delimited-list isn't required to
> handle dots does seem to me like a bug in the spec.

The dot itself is the 'bug'.

But provable is only 'violation of spirit'...

...maybe of the specs.

Erann Gat

Sep 9, 2002, 3:13:53 AM
In article <ey3y9ac...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

> However, to counter the
> claim that it's hard to do correctly you do need to get all the
> details right.

Fair enough:

(defun read-delim (stream char)
  "Works like read-delimited-list except that it handles dots properly.
Also has a hook for defining closing braces so it can work with [] and {}"
  (let ((c (peek-char t stream)))
    (cond ((eql c #\.)
           ; Check if this is an isolated dot
           (read-char stream)
           (let ((c1 (peek-char nil stream)))
             ; Note: whitespacep is an MCL function, not standard Common Lisp
             (if (or (whitespacep c1) (terminating-macro-char-p c1))
                 ; Yes it is, read the last cdr cell
                 (prog1
                     (read stream)
                   ; Anything other than a closing brace now is a syntax error
                   (if (eql (peek-char t stream) (matching-brace char))
                       (read-char stream)
                       (error "Syntax error")))
                 ; No, need a clever trick to push the dot back onto
                 ; the stream so the reader can read the token it's a part of
                 (cons (read (make-concatenated-stream
                              (make-string-input-stream ".") stream)
                             t nil t)
                       (read-delim stream char)))))
          ; Handle macro characters -- note this subsumes the case of the
          ; closing brace so a separate test for that is no longer necessary
          ; assuming we do a (set-syntax-from-char close-brace #\))
          ; (At least, this works in MCL, I didn't actually check the standard
          ; to see if it will work universally)
          ((get-macro-character c)
           (read-char stream)
           (let ((l (multiple-value-list
                     (funcall (get-macro-character c) stream c))))
             (if l
                 (cons (car l) (read-delim stream char))
                 (read-delim stream char))))
          ; Not a dot, not a read macro, just do the obvious recursive thing
          (t (cons (read stream t nil t) (read-delim stream char))))))

(defun terminating-macro-char-p (c)
  ; True if C is a macro character that terminates a token
  (multiple-value-bind (macro-p non-terminating-p) (get-macro-character c)
    (and macro-p (not non-terminating-p))))

(defun matching-brace (c)
  ; Map an opening brace to its closing partner
  (case c (#\( #\)) (#\[ #\]) (#\{ #\})))

; A quick test case
? '[1 #1=2 #1# #+(or) 3 . 4]
(1 2 2 . 4)
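
For anyone trying this outside MCL, here is a sketch of the same idea using only standard functions. It is a variation and makes no claim to be correct in every case. It tests for the closing brace explicitly instead of relying on what the #\) reader macro does when called directly. All names here (whitespace-char-p, closing-delimiter, read-delimited-form, read-with-brackets) are invented for the example, and whitespace-char-p is only a stand-in for MCL's whitespacep.

```lisp
(defun whitespace-char-p (c)
  "Stand-in for MCL's WHITESPACEP: true for the usual whitespace characters."
  (member c '(#\Space #\Tab #\Newline #\Return #\Linefeed #\Page)))

(defun terminating-macro-p (c)
  (multiple-value-bind (fn non-terminating-p) (get-macro-character c)
    (and fn (not non-terminating-p))))

(defun closing-delimiter (open)
  (ecase open (#\( #\)) (#\[ #\]) (#\{ #\})))

(defun read-delimited-form (stream open)
  "Read forms up to the brace matching OPEN, allowing a dotted tail."
  (let ((close (closing-delimiter open))
        (items '()))
    (loop
      (let ((c (peek-char t stream t nil t)))
        (cond
          ;; Explicit end test, so the #\) reader macro is never called.
          ((eql c close)
           (read-char stream t nil t)
           (return (nreverse items)))
          ;; A dot: either the consing dot or the start of a token like .5
          ((eql c #\.)
           (read-char stream t nil t)
           (let ((next (peek-char nil stream t nil t)))
             (cond ((or (whitespace-char-p next) (terminating-macro-p next))
                    (unless items (error "Nothing appears before the dot."))
                    (let ((tail (read stream t nil t)))
                      (unless (eql (peek-char t stream t nil t) close)
                        (error "Expected ~C after the dotted tail." close))
                      (read-char stream t nil t)
                      (return (nconc (nreverse items) tail))))
                   (t
                    ;; Same trick as above: put the dot back in front.
                    (push (read (make-concatenated-stream
                                 (make-string-input-stream ".") stream)
                                t nil t)
                          items)))))
          ;; Macro character: call it, keep a value only if it returned one.
          ((get-macro-character c)
           (read-char stream t nil t)
           (let ((results (multiple-value-list
                           (funcall (get-macro-character c) stream c))))
             (when results (push (first results) items))))
          (t (push (read stream t nil t) items)))))))

(defun read-with-brackets (string)
  "Read one form from STRING with [ ] installed in a private readtable."
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\[ #'read-delimited-form)
    (set-syntax-from-char #\] #\))
    (values (read-from-string string))))
```

With this setup, (read-with-brackets "[1 #+(or) 3 . 4]") should read as a dotted pair of 1 and 4, since the #+(or) clause contributes nothing to the list.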

E.
