READ-DELIMITED-FORM

Tim Bradshaw

Sep 4, 2002, 10:10:44 AM

I've been thinking about how to implement this function, and I've
convinced myself that it's very hard indeed. But perhaps I'm wrong, so
I'll ask in case anyone else has better ideas than me.

Here's what it should do:

read-delimited-form char &optional input-stream recursive-p

=> form

Behaves exactly as read-delimited-list except it will deal with a
`consing dot', and can thus yield dotted lists.
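
For contrast, here is a small sketch of how the standard READ-DELIMITED-LIST behaves today. The helper name is made up for illustration, and the assumption is that the implementation's reader signals an error for a lone dot token outside list syntax, which is what the standard describes.

```lisp
(defun try-read-delimited-list (string)
  "Read STRING up to a closing paren; return the list, or :ERROR if the reader objects."
  (handler-case
      (with-input-from-string (in string)
        (read-delimited-list #\) in))
    (error () :error)))

(try-read-delimited-list "a b c)")   ; a proper list of three symbols
(try-read-delimited-list "a . b)")   ; the lone dot is rejected, no dotted list comes back
```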

So, here's what I don't know how to do.

The basic loop (much simplified) is something like:

look for the closing delimiter, done if so;

look for a consing dot, if found do the dotted-list bit;

otherwise read a form and carry on.

`look for a consing dot' is the hard bit. It's trivially hard (I
think) because you need to unread more than one character - read the
dot, and then see what is beyond it, then be willing to unread
whatever you found, *and the dot*. This is to cope with things like
".x" in the stream.

Unreading multiple characters can be dealt with by the devious (I
think) trick of inventing a new stream which is a concatenated stream
of a string stream reading the chars to be unread and the original
stream. I was really pleased when I thought of this.
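
A minimal sketch of that trick, using only standard stream functions. The name PUSH-BACK-STRING is hypothetical, and the caller has to keep reading from the returned stream rather than the original one.

```lisp
(defun push-back-string (string stream)
  "Return a new input stream that yields STRING first, then the rest of STREAM."
  (make-concatenated-stream (make-string-input-stream string) stream))

(with-input-from-string (in ".x y")
  (let* ((dot  (read-char in))          ; consume the dot
         (next (read-char in))          ; look one character beyond it
         (in2  (push-back-string (coerce (list dot next) 'string) in)))
    ;; Both characters are back in front of the remaining input.
    (read-line in2)))
```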

But it's actually much worse: how do you know, when looking beyond a
dot, whether it is consing or not? I thought: look for whitespace.
But no, this is wrong, because "(a .(foo))" should read as (a foo).
Bum!

I don't know how to get around this without either reimplementing most
of the reader, or doing something horrible like calling READ trapping
lots of errors and being willing to back out. This latter can almost
certainly never be correct because of reader side-effects (like
interning a symbol, or much worse).

So I think this is actually very hard. But I'd be delighted to be
proved wrong. Does anyone have any ideas?

One thing that I would *really like* in CL is a way of calling the
reader such that it returns some object together with information
about what it *would* do with that object - in particular if it would
return an INTEGER it should return instead #<opaque-object> and, say
INTEGER. Crucially the reader should not have actually done any
side-effects (other than moving the stream pointer) at this point.
There should be various queries you can perform on the token it
returns, such as finding what characters got eaten from the stream to
read it and perhaps others (say, find the package name of a
symbol-token, whether it has : or ::, and the symbol name). And
finally you should be able to say `go ahead and make the object for
this token'.

I obviously haven't thought this through very far, and the sketched
interface above is junk, I think, because it would probably be very
hard to implement for lots of readers (and also it's just junk
anyway), but what I really want to have is some way of getting at the
reader *before* it does things like intern symbols and so on. That
would be such a nice thing to have.

--tim

Tim Bradshaw

Sep 4, 2002, 1:46:53 PM

* I wrote:
> But it's actually much worse: how do you know, when looking beyond a
> dot, whether it is consing or not? I thought: look for whitespace.
> But no, this is wrong, because "(a .(foo))" should read as (a foo).
> Bum!

> So I think this is actually very hard. But I'd be delighted to be
> proved wrong. Does anyone have any ideas?

And the answer is that I should learn to RTFM. GET-MACRO-CHARACTER
tells me what I need to know - I need to look for whitespace or
something which is a terminating macro character.
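
The check described here might be sketched as follows. Both function names are made up, and the whitespace list is an assumption about standard syntax rather than something queried from the readtable, since the standard offers no portable way to ask for a character's syntax type.

```lisp
(defun terminating-macro-char-p (char &optional (readtable *readtable*))
  "True if CHAR is a terminating macro character in READTABLE."
  (multiple-value-bind (function non-terminating-p)
      (get-macro-character char readtable)
    (and function (not non-terminating-p))))

(defun ends-token-p (char)
  "True if CHAR (or end of file, passed as NIL) would end a token such as a lone dot."
  (or (null char)
      (member char '(#\Space #\Tab #\Newline #\Return #\Linefeed #\Page))
      (terminating-macro-char-p char)))
```

With the standard readtable, #\( ends a token, while #\# does not, because it is a non-terminating macro character.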

Thanks to Christian Ohler for pointing this out by mail.

--tim

Erik Naggum

Sep 4, 2002, 5:43:53 PM

* Tim Bradshaw

| I've been thinking about how to implement this function, and I've convinced
| myself that it's very hard indeed.

You have asked for hooks into the reader previously, as well, and it is
something I have wanted for a long time, too. In particular, I would like
to stop the reader before it interns a symbol and instead use find-symbol on
the string to avoid creating a new symbol. I also think it would be nice to
make , a non-terminating macro character so you can read back integers like
1,073,741,824.
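
As a small illustration of the find-symbol half of that wish: FIND-SYMBOL looks a name up without creating anything, whereas INTERN would add a new symbol. The wrapper name below is hypothetical, and it assumes the token's name has already been case-converted.

```lisp
(defun lookup-token-symbol (name &optional (package *package*))
  "Return the existing symbol named NAME in PACKAGE and its status, or NIL and NIL.
Unlike INTERN, this never creates a symbol."
  (find-symbol name package))

(lookup-token-symbol "CAR" "COMMON-LISP")
(lookup-token-symbol "SURELY-NOT-INTERNED-YET" "COMMON-LISP-USER")
```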

| So, here's what I don't know how to do.

You do this at too high a level. You must read a token and intervene before
it is interpreted as an integer, floating-point number, or symbol. You will
find a function that does this in all available Common Lisp implementations.

I would think that a portable implementation of the reader that is way more
programmable than the one we have today would be a worthwhile project. I am
certainly interested in spending time on it as I want it for my own needs.

| `look for a consing dot' is the hard bit.

Not at all, but it is hard to do it after the token has been interpreted and
the information upon which you have to make this decision has been destroyed.

| So I think this is actually very hard. But I'd be delighted to be
| proved wrong. Does anyone have any ideas?

I think the above should remove all the problems you have tried to solve.

| I obviously haven't thought this through very far, and the sketched interface
| above is junk, I think, because it would probably be very hard to implement
| for lots of readers (and also it's just junk anyway), but what I really want
| to have is some way of getting at the reader *before* it does things like
| intern symbols and so on. That would be such a nice thing to have.

Very much so.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

Erik Naggum

Sep 4, 2002, 5:47:10 PM

* Tim Bradshaw

| And the answer is that I should learn to RTFM. GET-MACRO-CHARACTER tells me
| what I need to know - I need to look for whitespace or something which is a
| terminating macro character.

This is unfortunately completely misguided. Good thing you did not take
credit for it. ;)

ilias

Sep 4, 2002, 7:20:57 PM

As my analytic is 'off' for today, can someone please confirm that the
stated problems are true?

I cannot believe that the problems stated are true/difficult or whatever.

please confirm, so i can look tomorrow at this.

zzzz!

Frank A. Adrian

Sep 4, 2002, 11:48:55 PM

ilias wrote:

> I cannot believe that the problems stated are true/difficult or whatever.`

Why would you doubt it? Why would you characterize it automatically as
difficult, even if it is true? Why are you so doubtful and negative?

faa

Christopher Browne

Sep 5, 2002, 12:35:03 AM

Quoth "Frank A. Adrian" <fad...@ancar.org>:

Well, despite being a newcomer, he's apparently wiser and 'more
friendly' than all the other people put together. When he burps out
statements, they have the kind of infallibility that Catholic Popes
merely _wish_ that they had.
--
(concatenate 'string "cbbrowne" "@cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
Rules of the Evil Overlord #85. "I will not use any plan in which the
final step is horribly complicated, e.g. "Align the 12 Stones of Power
on the sacred altar then activate the medallion at the moment of total
eclipse." Instead it will be more along the lines of "Push the
button." <http://www.eviloverlord.com/>

Tim Bradshaw

Sep 5, 2002, 4:44:24 AM

* Erik Naggum wrote:
> | `look for a consing dot' is the hard bit.

> Not at all, but it is hard to do it after the token has been
> interpreted and the information upon which you have to make this
> decision has been destroyed.

Yes, this is clearly correct. I was trying to stick within the
standard language, and I think that there just aren't quite the
facilities you need to do this.

I think there would be two interesting things to do in terms of
KMP-style `substandards' here:

1. try for a standard (well, substandard) READ-DELIMITED-FORM as this
would be just a useful thing to have, and it should be easy for
vendors to provide.

2. Try and work out a standard (...) interface which would let you
intervene in the reader at the token->object stage.

--tim

Tim Bradshaw

Sep 5, 2002, 4:46:15 AM

* Erik Naggum wrote:

> This is unfortunately completely misguided. Good thing you did
> not take credit for it. ;)

Can you explain why?

(I realise that it probably can't be completely correct, but it
certainly makes my version work in a lot more cases).

--tim

ilias

Sep 5, 2002, 5:16:23 AM

Frank A. Adrian wrote:
> ilias wrote:
>
>
>>I cannot believe that the problems stated are true/difficult or whatever.`
>
>
> Why would you doubt it?

simply as i'm 'preprogrammed' that

> Why would you characterize it automatically as
> difficult, even if it is true?

The posters have done this at some points.
Additionally the no. of posts implies that to me.

> Why are you so doubtful and negative?

i'm sorry for that.

i try to avoid this in future.

ilias

Sep 5, 2002, 5:20:05 AM

Frank A. Adrian wrote:
> ilias wrote:
>
>
>>I cannot believe that the problems stated are true/difficult or whatever.`
>
>
> Why would you doubt it?

simply as i'm 'preprogrammed' that LISP 'code & data are the same'

> Why would you characterize it automatically as
> difficult, even if it is true?

The poster(s) have done this at some points.

Additionally the no. of posts implies that to me.

>Why are you so doubtful and negative?

i'm sorry for that.

i try to avoid this in future.

>
> faa

Erik Naggum

Sep 5, 2002, 5:20:49 AM

* Tim Bradshaw <t...@cley.com>
| Can you explain why?

You leave the whitespace to (peek-char t) and the first character you look
at will necessarily have to be a macro character or a constituent character.
The reader algorithm is clearly described in both the standard and CLtL2.
There is no need to reinvent any of this by circumvention.

Tim Bradshaw

Sep 5, 2002, 5:53:43 AM

* Erik Naggum wrote:

> You leave the whitespace to (peek-char t) and the first character
> you look at will necessarily have to be a macro character or a
> constituent character. The reader algorithm is clearly described
> in both the standard and CLtL2. There is no need to reinvent any
> of this by circumvention.

But don't I need to know if there *was* any whitespace?

The cases I'm thinking of (assume #\a is a constituent and #\( a macro
character) are these:

" .a" -> token whose name begins ".a"

" . a" -> consing dot followed by token beginning "a" (and the
next thing had better be the closing delimiter)

" .(" -> consing dot and whatever #\( reads as.

I think that (peek-char t) fails to distinguish between the first and
second of these cases. But I am now quite confused about the whole
thing.
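
The three cases can be tried directly. This is only a sketch with a made-up helper name: it consumes one dot by hand and then reports what (peek-char t) sees next.

```lisp
(defun char-after-dot (string)
  "Skip leading whitespace, consume one dot, then return what (PEEK-CHAR T) sees."
  (with-input-from-string (in string)
    (peek-char t in)        ; skip whitespace before the dot
    (read-char in)          ; consume the dot itself
    (peek-char t in)))      ; skips any whitespace after the dot as well

(char-after-dot " .a")    ; the next non-whitespace character is #\a
(char-after-dot " . a")   ; also #\a: the intervening space has been skipped
(char-after-dot " .(")    ; #\(
```

The first two calls give the same answer, which is the ambiguity in question when the dot has already been consumed separately from the rest of its token.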

--tim

Erik Naggum

Sep 5, 2002, 6:32:01 AM

* Tim Bradshaw

| But don't I need to know if there *was* any whitespace?

No. Why do you think you need it?

| I think that (peek-char t) fails to distinguish between the first and second
| of these cases.

We have the following situation. After a token has been read, you are
either looking at a terminating macro character or a non-constituent
character such as whitespace. This is an invariant. Before you read a
token, you skip any whitespace. This is an invariant. So you read the
token. If that token is the consing dot, you read the next token and should
now look at the closing paren. You interpret and add the last read token to
your list in the appropriate manner and continue or return as appropriate.

| But I am now quite confused about the whole thing.

I completely fail to understand what can be confusing here. The reader
algorithm is described in detail in the standard and in CLtL2. I think you
may have confused yourself by trying to see the consing dot after you have
interpreted the tokens.
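
A deliberately simplified sketch of the token-first approach described in this post. All three function names are hypothetical, and it is not the reader algorithm from the standard: it ignores escape characters, case conversion, and invalid characters, and treats anything that is not whitespace or a terminating macro character as part of the token.

```lisp
(defun token-terminator-p (char)
  "True if CHAR ends a token: whitespace or a terminating macro character."
  (or (member char '(#\Space #\Tab #\Newline #\Return #\Linefeed #\Page))
      (multiple-value-bind (function non-terminating-p)
          (get-macro-character char)
        (and function (not non-terminating-p)))))

(defun read-raw-token (stream)
  "Skip whitespace, then return the characters of the next token as a string.
Returns \"\" when the next character is itself a terminating macro character."
  (peek-char t stream)                  ; invariant: whitespace is skipped first
  (with-output-to-string (out)
    (loop for char = (peek-char nil stream nil nil)
          until (or (null char) (token-terminator-p char))
          do (write-char (read-char stream) out))))

(defun consing-dot-p (token)
  "True if the uninterpreted TOKEN is exactly one dot."
  (string= token "."))
```

Because the decision is made on the raw characters, " .a" yields the token ".a", while " . a" and " .(" both yield "." with the stream left at the terminator.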

ilias

Sep 5, 2002, 6:50:53 AM

Tim Bradshaw wrote:
> I've been thinking about how to implement this function, and I've
> convinced myself that it's very hard indeed. But perhaps I'm wrong, so
> I'll ask in case anyone else has better ideas than me.

i'll remind you of a conversation we've had a few days ago:

Tim Bradshaw wrote:
>>>>>>Take a look at the function READ-DELIMITED-LIST for an example
>>>>>>of how to do it.
>>>>>i think this is not the right way.
>>>>But you'd be wrong, because it is.
>>>i'm not wrong.
>>>because it is *not* the only way.
>>>if it *is* a way.
>>>as i'm not sure if READ-DELIMITED-LIST works correct in the given
>>>context.
>>>but *why* should i try.
>>>i *feel* its the 'wrong' way.
>> Gosh, yes, I bet you do. With a mind like yours it must be such a
>> waste of time to have to deal with all these people who merely work
>> from hundreds of years of collective experience, and/or having
>> designed the language, mustn't it?
>
> you interprete to much into my words.
>
> i'm a LISP novice. i cannot deal with to much complexity.
>
> Solution with READ-DELIMITED-LIST will run me possibly in an egoistic-coding-trap.
>
> And tomorrow i have to continue on my C++ project.
>
> So, you help me out of that disaster and provide me the solution?
>
> As an experienced LISP-coder you should write it in about 5".

it seems that you're in a 'coding-trap'.

i'll help you out.

i'm a LISP-novice. But i don't need to know LISP to help you out.

my under-education is my strength.

Tim Bradshaw

Sep 5, 2002, 6:44:06 AM

* Erik Naggum wrote:

> We have the following situation. After a token has been read, you
> are either looking at a terminating macro character or a
> non-constituent character such as whitespace. This is an
> invariant. Before you read a token, you skip any whitespace.
> This is an invariant. So you read the token. If that token is
> the consing dot, you read the next token and should now look at
> the closing paren. You interpret and add the last read token to
> your list in the appropriate manner and continue or return as
> appropriate.

Ah, I think I see where we are talking at cross purposes. I think
that you are assuming that I'm doing this the proper way - namely by
reading tokens and looking at what they are. But I'm not, I'm trying
to glue something together out of READ and bits of string. In
particular I don't have a token reader, I just have READ. So I'm
improvising a token reader which will essentially *only* spot the
consing dot token, and if it does not spot that it will leave things
such that I can then just call READ to get whatever is actually there.
And it's in the implementation of this that I need to look for
whatever follows the possibly-consing dot and worry about whitespace.

I realise that this is not the right way to do what I'm trying to do,
but I wanted to see if I could do it without either implementing a
token reader from the spec, or finding the system's one.

Sorry for being confusing.

--tim

Joe Marshall

Sep 5, 2002, 6:36:39 AM

"Tim Bradshaw" <t...@cley.com> wrote in message news:ey3heh5...@cley.com...

>
> But it's actually much worse: how do you know, when looking beyond a
> dot, whether it is consing or not? I thought: look for whitespace.
> But no, this is wrong, because "(a .(foo))" should read as (a foo).
> Bum!

But consing an element onto a list yields a longer list.
(a . (foo)) will read as (cons a (list foo)) => (a foo)
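
This is easy to check at the REPL. A minimal sketch, with a made-up function name, comparing what the standard reader produces with the explicitly consed list:

```lisp
(defun dotted-equals-consed-p ()
  "Compare the reader's result for (a .(foo)) with an explicitly consed list."
  (let ((dotted   (read-from-string "(a .(foo))"))
        (explicit (cons 'a (list 'foo))))
    (equal dotted explicit)))

(dotted-equals-consed-p)
```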

Erik Naggum

Sep 5, 2002, 6:56:11 AM

* Tim Bradshaw

| I think that you are assuming that I'm doing this the proper way - namely by
| reading tokens and looking at what they are. But I'm not, I'm trying to
| glue something together out of READ and bits of string.

But this must necessarily fail. You cannot possibly make this work.

| I realise that this is not the right way to do what I'm trying to do, but I
| wanted to see if I could do it without either implementing a token reader
| from the spec, or finding the system's one.

I really thought this was obvious from the outset: It cannot be done.

Tim Bradshaw

Sep 5, 2002, 7:08:13 AM

* Erik Naggum wrote:

> I really thought this was obvious from the outset: It cannot be done.

Can you explain why?

(I hope I don't have to say this because you probably know me well
enough but: this is not some kind of hidden attack disguised as a
question, I am fairly sure you are correct, and I really do want to
know, and I'm sure you know more about the reader than I do (and are
better at spotting bugs too).)

--tim

ilias

Sep 5, 2002, 8:07:20 AM

Erik Naggum wrote:
> * Tim Bradshaw
> | I've been thinking about how to implement this function, and I've convinced
> | myself that it's very hard indeed.
>
> You have asked for hooks into the reader previously, as well, and it is
> something I have wanted for a long time, too. In particular, I would like
> to stop the reader before it interns a symbol and instead use find-symbol on
> the string to avoid creating a new symbol. I also think it would be nice to
> make , a non-terminating macro character so you can read back integers like
> 1,073,741,824.

be aware, that this would violate the standard syntax:
http://www.lispworks.com/reference/HyperSpec/Body/02_dg.htm

> | So, here's what I don't know how to do.
>
> You do this at too high a level. You must read a token and intervene before
> it is interpreted as an integer, floating-point number, or symbol. You will
> find a function that does this in all available Common Lisp implementations.

which is this function?

> I would think that a portable implementation of the reader that is way more
> programmable than the one we have today would be a worthwhile project. I am
> certainly interested in spending time on it as I want it for my own needs.

can be done with a few lines of CL conforming code.

i'm not sure, if the implementation must be 'conforming' to the spirit
of LISP, too. I'm even not sure what i meant by that.

Erik Naggum

Sep 5, 2002, 8:43:22 AM

* Tim Bradshaw
| Can you explain why?

Because the reader algorithm is defined in terms of tokens that are examined
before they are turned into integers, floating-point numbers, or symbols.
The tokens ., .., and ... must all be interpreted (or cause errors) prior to
being turned into symbols, and if you expect to be able to look at them
after `read´ has already returned, the original information is lost and you
will have insurmountable problems reconstructing the original characters
that made up the token, just like you cannot recover the case information
from a token that turned into an integer or symbol. The hard-wired nature
of ) likewise has to be determined prior to processing it as a terminating
macro character.

The usual way to implement the tokenization phase of the reader is to work
with a special buffer-related substring or mirrored buffer that characters
are copied into and then to use special knowledge of this buffer in the token
interpretation phase. The way I implement tokenizers and scanners is with
an offset from the current stream head to peek multiple characters into the
stream. When the terminating condition has been found, I know how many
characters to copy, if needed, and I am relatively well-informed of what I
have just scanned. When the token has been completed, I let the stream head
jump forward to the point where I want the next call to start. This may be
several characters shorter than I scanned ahead, naturally. I invented this
technique to parse SGML, which would otherwise have required multiple-
character read-ahead or some buffer on the side and much overhead.
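
The offset-from-head idea might look roughly like this over an in-memory string. This is not the code referred to in the post: the structure, the function names, and the use of ALPHANUMERICP as a stand-in for real token syntax are all assumptions made for illustration.

```lisp
(defstruct scanner
  (buffer "" :type string)   ; the characters available for scanning
  (head 0 :type fixnum))     ; where the next token starts

(defun peek-at (scanner offset)
  "Return the character OFFSET positions past the head, or NIL at the end."
  (let ((index (+ (scanner-head scanner) offset)))
    (when (< index (length (scanner-buffer scanner)))
      (char (scanner-buffer scanner) index))))

(defun scan-token (scanner)
  "Peek ahead until a non-alphanumeric character, then copy the token and move the head."
  (let* ((start (scanner-head scanner))
         (count (loop for offset from 0
                      for char = (peek-at scanner offset)
                      while (and char (alphanumericp char))
                      finally (return offset))))
    (prog1 (subseq (scanner-buffer scanner) start (+ start count))
      (incf (scanner-head scanner) count))))
```

Nothing is consumed while peeking, so the head only moves once the extent of the token is known.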

Tim Bradshaw

Sep 5, 2002, 8:46:34 AM

* I wrote:

> Can you explain why?

Here's one thing that is very hard to do. Consider the case where you
are using string-and-glue R-D-F to read conventional (...) syntax.
Consider this:

(x #+(or) dont:read)

Immediately after reading x, you check for a closing delimiter. There
isn't one, so call READ again. Oops. So, to do it right you need to
know what is coming next in much more detail. There are probably lots
of other cases.
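
A sketch of the failure, with a made-up function name. It uses #\] as the delimiter instead of #\), on the assumption that ] is an ordinary constituent in the standard readtable, so that the result does not depend on how an implementation treats a stray closing paren.

```lisp
(defun naive-read-delimited (delimiter stream)
  "Collect forms until DELIMITER is the next non-whitespace character.
Returns the forms, plus :EOF as a second value if input ran out first."
  (loop with forms = '()
        for char = (peek-char t stream nil nil)
        do (cond ((null char)
                  (return (values (nreverse forms) :eof)))
                 ((char= char delimiter)
                  (read-char stream)
                  (return (nreverse forms)))
                 ;; Not the delimiter, so hand over to READ. After #+(or) has
                 ;; skipped a form, READ keeps going and reads the delimiter too.
                 (t (push (read stream t nil nil) forms)))))

(with-input-from-string (in "x #+(or) skipped ]")
  (read-delimited-list #\] in))          ; stops at the delimiter

(with-input-from-string (in "x #+(or) skipped ]")
  (naive-read-delimited #\] in))         ; swallows the delimiter as a symbol
```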

(I confess that I found this by just making #\( call my R-D-F function
in the default readtable and trying to compile a fairly large
program...)

--tim

Erik Naggum

Sep 5, 2002, 9:17:33 AM

* Tim Bradshaw

| Here's one thing that is very hard to do.

Tim, this is a really good time for you to go read the standard on the reader
algorithm. I cannot fathom why you want to solve this any other way.

| Consider the case where you are using string-and-glue R-D-F to read
| conventional (...) syntax. Consider this:
|
| (x #+(or) dont:read)
|
| Immediately after reading x, you check for a closing delimiter. There
| isn't one, so call READ again. Oops.

What is the "oops" here? `read´ returns zero values in this case, and this
is really standard behavior. The proposed #; reader macro would do precisely
this, and end with `(values)´, and the code I posted here previously did.
In fact, the standard ; reader macro scans until the end of the line and
returns zero values.

| So, to do it right you need to know what is coming next in much more detail.

Sorry, this is still all wrong.

Tim Bradshaw

Sep 5, 2002, 10:53:22 AM

* Erik Naggum wrote:

> What is the "oops" here? `read´ returns zero values in this case,
> and this is really standard behavior. The proposed #; reader macro
> would do precisely this, and end with `(values)´, and the code I
> posted here previously did. In fact, the standard ; reader macro
> scans until the end of the line and returns zero values.

It does? I can find no mention of a case where READ returns zero values
in the entry on it in the spec. Do you mean that the reader macro
function should return zero values? I know that, but I'm not using
that, I'm calling READ itself.

In particular, I think that

(handler-case
    (with-open-stream (in (make-string-input-stream
                           "#+(or) foo)"))
      (read in))
  (error (e) (values nil e)))

should return NIL and a condition object.

--tim

Tim Bradshaw

Sep 5, 2002, 11:04:54 AM

* Erik Naggum wrote:

> Tim, this is a really good time for you go read the standard on
> the reader algorithm. I cannot fathom why you want to solve this
> any other way.

Incidentally: I *don't* want to solve it any other way. What I was
trying to show was that the string and glue trick of
looking-for-a-consing-dot-or-a-delimiter and if not found just calling
READ *won't work* and pretty much *can't work* unless you start
teaching it about (at the very least) #+ and #-, or actually
essentially reimplementing the whole reader. So what one has to do
instead is bite the bullet and do the whole algorithm, not try and
make a string and glue solution.

--tim

Erik Naggum

Sep 5, 2002, 11:06:51 AM

* Tim Bradshaw

| It does? I can find no mention of a case where READ returns zero values in
| the entry on it in the spec. Do you mean that the reader macro function
| should return zero values?

Yes, that was what I meant. My bad. But sadly, this goes to show my main
point, that the support for using `read´ in its own (re)implementation is
insufficient. You need to get below the values returned by read to be able
to capture the return value of reader macros.
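
A small sketch of the distinction, assuming a made-up ! comment character installed in a private copy of the standard readtable. The reader macro function returns zero values, but READ itself simply carries on and returns the next object, so the caller never learns that anything was skipped.

```lisp
(defun read-with-bang-comments (string)
  "Read one object from STRING, treating ! as a to-end-of-line comment character."
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore char))
                           (read-line stream nil nil)   ; discard the rest of the line
                           (values)))                   ; zero values: nothing was read
    (read-from-string string)))

(read-with-bang-comments (format nil "! skipped~%42"))
```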

Duane Rettig

Sep 5, 2002, 12:00:01 PM

Tim Bradshaw <t...@cley.com> writes:

> * Erik Naggum wrote:
>
> > We have the following situation. After a token has been read, you
> > are either looking at a terminating macro character or a
> > non-constituent character such as whitespace. This is an
> > invariant. Before you read a token, you skip any whitespace.
> > This is an invariant. So you read the token. If that token is
> > the consing dot, you read the next token and should now look at
> > the closing paren. You interpret and add the last read token to
> > your list in the appropriate manner and continue or return as
> > appropriate.
>
> Ah, I think I see where we are talking at cross purposes. I think
> that you are assuming that I'm doing this the proper way - namely by
> reading tokens and looking at what they are. But I'm not, I'm trying
> to glue something together out of READ and bits of string. In
> particular I don't have a token reader, I just have READ. So I'm
> improvising a token reader which will essentially *only* spot the
> consing dot token, and if it does not spot that it will leave things
> such that I can then just call READ to get whatever is actually there.
> And it's in the implementation of this that I need to look for
> whatever follows the possibly-consing dot and worry about whitespace.

Instead of digging into the details, try looking at what you are in
fact trying to accomplish, which is to implement READ using READ.
Now think about recursive algorithms and termination rules...

> I realise that this is not the right way to do what I'm trying to do,
> but I wanted to see if I could do it without either implementing a
> token reader from the spec, or finding the system's one.

But you _are_ using the system's one, since you are using READ.

> Sorry for being confusing.

Not confusing, just confused :-)

--
Duane Rettig du...@franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182

ilias

Sep 5, 2002, 2:37:58 PM

Duane Rettig wrote:
>>whatever follows the possibly-consing dot and worry about whitespace.
>
> Instead of digging into the details, try looking at what you are in
> fact trying to accomplish, which is to implement READ using READ.
> Now think about recursive algorithms and termination rules...

hint: the result in less than 10 lines of conforming code

>>I realise that this is not the right way to do what I'm trying to do,
>>but I wanted to see if I could do it without either implementing a
>>token reader from the spec, or finding the system's one.
>
> But you _are_ using the system's one, since you are using READ.

i think he meant finding the systems *token* reader.

Duane Rettig

Sep 5, 2002, 4:00:01 PM

ilias <at_...@pontos.net> writes:

> Duane Rettig wrote:
> >>whatever follows the possibly-consing dot and worry about whitespace.
> > Instead of digging into the details, try looking at what you are in
> > fact trying to accomplish, which is to implement READ using READ.
> > Now think about recursive algorithms and termination rules...
>
> hint: the result in less than 10 lines of conforming code

Show us that result and we'll be happy to critique it for you.

> >>I realise that this is not the right way to do what I'm trying to do,
> >>but I wanted to see if I could do it without either implementing a
> >>token reader from the spec, or finding the system's one.
> > But you _are_ using the system's one, since you are using READ.
>
>
> i think he meant finding the systems *token* reader.

Precisely what I said. READ _is_ the system's token reader.

ilias

Sep 5, 2002, 4:50:29 PM

Duane Rettig wrote:
> ilias <at_...@pontos.net> writes:
>
>
>>Duane Rettig wrote:
>>
>>>>whatever follows the possibly-consing dot and worry about whitespace.
>>>
>>>Instead of digging into the details, try looking at what you are in
>>>fact trying to accomplish, which is to implement READ using READ.
>>>Now think about recursive algorithms and termination rules...
>>
>>hint: the result in less than 10 lines of conforming code
>
>
> Show us that result and we'll be happy to critique it for you.

coming soon. i'm not ready to try. 10 lines: intuitive 'guess'.

>>>>I realise that this is not the right way to do what I'm trying to do,
>>>>but I wanted to see if I could do it without either implementing a
>>>>token reader from the spec, or finding the system's one.
>>>
>>>But you _are_ using the system's one, since you are using READ.
>>
>>
>>i think he meant finding the systems *token* reader.
>
>
> Precisely what I said. READ _is_ the system's token reader.

pseudocode:

function token-reader (argument: stream) returning token
function object-creator(argument: token) returning object
function read (argument: stream) returning object

read(stream)
  token := token-reader(stream)
  object := object-creator(token)
  return object

i think this is his imagination of the internal structure, which seems
to me to be logical.

(i'm just writing the reply to your Scary-Table post.)

Tim Bradshaw

Sep 6, 2002, 7:34:17 AM

* Erik Naggum wrote:
> Yes, that was what I meant. My bad. But sadly, this goes to show my main
> point, that the support for using `read´ in its own (re)implementation is
> insufficient. You need to get below the values returned by read to be able
> to capture the return value of reader macros.

Yes. I've spent some more time thinking about this and I think it is
absolutely essential that you (not `you, Erik' but `you, someone who
wants to do this') implement the actual reader algorithm to do this:
there are no shortcuts. In particular you have to consider
readmacros, but there are other things too. Fortunately there is a
good description of the algorithm!

I have an implementation of RDF which I think almost works now, and it
essentially does that, with some cheats. But it's very hairy, it
doesn't work on at least one implementation (due I think to stream
handling bugs in that implementation although I'm not sure), and it
has at least one potential bug and one actual bug. The actual bug is
that it doesn't handle the #n# and #n= macros, because it doesn't know
how to set up the context for them - so it only works if it's called
within an outer READ (which is actually OK). Almost all the hair and
bugs are because you can't (in the standard language) get at the point
just before a token is made into an object and look at what is there.
(And no, I'm not going to post it, it's too embarrassing.)

So I'd really like it if implementations made READ-DELIMITED-LIST support
dotted forms, probably with an extra option to do so so they remain
compatible.

I'd also like the ability to intervene at the token->object stage, but
I don't have any idea what a (sub)standard way of doing that would
look like.

--tim

Tim Bradshaw

Sep 6, 2002, 7:39:37 AM

* Duane Rettig wrote:
> Instead of digging into the details, try looking at what you are in
> fact trying to accomplish, which is to implement READ using READ.
> Now think about recursive algorithms and termination rules...

yes, I have termination. I don't (even in yesterday's version I
didn't) *just* use READ, I do other things and then, perhaps, call
READ.

> But you _are_ using the system's one, since you are using READ.

yes, but not directly - by the time READ has done its thing it's way
too late. That's what the `other things' above do.

> Not confusing, just confused :-)

No, I don't think so. I *was* confused, but I realised before
yesterday's article that you couldn't just do what I was originally
trying to do and I was trying (in a confusing way, I agree) to explain
one of the reasons why the naive technique can not work.

--tim