I've been thinking about how to implement this function, and I've convinced myself that it's very hard indeed. But perhaps I'm wrong, so I'll ask in case anyone else has better ideas than me.
The function in question is READ-DELIMITED-FORM: it behaves exactly as READ-DELIMITED-LIST except that it will deal with a `consing dot', and can thus yield dotted lists.
So, here's what I don't know how to do.
The basic loop (much simplified) is something like:
look for the closing delimiter, done if so;
look for a consing dot, if found do the dotted-list bit;
otherwise read a form and carry on.
`look for a consing dot' is the hard bit. It's trivially hard (I think) because you need to unread more than one character - read the dot, and then see what is beyond it, then be willing to unread whatever you found, *and the dot*. This is to cope with things like ".x" in the stream.
Unreading multiple characters can be dealt with by the devious (I think) trick of inventing a new stream which is a concatenated stream of a string stream reading the chars to be unread and the original stream. I was really pleased when I thought of this.
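A minimal sketch of the concatenated-stream trick described above, assuming the characters to be pushed back are known as a string. The names PUSH-BACK and DEMO-PUSH-BACK are invented for illustration; only MAKE-CONCATENATED-STREAM and MAKE-STRING-INPUT-STREAM are standard.

```lisp
;; PUSH-BACK is a hypothetical helper name, not a standard function.
(defun push-back (characters stream)
  "Return a new stream that yields CHARACTERS first, then the rest of STREAM."
  (make-concatenated-stream (make-string-input-stream characters) stream))

(defun demo-push-back ()
  "Read two characters, push both back, and return everything the new stream yields."
  (with-input-from-string (in ".x y")
    (let* ((dot (read-char in))
           (next (read-char in))
           (restored (push-back (coerce (list dot next) 'string) in)))
      ;; Drain the restored stream character by character.
      (coerce (loop for char = (read-char restored nil nil)
                    while char
                    collect char)
              'string))))
```

Note that the caller has to keep using the stream returned by PUSH-BACK from then on; the original stream is not modified in place.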
But it's actually much worse: how do you know, when looking beyond a dot, whether it is consing or not? I thought: look for whitespace. But no, this is wrong, because "(a .(foo))" should read as (a foo). Bum!
I don't know how to get around this without either reimplementing most of the reader, or doing something horrible like calling READ trapping lots of errors and being willing to back out. This latter can almost certainly never be correct because of reader side-effects (like interning a symbol, or much worse).
So I think this is actually very hard. But I'd be delighted to be proved wrong. Does anyone have any ideas?
One thing that I would *really like* in CL is a way of calling the reader such that it returns some object together with information about what it *would* do with that object - in particular if it would return an INTEGER it should return instead #<opaque-object> and, say INTEGER. Crucially the reader should not have actually done any side-effects (other than moving the stream pointer) at this point. There should be various queries you can perform on the token it returns, such as finding what characters got eaten from the stream to read it and perhaps others (say, find the package name of a symbol-token, whether it has : or ::, and the symbol name). And finally you should be able to say `go ahead and make the object for this token'.
I obviously haven't thought this through very far, and the sketched interface above is junk, I think, because it would probably be very hard to implement for lots of readers (and also it's just junk anyway), but what I really want to have is some way of getting at the reader *before* it does things like intern symbols and so on. That would be such a nice thing to have.
* I wrote:
> But it's actually much worse: how do you know, when looking beyond a
> dot, whether it is consing or not? I thought: look for whitespace.
> But no, this is wrong, because "(a .(foo))" should read as (a foo).
> Bum!
> So I think this is actually very hard. But I'd be delighted to be
> proved wrong. Does anyone have any ideas?
And the answer is that I should learn to RTFM. GET-MACRO-CHARACTER tells me what I need to know - I need to look for whitespace or something which is a terminating macro character.
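A rough sketch of the check described here, under two assumptions: the whitespace set is hard-coded (there is no portable way to ask a readtable which characters have whitespace syntax), and escape characters are ignored. TOKEN-TERMINATOR-P and CONSING-DOT-P are made-up names for illustration.

```lisp
(defun token-terminator-p (char &optional (readtable *readtable*))
  "True if CHAR would end a token: assumed whitespace or a terminating macro character."
  (or (and (member char '(#\Space #\Tab #\Newline #\Return #\Page #\Linefeed)) t)
      (multiple-value-bind (function non-terminating-p)
          (get-macro-character char readtable)
        ;; A macro character terminates a token unless it is non-terminating.
        (and function (not non-terminating-p) t))))

(defun consing-dot-p (text)
  "TEXT is the input starting at a dot; true if that dot is a token by itself."
  (and (plusp (length text))
       (char= (char text 0) #\.)
       (or (= (length text) 1)
           (token-terminator-p (char text 1)))))
```

With the standard readtable this treats ". x" and ".(foo)" as a consing dot and ".x" as the start of a longer token. It says nothing about tokens such as ".." or "...", which a real reader also has to deal with.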
Thanks to Christian Ohler for pointing this out by mail.
* Tim Bradshaw
| I've been thinking about how to implement this function, and I've convinced
| myself that it's very hard indeed.
You have asked for hooks into the reader previously, as well, and it is something I have wanted for a long time, too. In particular, I would like to stop the reader before it interns a symbol and instead use find-symbol on the string to avoid creating a new symbol. I also think it would be nice to make , a non-terminating macro character so you can read back integers like 1,073,741,824.
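To illustrate the difference being asked for, here is a small sketch showing that FIND-SYMBOL on a name leaves the package untouched, while reading the same name interns a new symbol. The scratch package name and the function name INTERNING-DEMO are invented for the example.

```lisp
(defun interning-demo ()
  "Return the FIND-SYMBOL status of a fresh name before and after READ sees it."
  (let ((scratch (make-package "READER-HOOK-SCRATCH" :use '())))
    (unwind-protect
         (let* ((name "NEVER-SEEN-BEFORE")
                ;; FIND-SYMBOL only looks; it never creates a symbol.
                (before (nth-value 1 (find-symbol name scratch)))
                (*package* scratch))
           ;; The reader interns the symbol as a side effect.
           (read-from-string name)
           (list before (nth-value 1 (find-symbol name scratch))))
      (delete-package scratch))))
```

The first status is NIL (no such symbol yet) and the second is :INTERNAL, which is exactly the side effect a token-level hook would let you avoid.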
| So, here's what I don't know how to do.
You do this at too high a level. You must read a token and intervene before it is interpreted as an integer, floating-point number, or symbol. You will find a function that does this in all available Common Lisp implementations.
I would think that a portable implementation of the reader that is way more programmable than the one we have today would be a worthwhile project. I am certainly interested in spending time on it as I want it for my own needs.
| `look for a consing dot' is the hard bit.
Not at all, but it is hard to do it after the token has been interpreted and the information upon which you have to make this decision has been destroyed.
| So I think this is actually very hard. But I'd be delighted to be
| proved wrong. Does anyone have any ideas?
I think the above should remove all the problems you have tried to solve.
| I obviously haven't thought this through very far, and the sketched interface
| above is junk, I think, because it would probably be very hard to implement
| for lots of readers (and also it's just junk anyway), but what I really want
| to have is some way of getting at the reader *before* it does things like
| intern symbols and so on. That would be such a nice thing to have.
Very much so.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
* Tim Bradshaw
| And the answer is that I should learn to RTFM. GET-MACRO-CHARACTER tells me
| what I need to know - I need to look for whitespace or something which is a
| terminating macro character.
This is unfortunately completely misguided. Good thing you did not take credit for it. ;)
-- Erik Naggum, Oslo, Norway
Erik Naggum wrote:
> * Tim Bradshaw
> | I've been thinking about how to implement this function, and I've convinced
> | myself that it's very hard indeed.
>> I cannot believe that the problems stated are true/difficult or whatever.
> Why would you doubt it? Why would you characterize it automatically as
> difficult, even if it is true? Why are you so doubtful and negative?
Well, despite being a newcomer, he's apparently wiser and 'more friendly' than all the other people put together. When he burps out statements, they have the kind of infallibility that Catholic Popes merely _wish_ that they had.
--
(concatenate 'string "cbbrowne" "@cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
Rules of the Evil Overlord #85. "I will not use any plan in which the final step is horribly complicated, e.g. "Align the 12 Stones of Power on the sacred altar then activate the medallion at the moment of total eclipse." Instead it will be more along the lines of "Push the button." <http://www.eviloverlord.com/>
* Erik Naggum wrote:
> | `look for a consing dot' is the hard bit.
> Not at all, but it is hard to do it after the token has been
> interpreted and the information upon which you have to make this
> decision has been destroyed.
Yes, this is clearly correct. I was trying to stick within the standard language, and I think that there just aren't quite the facilities you need to do this.
I think there would be two interesting things to do in terms of KMP-style `substandards' here:
1. Try for a standard (well, substandard) READ-DELIMITED-FORM, as this would just be a useful thing to have, and it should be easy for vendors to provide.
2. Try to work out a standard (...) interface which would let you intervene in the reader at the token->object stage.
* Tim Bradshaw <t...@cley.com>
| Can you explain why?
You leave the whitespace to (peek-char t) and the first character you look at will necessarily have to be a macro character or a constituent character. The reader algorithm is clearly described in both the standard and CLtL2. There is no need to reinvent any of this by circumvention.
-- Erik Naggum, Oslo, Norway
* Erik Naggum wrote:
> You leave the whitespace to (peek-char t) and the first character
> you look at will necessarily have to be a macro character or a
> constituent character. The reader algorithm is clearly described
> in both the standard and CLtL2. There is no need to reinvent any
> of this by circumvention.
But don't I need to know if there *was* any whitespace?
The cases I'm thinking of (assume #\a is a constituent and #\( a macro character) are these:
" .a" -> token whose name begins ".a"
" . a" -> consing dot followed by token beginning "a" (and the next thing had better be the closing delimiter)
" .(" -> consing dot and whatever #\( reads as.
I think that (peek-char t) fails to distinguish between the first and second of these cases. But I am now quite confused about the whole thing.
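The three cases can be checked directly. This is only a sketch of the character-level approach under discussion (CHARS-AFTER-DOT is an invented name): it consumes the dot and then reports what (peek-char nil) and (peek-char t) each see next.

```lisp
(defun chars-after-dot (text)
  "Skip to the dot in TEXT, consume it, and return what the two kinds of peek see."
  (with-input-from-string (in text)
    (peek-char t in)                    ; skip leading whitespace, stop at the dot
    (read-char in)                      ; consume the dot
    (let ((raw (peek-char nil in)))     ; next character, whitespace included
      ;; (peek-char t) skips whitespace before reporting the next character.
      (list raw (peek-char t in)))))

;; (chars-after-dot " .a")  => (#\a #\a)
;; (chars-after-dot " . a") => (#\Space #\a)
;; (chars-after-dot " .(")  => (#\( #\()
```

After the dot has been consumed, (peek-char t) reports #\a for both " .a" and " . a"; only the plain peek shows the difference.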
* Tim Bradshaw
| But don't I need to know if there *was* any whitespace?
No. Why do you think you need it?
| I think that (peek-char t) fails to distinguish between the first and second
| of these cases.
We have the following situation. After a token has been read, you are either looking at a terminating macro character or a non-constituent character such as whitespace. This is an invariant. Before you read a token, you skip any whitespace. This is an invariant. So you read the token. If that token is the consing dot, you read the next token and should now look at the closing paren. You interpret and add the last read token to your list in the appropriate manner and continue or return as appropriate.
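The token-level loop described above might be sketched as follows. This is a simplification under loudly stated assumptions, not a conforming reader: the whitespace set is hard-coded, escape characters (\ and |) are treated as ordinary constituents, *READ-SUPPRESS* is ignored, tokens are interpreted by handing the collected string to READ-FROM-STRING, and the function is meant to be called from inside a reader macro so that macro character functions run in a proper READ context. ENDS-TOKEN-P, READ-RAW-TOKEN, BRACKET-READER and READ-WITH-BRACKETS are invented names; READ-DELIMITED-FORM is the name used in this thread, with an argument order assumed to mirror READ-DELIMITED-LIST.

```lisp
(defun ends-token-p (char)
  "Assumed whitespace or a terminating macro character ends a token."
  (or (member char '(#\Space #\Tab #\Newline #\Return #\Page #\Linefeed))
      (multiple-value-bind (fn non-terminating-p) (get-macro-character char)
        (and fn (not non-terminating-p)))))

(defun read-raw-token (stream)
  "Collect the characters of one token as a string, before any interpretation."
  (with-output-to-string (out)
    (loop for char = (peek-char nil stream nil nil t)
          until (or (null char) (ends-token-p char))
          do (write-char (read-char stream t nil t) out))))

(defun read-delimited-form (close stream)
  "Like READ-DELIMITED-LIST, but a lone dot token makes the next object the tail."
  (let ((items '()) (dot-seen nil) (tail nil) (tail-seen nil))
    (flet ((add (object)
             (cond ((not dot-seen) (push object items))
                   (tail-seen (error "More than one object after the consing dot."))
                   (t (setf tail object tail-seen t)))))
      (loop
        (let ((char (peek-char t stream t nil t)))   ; skips whitespace
          (cond ((char= char close)
                 (read-char stream t nil t)
                 (when (and dot-seen (not tail-seen))
                   (error "Nothing follows the consing dot."))
                 (return (nreconc items tail)))
                ((get-macro-character char)
                 ;; Call the macro function directly so that zero values
                 ;; (comments, #+ and #- skips) can be noticed and ignored.
                 (read-char stream t nil t)
                 (let ((results (multiple-value-list
                                 (funcall (get-macro-character char) stream char))))
                   (when results (add (first results)))))
                (t
                 (let ((token (read-raw-token stream)))
                   (cond ((string/= token ".") (add (read-from-string token)))
                         ((or dot-seen (null items))
                          (error "Misplaced consing dot."))
                         (t (setf dot-seen t)))))))))))

;; Hypothetical usage: [ ... ] as a list syntax that accepts a consing dot.
(defun bracket-reader (stream char)
  (declare (ignore char))
  (read-delimited-form #\] stream))

(defun read-with-brackets (text)
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\[ #'bracket-reader)
    (set-macro-character #\] (get-macro-character #\)))
    (values (read-from-string text))))
```

The decision about the dot is made on the raw token string, before anything has been turned into a symbol or number, which is the point of the advice above.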
| But I am now quite confused about the whole thing.
I completely fail to understand what can be confusing here. The reader algorithm is described in detail in the standard and in CLtL2. I think you may have confused yourself by trying to see the consing dot after you have interpreted the tokens.
-- Erik Naggum, Oslo, Norway
Tim Bradshaw wrote:
> I've been thinking about how to implement this function, and I've
> convinced myself that it's very hard indeed. But perhaps I'm wrong, so
> I'll ask in case anyone else has better ideas than me.
i'll remind you of a conversation we had a few days ago:
Tim Bradshaw wrote:
>>>>>>Take a look at the function READ-DELIMITED-LIST for an example
>>>>>>of how to do it.
>>>>>i think this is not the right way.
>>>>But you'd be wrong, because it is.
>>>i'm not wrong.
>>>because it is *not* the only way.
>>>if it *is* a way.
>>>as i'm not sure if READ-DELIMITED-LIST works correct in the given
>>>context.
>>>but *why* should i try.
>>>i *feel* its the 'wrong' way.
>> Gosh, yes, I bet you do. With a mind like yours it must be such a
>> waste of time to have to deal with all these people who merely work
>> from hundreds of years of collective experience, and/or having
>> designed the language, mustn't it?
> you interprete to much into my words.
> i'm a LISP novice. i cannot deal with to much complexity.
> Solution with READ-DELIMITED-LIST will run me possibly in an egoistic-coding-trap.
> And tomorrow i have to continue on my C++ project.
> So, you help me out of that disaster and provide me the solution?
> As an experienced LISP-coder you should write it in about 5".
it seems that you're in a 'coding-trap'.
i'll help you out.
i'm a LISP-novice. But i don't need to know LISP to help you out.
* Erik Naggum wrote:
> We have the following situation. After a token has been read, you
> are either looking at a terminating macro character or a
> non-constituent character such as whitespace. This is an
> invariant. Before you read a token, you skip any whitespace.
> This is an invariant. So you read the token. If that token is
> the consing dot, you read the next token and should now look at
> the closing paren. You interpret and add the last read token to
> your list in the appropriate manner and continue or return as
> appropriate.
Ah, I think I see where we are talking at cross purposes. I think that you are assuming that I'm doing this the proper way - namely by reading tokens and looking at what they are. But I'm not, I'm trying to glue something together out of READ and bits of string. In particular I don't have a token reader, I just have READ. So I'm improvising a token reader which will essentially *only* spot the consing dot token, and if it does not spot that it will leave things such that I can then just call READ to get whatever is actually there. And it's in the implementation of this that I need to look for whatever follows the possibly-consing dot and worry about whitespace.
I realise that this is not the right way to do what I'm trying to do, but I wanted to see if I could do it without either implementing a token reader from the spec, or finding the system's one.
> But it's actually much worse: how do you know, when looking beyond a
> dot, whether it is consing or not? I thought: look for whitespace.
> But no, this is wrong, because "(a .(foo))" should read as (a foo).
> Bum!
But consing an element onto a list yields a longer list: (a . (foo)) reads as the same object as (cons 'a (list 'foo)), that is, (a foo).
* Tim Bradshaw
| I think that you are assuming that I'm doing this the proper way - namely by
| reading tokens and looking at what they are. But I'm not, I'm trying to
| glue something together out of READ and bits of string.
But this must necessarily fail. You cannot possibly make this work.
| I realise that this is not the right way to do what I'm trying to do, but I
| wanted to see if I could do it without either implementing a token reader
| from the spec, or finding the system's one.
I really thought this was obvious from the outset: It cannot be done.
-- Erik Naggum, Oslo, Norway
* Erik Naggum wrote:
> I really thought this was obvious from the outset: It cannot be done.
Can you explain why?
(I hope I don't have to say this because you probably know me well enough but: this is not some kind of hidden attack disguised as a question, I am fairly sure you are correct, and I really do want to know, and I'm sure you know more about the reader than I do (and are better at spotting bugs too).)
Erik Naggum wrote:
> * Tim Bradshaw
> | I've been thinking about how to implement this function, and I've convinced
> | myself that it's very hard indeed.
> You have asked for hooks into the reader previously, as well, and it is
> something I have wanted for a long time, too. In particular, I would like
> to stop the reader before it interns a symbol and instead use find-symbol on
> the string to avoid creating a new symbol. I also think it would be nice to
> make , a non-terminating macro character so you can read back integers like
> 1,073,741,824.
> You do this at too high a level. You must read a token and intervene before
> it is interpreted as an integer, floating-point number, or symbol. You will
> find a function that does this in all available Common Lisp implementations.
which is this function?
> I would think that a portable implementation of the reader that is way more
> programmable than the one we have today would be a worthwhile project. I am
> certainly interested in spending time on it as I want it for my own needs.
can be done with a few lines of CL conforming code.
i'm not sure, if the implementation must be 'conforming' to the spirit of LISP, too. I'm even not sure what i meant by that.
Because the reader algorithm is defined in terms of tokens that are examined before they are turned into integers, floating-point numbers, or symbols. The tokens ., .., and ... must all be interpreted (or cause errors) prior to being turned into symbols, and if you expect to be able to look at them after `read´ has already returned, the original information is lost and you will have insurmountable problems reconstructing the original characters that made up the token, just like you cannot recover the case information from a token that turned into an integer or symbol. The hard-wired nature of ) likewise has to be determined prior to processing it as a terminating macro character.
The usual way to implement the tokenization phase of the reader is to work with a special buffer-related substring or mirrored buffer that characters are copied into and then to use special knowledge of this buffer in the token interpretation phase. The way I implement tokenizers and scanners is with an offset from the current stream head to peek multiple characters into the stream. When the terminating condition has been found, I know how many characters to copy, if needed, and I am relatively well-informed of what I have just scanned. When the token has been completed, I let the stream head jump forward to the point where I want the next call to start. This may be several characters shorter than I scanned ahead, naturally. I invented this technique to parse SGML, which would otherwise have required multiple-character read-ahead or some buffer on the side and much overhead.
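One way to picture the offset technique is over an in-memory string rather than a real stream. The sketch below is an illustration only, not the implementation described above: the SCANNER structure, PEEK-AT, DELIMITER-P, SCAN-CONSING-DOT and SCAN-TOKEN are all invented names, and the delimiter set is a hard-coded stand-in for readtable lookups.

```lisp
(defstruct (scanner (:constructor make-scanner (text)))
  (text "" :type string)
  (head 0 :type fixnum))               ; index where the next scan starts

(defun peek-at (scanner offset)
  "Character OFFSET positions past the head, or NIL at end of input. Consumes nothing."
  (let ((index (+ (scanner-head scanner) offset))
        (text (scanner-text scanner)))
    (when (< index (length text))
      (char text index))))

(defun delimiter-p (char)
  "Assumed delimiters for this sketch: end of input, space, or a parenthesis."
  (or (null char) (member char '(#\Space #\( #\)))))

(defun scan-consing-dot (scanner)
  "Look two characters ahead; consume one character only if the dot stands alone."
  (when (and (eql (peek-at scanner 0) #\.)
             (delimiter-p (peek-at scanner 1)))
    (incf (scanner-head scanner))
    t))

(defun scan-token (scanner)
  "Scan ahead to the next delimiter, copy the token, then jump the head forward."
  (let ((offset 0))
    (loop until (delimiter-p (peek-at scanner offset))
          do (incf offset))
    (let ((start (scanner-head scanner)))
      (setf (scanner-head scanner) (+ start offset))
      (subseq (scanner-text scanner) start (+ start offset)))))
```

Because looking ahead never moves the head, a failed SCAN-CONSING-DOT on ".x" leaves nothing to unread; the next SCAN-TOKEN simply starts from the same place.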
-- Erik Naggum, Oslo, Norway
Here's one thing that is very hard to do. Consider the case where you are using string-and-glue R-D-F to read conventional (...) syntax. Consider this:
(x #+(or) dont:read)
Immediately after reading x, you check for a closing delimiter. There isn't one, so call READ again. Oops. So, to do it right you need to know what is coming next in much more detail. There are probably lots of other cases.
(I confess that I found this by just making #\( call my R-D-F function in the default readtable and trying to compile a fairly large program...)
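The failure can be reproduced with a stripped-down version of the string-and-glue loop. This is a sketch with invented names (NAIVE-READ-ITEMS, TRY-NAIVE), and it assumes an implementation in which READ signals an error when it runs into a stray closing parenthesis; the exact behaviour in that situation may vary between implementations.

```lisp
(defun naive-read-items (stream close)
  "Check for the closing delimiter, otherwise just call READ. No consing dot handling."
  (let ((items '()))
    (loop
      (when (char= (peek-char t stream) close)
        (read-char stream)
        (return (nreverse items)))
      ;; READ skips the #+(or) form and keeps going until it finds an object.
      (push (read stream) items))))

(defun try-naive (text)
  "Run the naive loop over TEXT (the input after the opening paren)."
  (with-input-from-string (in text)
    (handler-case (naive-read-items in #\))
      (error () :failed))))

;; (try-naive ":x :y)")                => (:X :Y)
;; (try-naive ":x #+(or) dont:read)")  => :FAILED, where a stray ) is an error
```

After :x the loop peeks, sees #\# rather than the delimiter, and calls READ, which discards the conditionalized form and then meets the closing parenthesis that the loop was supposed to consume.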
* Tim Bradshaw
| Here's one thing that is very hard to do.
Tim, this is a really good time for you to go read the standard on the reader algorithm. I cannot fathom why you want to solve this any other way.
| Consider the case where you are using string-and-glue R-D-F to read
| conventional (...) syntax. Consider this:
|
| (x #+(or) dont:read)
|
| Immediately after reading x, you check for a closing delimiter. There
| isn't one, so call READ again. Oops.
What is the "oops" here? `read´ returns zero values in this case, and this is really standard behavior. The proposed #; reader macro would do precisely this, and end with `(values)´, and the code I posted here previously did. In fact, the standard ; reader macro scans until the end of the line and returns zero values.
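A sketch of what such a #; reader macro could look like, as an illustration rather than the code referred to above. SKIP-NEXT-FORM and READ-WITH-FORM-COMMENTS are invented names; the macro is installed in a private copy of the standard readtable so nothing global changes.

```lisp
(defun skip-next-form (stream sub-char arg)
  "Reader macro function for #; that reads one form, discards it, and returns zero values."
  (declare (ignore sub-char arg))
  (let ((*read-suppress* t))
    (read stream t nil t))
  (values))

(defun read-with-form-comments (text)
  "Read one object from TEXT with #; installed in a copy of the standard readtable."
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\; #'skip-next-form)
    (values (read-from-string text))))

;; (read-with-form-comments "(:a #;(ignored form) :b)") => (:A :B)
;; (read-with-form-comments "#;:x :y")                  => :Y
```

In the second call the macro function returns zero values, and the READ that invoked it carries on to the next object; a caller of READ itself always gets an object back (or an end-of-file result), never zero values.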
| So, to do it right you need to know what is coming next in much more detail.
Sorry, this is still all wrong.
-- Erik Naggum, Oslo, Norway
* Erik Naggum wrote:
> What is the "oops" here? `read´ returns zero values in this case,
> and this is really standard behavior. The proposed #; reader macro
> would do precisely this, and end with `(values)´, and the code I
> posted here previously did. In fact, the standard ; reader macro
> scans until the end of the line and returns zero values.
It does? I can find no mention of a case where READ returns zero values in the entry on it in the spec. Do you mean that the reader macro function should return zero values? I know that, but I'm not using that, I'm calling READ itself.
* Erik Naggum wrote:
> Tim, this is a really good time for you to go read the standard on
> the reader algorithm. I cannot fathom why you want to solve this
> any other way.
Incidentally: I *don't* want to solve it any other way. What I was trying to show was that the string and glue trick of looking-for-a-consing-dot-or-a-delimiter and if not found just calling READ *won't work* and pretty much *can't work* unless you start teaching it about (at the very least) #+ and #-, or actually essentially reimplementing the whole reader. So what one has to do instead is bite the bullet and do the whole algorithm, not try and make a string and glue solution.