Read table modification question.

Daniel Pittman

Jun 19, 2001, 9:12:58 AM
I am using CMU Common Lisp (2.5.2) to write (well, prototype) a little
language parser/compiler.

As part of this, I would like to use the Lisp reader to read in the
contents of the input files. This input is similar to:

,----
| # '#' is a comment character, to the end of the line.
| command arg1 arg2 arg3;
| command arg1 { opt1 opt2 } arg2;
`----

So, to get the right `read' behavior, I need to create a modified
readtable with the syntax of #\# and #\; (at least) changed.[1]

Is there any way to do this other than to use `set-syntax-from-char'?
Reading the HyperSpec, it seems that there are many functions for
modifying the reader macro dispatch function associated with a
character, but no way to actually set the syntax type of a character.

So, can anyone offer guidance here? Am I taking this the wrong way --
should I be avoiding the Lisp reader in deference to something else?

I would like to keep the syntax of the input, of course, as that's half
the point of the prototype. :)

Daniel

Footnotes:
[1] Correct me, of course, if I have started out the wrong way with
this.

--
Life is not all lovely thorns and singing vultures, you know...
-- Morticia Addams, _The Addams Family_

Kent M Pitman

Jun 19, 2001, 11:44:42 AM
Daniel Pittman <dan...@rimspace.net> writes:

>
> I am using CMU Common Lisp (2.5.2) to write (well, prototype) a little
> language parser/compiler.
>
> As part of this, I would like to use the Lisp reader to read in the
> contents of the input files. This input is similar to:
>
> ,----
> | # '#' is a comment character, to the end of the line.
> | command arg1 arg2 arg3;
> | command arg1 { opt1 opt2 } arg2;
> `----
>
> So, to get the right `read' behavior, I need to create a modified
> readtable with the syntax of #\# and #\; (at least) changed.[1]
>
> Is there any way to do this other than to use `set-syntax-from-char'?

I don't understand. Are you trying to work around a bug or did you not
read the two entries from CLHS on set-syntax-from-char and set-macro-character,
both of which offer you examples of how to make a comment work?
How many ways do you need to do this?

As to modifying #\;, you don't say what you want it changed to.
If you need to see it as a token, I recommend just
  (defvar *end-of-command* (list '*end-of-command*))
  ;; The third argument is NON-TERMINATING-P; the readtable is the fourth.
  (set-macro-character #\; #'(lambda (stream char)
                               (declare (ignore stream char))
                               *end-of-command*)
                       nil
                       *my-readtable*)
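
To show how such a marker might be consumed, here is a minimal self-contained sketch. It assumes *my-readtable* is a private copy of the standard readtable, and read-command is a hypothetical helper, not something from the HyperSpec. It also turns #\# into a line comment with set-macro-character, in the manner of the CLHS examples mentioned above.

```lisp
;; Assumption: *my-readtable* is a private copy of the standard readtable.
(defvar *my-readtable* (copy-readtable nil))

;; Unique marker object returned when the reader meets #\;.
(defvar *end-of-command* (list '*end-of-command*))

(set-macro-character #\;
                     (lambda (stream char)
                       (declare (ignore stream char))
                       *end-of-command*)
                     nil               ; terminating macro character
                     *my-readtable*)

;; Make #\# start a comment that runs to the end of the line.
(set-macro-character #\#
                     (lambda (stream char)
                       (declare (ignore char))
                       (read-line stream nil nil)
                       (values))       ; zero values: the reader skips it
                     nil
                     *my-readtable*)

;; Hypothetical helper: collect objects up to the next #\; marker.
;; Returns NIL at end of input (and also for an empty command).
(defun read-command (stream)
  (let ((*readtable* *my-readtable*))
    (loop for object = (read stream nil stream)
          until (or (eq object stream) (eq object *end-of-command*))
          collect object)))
```

With this readtable the braces are still ordinary constituents, so { and } would come back as symbols; whether that is acceptable depends on what the later stages want.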

> Reading the HyperSpec, it seems that there are many functions for
> modifying the reader macro dispatch function associated with a
> character, but no way to actually set the syntax type of a character.

This is right. It's implementation-dependent what those bits even are.



> So, can anyone offer guidance here? Am I taking this the wrong way --

If you re-read your message, you'll see you didn't show any examples of
what way you were doing it.

> should I be avoiding the Lisp reader in deference to something else?

That depends on how effectively you are making use of the readtable.
Personally, I wouldn't use the readtable for anything but Lisp forms, for
which it is designed, but I'm sure Erik Naggum will say I'm losing out
for not using it for more. It's just a personal preference thing.



> I would like to keep the syntax of the input, of course, as that's half
> the point of the prototype. :)

If it were me, I'd just write the command parser from scratch myself, using
read-char and peek-char. It's not that hard. But that's just a personal
call. But it lets you completely control what aspects of the parsed form
you retain and can display later.
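
A hand-written tokenizer along those lines could look roughly like the sketch below. The function names (whitespace-p, delimiter-p, skip-blanks, next-token) and the token representation (strings for words, characters for punctuation) are assumptions made for illustration, not part of the suggestion above.

```lisp
(defun whitespace-p (char)
  (member char '(#\Space #\Tab #\Newline #\Return)))

(defun delimiter-p (char)
  (or (whitespace-p char) (member char '(#\; #\{ #\} #\#))))

;; Skip whitespace and #-to-end-of-line comments.
(defun skip-blanks (stream)
  (loop for char = (peek-char nil stream nil nil)
        while char
        do (cond ((whitespace-p char) (read-char stream))
                 ((char= char #\#) (read-line stream nil nil))
                 (t (return)))))

;; Return the next token: a string for a word, a character for
;; one of ; { }, or NIL at end of input.
(defun next-token (stream)
  (skip-blanks stream)
  (let ((char (peek-char nil stream nil nil)))
    (cond ((null char) nil)
          ((member char '(#\; #\{ #\}))
           (read-char stream))
          (t (with-output-to-string (out)
               (loop for c = (peek-char nil stream nil nil)
                     while (and c (not (delimiter-p c)))
                     do (write-char (read-char stream) out)))))))
```

Because the words come back as strings, their original case and spelling are preserved, which is one way to keep control over what is retained for later display.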

martti....@solibri.com

Jun 19, 2001, 12:20:02 PM
Daniel Pittman <dan...@rimspace.net> writes:

> I am using CMU Common Lisp (2.5.2) to write (well, prototype) a little
> language parser/compiler.
>
> As part of this, I would like to use the Lisp reader to read in the
> contents of the input files. This input is similar to:
>
> ,----
> | # '#' is a comment character, to the end of the line.
> | command arg1 arg2 arg3;
> | command arg1 { opt1 opt2 } arg2;
> `----
>
> So, to get the right `read' behavior, I need to create a modified
> readtable with the syntax of #\# and #\; (at least) changed.[1]


A possibly simpler brute-force way of doing this would be to run the
data through a filter that replaces the troublesome characters with
some other character (if your language syntax has left something free to
use). This could be done either externally, for example with the Unix tr
command, or internally in Lisp.
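
Done inside Lisp, such a filter might be as small as the following sketch. The name translate-chars and the choice of #\! as the "free" replacement character are assumptions for illustration only.

```lisp
;; Hypothetical helper: replace each character of STRING that appears as
;; a key in MAPPING (an alist of characters) with the associated character.
(defun translate-chars (string mapping)
  (map 'string
       (lambda (char)
         (let ((entry (assoc char mapping)))
           (if entry (cdr entry) char)))
       string))

;; Example: turn #\# into the Lisp comment character and #\; into #\!.
;; (translate-chars "cmd a; # note" '((#\# . #\;) (#\; . #\!)))
```

Note that a blind per-character substitution also rewrites these characters wherever else they occur, for instance inside comment text.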

--

Steven D. Majewski

Jun 19, 2001, 2:00:38 PM
In article <87vglsf...@inanna.rimspace.net>,

Daniel Pittman <dan...@rimspace.net> wrote:
>
>As part of this, I would like to use the Lisp reader to read in the
>contents of the input files. This input is similar to:
>
>,----
>| # '#' is a comment character, to the end of the line.
>| command arg1 arg2 arg3;
>| command arg1 { opt1 opt2 } arg2;
>`----
>
>So, to get the right `read' behavior, I need to create a modified
>readtable with the syntax of #\# and #\; (at least) changed.[1]
>
>Is there any way to do this other than to use `set-syntax-from-char'?
>Reading the HyperSpec, it seems that there are many functions for
>modifying the reader macro dispatch function associated with a
>character, but no way to actually set the syntax type of a character.

Is there some reason you don't want to use 'set-syntax-from-char' ?
( There are lower level routines you can use, but that would seem
the appropriate method here. )

I use (*):
  (set-syntax-from-char #\, #\Space)
  (set-syntax-from-char #\# #\;)

in a function to read in numerical data files --
the first makes commas read as whitespace so it will accept
comma delimited lists,
the second makes '#' read as a comment char -- so I can read
in files from unix programs that use that convention.

As long as they are data files and not generic lisp files that
might use character literals, it works. ( In my compiled code,
the order of those two lines didn't seem to matter, but if you
type it in the terminal, it's likely that the second line has
to come last! ;-)

Just be sure to wrap it all in an UNWIND-PROTECT block that resets
the original readtable (with COPY-READTABLE) whether it exits normally
or not.
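
In Common Lisp, one way to get the same guarantee is to bind *readtable* to a fresh copy with LET, since the dynamic binding is undone on any exit, normal or not. The sketch below assumes that approach; read-data is a hypothetical name, and the two set-syntax-from-char calls are the ones shown above.

```lisp
;; Hypothetical helper: read every object from STREAM with commas
;; treated as whitespace and #\# starting a line comment.
(defun read-data (stream)
  ;; Fresh copy of the standard readtable; the global one is untouched.
  (let ((*readtable* (copy-readtable nil)))
    (set-syntax-from-char #\, #\Space)
    (set-syntax-from-char #\# #\;)
    (loop for object = (read stream nil stream)
          until (eq object stream)
          collect object)))
```

Because the modified readtable is only in effect while read-data runs, the source file itself is read with the normal syntax, so the ordering issue with typing the second line at the terminal does not arise here.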

>So, can anyone offer guidance here? Am I taking this the wrong way --
>should I be avoiding the Lisp reader in deference to something else?

Why do you think the reader would be so programmable if you weren't
supposed to use it?

-- Steve Majewski <sd...@Virginia.EDU>

(* "I use..." :
Actually, this was in XlispStat, not Common Lisp, and until I
added an implementation of #'set-syntax-from-char, I had to
hack a lower level method, but I plan to rewrite it using
#'set-syntax-from-char. In XlispStat, the readtables happen
to be implemented by vectors, so the actual code used was
more like:
(setf (elt *readtable* ( char-code #\# )) ...
*)

Erik Naggum

Jun 19, 2001, 5:19:53 PM
* Daniel Pittman <dan...@rimspace.net>

> I am using CMU Common Lisp (2.5.2) to write (well, prototype) a little
> language parser/compiler.

If that language has Lisp nature, it is a good idea to use the Lisp
reader. If it does not have the Lisp nature, it is a very, very bad
novice mistake to use the Lisp reader. In general, few syntaxes have the
Lisp nature. The primary criterion is that the first character (possibly
the first two) should determine the type of the object and the method of
converting a character stream (text) representation into an in-memory
representation (object). The exceptions are symbols, which are whatever
is left after a sequence of characters not otherwise starting an object
is determined not to be a number. This rule is part of the Lisp nature,
and it is _not_ part of most other syntaxes.
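
As a small illustration of that criterion, the standard readtable can be asked what a given first character would do. This is only a sketch; reader-dispatch-kind is a hypothetical name, and the three keywords are labels chosen for the example.

```lisp
;; Hypothetical helper: classify CHAR by how the standard reader
;; would treat it as the first character of an object.
(defun reader-dispatch-kind (char)
  (multiple-value-bind (fn non-terminating-p)
      (get-macro-character char nil)   ; NIL designates the standard readtable
    (cond ((null fn) :token)           ; falls through to number-or-symbol
          (non-terminating-p :non-terminating-macro)
          (t :terminating-macro))))

;; (reader-dispatch-kind #\()  selects the list reader
;; (reader-dispatch-kind #\a)  no macro function: token rules apply
```

Characters with no macro function end up in the token rule described above: the accumulated token is a number if it parses as one, and a symbol otherwise.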

> I would like to keep the syntax of the input, of course, as that's half
> the point of the prototype. :)

If you care to know my opinion, I think semicolon-and-braces-oriented
syntaxes suck and that it is a very, very bad idea to use them at all.
It is far easier to write a parser for a syntax with the Lisp nature in
any language than it is to write a parser for that stupid semiconcoction.
Whoever decided to use the semicolon to _end_ something should just be
taken out and have his colon semified. (At least COBOL and SQL managed
to use a period.)

#:Erik
--
Travel is a meat thing.

Daniel Pittman

Jun 20, 2001, 5:13:50 AM
On Tue, 19 Jun 2001, Kent M. Pitman wrote:
> Daniel Pittman <dan...@rimspace.net> writes:

[...]

>> As part of this, I would like to use the Lisp reader to read in the
>> contents of the input files. This input is similar to:

[...]

>> So, to get the right `read' behavior, I need to create a modified
>> readtable with the syntax of #\# and #\; (at least) changed.[1]
>>
>> Is there any way to do this other than to use `set-syntax-from-char'?
>
> I don't understand. Are you trying to work around a bug or did you not
> read the two entries from CLHS on set-syntax-from-char and
> set-macro-character, both of which offer you examples of how to
> make a comment work?

Er, neither?

> How many ways do you need to do this?

I wasn't sure that `set-syntax-from-char' (which was the one that I was
after) was the *right* way to do what I wanted.

[...]

>> So, can anyone offer guidance here? Am I taking this the wrong way --
>
> If you re-read your message, you'll see you didn't show any examples
> of what way you were doing it.

Sorry, I was after more general guidance, rather than specific "is this
code correct" guidance. An "is this the Lisp way to do it" question.

>> should I be avoiding the Lisp reader in deference to something else?
>
> That depends on how effectively you are making use of the readtable.
> Personally, I wouldn't use the readtable for anything but Lisp forms,
> for which it is designed, but I'm sure Erik Naggum will say I'm losing
> out for not using it for more. It's just a personal preference thing.

Erik's comments were very helpful to me, specifically that I was
probably wasting time trying to shoehorn a non-Lisp language through the
Lisp reader.

>> I would like to keep the syntax of the input, of course, as that's
>> half the point of the prototype. :)
>
> If it were me, I'd just write the command parser from scratch myself,
> using read-char and peek-char. It's not that hard. But that's just a
> personal call. But it lets you completely control what aspects of the
> parsed form you retain and can display later.

Cool. That's pretty much what I intend now. Sorry the question was so
vague -- and thanks for helping anyway. :)

Daniel

--
Democracy is ever eager for rapid progress, and the only
progress which can be rapid is progress made down hill.
-- Sir James Jeans

Daniel Pittman

Jun 20, 2001, 5:08:08 AM
On Tue, 19 Jun 2001, Erik Naggum wrote:
> * Daniel Pittman <dan...@rimspace.net>
>> I am using CMU Common Lisp (2.5.2) to write (well, prototype) a
>> little language parser/compiler.
>
> If that language has Lisp nature, it is a good idea to use the Lisp
> reader. If it does not have the Lisp nature, it is a very, very bad
> novice mistake to use the Lisp reader.

Right. That makes sense. It goes a lot of the way to explaining why I
felt that it was a fight against the reader, not working with it.

I was misled by the `use the reader as a tokenizer' comments elsewhere
in the HyperSpec, I suspect. Er, that and being too lazy to write my
own.

Thanks,
Daniel

--
Forsan et haec olim meminisse juvabit.
Some day, perhaps, even this will be pleasant to remember.
-- Vergil, Aeneid
