
backslashes in strings


Kent M Pitman

Apr 20, 1999, 3:00:00 AM
Sam Steingold <s...@goems.com> writes:

> Is there a way to make CL interpret `\n' in a string as an embedded
> newline?

(set-macro-character #\" ...)

ey15

Apr 20, 1999, 3:00:00 AM
In article <m390bou...@eho.eaglets.com>, s...@goems.com says...

>
>Is there a way to make CL interpret `\n' in a string as an embedded
>newline?
>IIUC, this would break 2.1.4.6.1: (eq 'nbc '\nbc).
>Well, I can live with it (I can turn it off by
>(setq *readtable* (copy-readtable nil)) at any time).
>
>Unfortunately, the obvious trick:
> (set-macro-character #\\ (lambda (stream char)
>                            (setq char (read-char stream t nil t))
>                            (case char (#\n #\Newline) (t char)))
>                      nil)
>does no good:
>
>USER(2): (print "aaa\nccc")
>
>"aaa\\nccc"
>"aaa\\nccc"
>
>while I want it to print
>"aaa
>ccc"
>
>Any suggestions?

I had a quick go at what you wanted. This works for your example,
but probably needs some extra work.


(set-macro-character #\" #'(lambda (stream char)
  (declare (ignore char))
  (LOOP WITH char-to-add
        FOR this-char = (read-char stream t nil t)
        UNTIL (CHAR= this-char #\")
        ;; PEEK-CHAR only looks at the next character; the n is not consumed
        IF (AND (CHAR= this-char #\\) (CHAR= (PEEK-CHAR nil stream) #\n))
          DO (SETQ char-to-add #\Newline)
        ELSE
          DO (SETQ char-to-add this-char)
        COLLECTING char-to-add INTO list-of-characters
        FINALLY (RETURN (COERCE list-of-characters 'STRING)))))

BTW, you didn't put a #' before the lambda.

I extended the double quote character. It's probably not
that efficient, but who cares, it seems to work :-).

CL-USER 55 > "aaa\nccc"""
"aaa
nccc"


CL-USER 58 > "aaa\bnccc"
"aaa\\bnccc"

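A minimal sketch of the same idea with the escaped character consumed, so that the n does not reappear after the newline. The names c-string-reader, *c-string-readtable* and read-c-string-form are hypothetical. Only \n is treated specially here; any other backslashed character stands for itself, as in standard syntax. The macro is installed in a copy of the standard readtable rather than in the global one:

```lisp
;; Sketch only: C-STRING-READER, *C-STRING-READTABLE* and READ-C-STRING-FORM
;; are hypothetical names, not part of any library.
(defun c-string-reader (stream char)
  "Read a string up to the closing double quote, mapping backslash-n to a newline."
  (declare (ignore char))
  (with-output-to-string (out)
    (loop for c = (read-char stream t nil t)
          until (char= c #\")
          do (if (char= c #\\)
                 ;; Consume the escaped character. Only n is special; any
                 ;; other character stands for itself, as in standard CL.
                 (let ((next (read-char stream t nil t)))
                   (write-char (if (char= next #\n) #\Newline next) out))
                 (write-char c out)))))

(defvar *c-string-readtable*
  (let ((rt (copy-readtable nil)))      ; fresh copy of the standard readtable
    (set-macro-character #\" #'c-string-reader nil rt)
    rt))

(defun read-c-string-form (text)
  "Read one form from TEXT with the modified readtable bound locally."
  (let ((*readtable* *c-string-readtable*))
    (values (read-from-string text))))
```

Because the global *readtable* is never modified, code read outside read-c-string-form keeps the standard meaning of backslash.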

Stig Hemmer

Apr 20, 1999, 3:00:00 AM
Sam Steingold <s...@goems.com> writes:
> Is there a way to make CL interpret `\n' in a string as an embedded
> newline?
> IIUC, this would break 2.1.4.6.1: (eq 'nbc '\nbc).

No it wouldn't. You were talking about \n _in a string_, which is
something different from \n in a symbol.

[... *read-table* ... set-macro-character ...]

You are on the wrong track here. This is not a matter of read-tables
as we are talking about _strings_ here. Different beast entirely.
(Well, almost entirely)

As Kent Pitman suggested, the way to go is to change the
interpretation of double-quote.

Your suggested syntax complicates the current meaning of \ in a
string. While I see the need, I'm not all that sure this is a good
thing.

Quick poll: Does anybody know of existing code that will break if an
implementation READs \n and friends in strings like C does?

Stig Hemmer,
Jack of a Few Trades.

Erik Naggum

Apr 20, 1999, 3:00:00 AM
* Stig Hemmer <st...@pvv.ntnu.no>

| Quick poll: Does anybody know of existing code that will break if an
| implementation READs \n and friends in strings like C does?

quick poll: is anyone willing to bet real money that nothing will break
if an implementation does this across the board? :)

*READTABLE* is bindable. there's no need to poll anyone.

#:Erik

Barry Margolin

Apr 20, 1999, 3:00:00 AM
In article <31336072...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
>* Stig Hemmer <st...@pvv.ntnu.no>
>| Quick poll: Does anybody know of existing code that will break if an
>| implementation READs \n and friends in strings like C does?
>
> quick poll: is anyone willing to bet real money that nothing will break
> if an implementation does this across the board? :)

I wouldn't guarantee that *nothing* will break, but I'm willing to wager
that it would be extremely rare. A bare \n should never show up in a
printed representation produced by Lisp (if a string actually contains \
followed by n, it will print as "\\n"), and I can't imagine why someone
would type "\n" intentionally, given that it's equivalent to just typing
"n".

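The two claims in the paragraph above can be checked with a small sketch under standard syntax. The helper names read-standard and print-standard are hypothetical; the test strings are built from character lists to keep the escaping unambiguous:

```lisp
;; Sketch only: READ-STANDARD and PRINT-STANDARD are hypothetical helpers.
(defun read-standard (text)
  "Read one object from TEXT using standard syntax."
  (with-standard-io-syntax
    (values (read-from-string text))))

(defun print-standard (object)
  "Return the printed representation of OBJECT under standard syntax."
  (with-standard-io-syntax
    (prin1-to-string object)))

;; The source text quote, backslash, n, quote reads as the
;; one-character string holding just the letter n.
(read-standard (coerce '(#\" #\\ #\n #\") 'string))

;; A string holding a real backslash followed by n prints with the
;; backslash doubled, so a bare backslash-n never appears in output.
(print-standard (coerce '(#\\ #\n) 'string))
```
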
> *READTABLE* is bindable. there's no need to poll anyone.

If the user wants to be able to type \n interactively, he'll need to assign
*READTABLE* rather than just bind it around LOAD or COMPILE-FILE.
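
The difference between binding and assigning can be sketched as follows. The readtable *bang-readtable* and the function read-with are hypothetical, and the ! macro is only there to make the change in syntax visible:

```lisp
;; Sketch only: *BANG-READTABLE* and READ-WITH are hypothetical names.
(defvar *bang-readtable*
  (let ((rt (copy-readtable nil)))      ; start from the standard readtable
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore stream char))
                           :bang)
                         nil rt)
    rt))

(defun read-with (readtable text)
  "Read one form from TEXT with *READTABLE* bound to READTABLE."
  ;; The binding is undone on exit, as it would be around LOAD or
  ;; COMPILE-FILE, so the caller's syntax is unaffected.
  (let ((*readtable* readtable))
    (values (read-from-string text))))

;; For interactive typing the global value has to be assigned instead:
;;   (setq *readtable* *bang-readtable*)
;; and restored later with:
;;   (setq *readtable* (copy-readtable nil))
```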

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Erik Naggum

Apr 20, 1999, 3:00:00 AM
* Barry Margolin <bar...@bbnplanet.com>

| I wouldn't guarantee that *nothing* will break, but I'm willing to wager
| that it would be extremely rare. A bare \n should never show up in a
| printed representation produced by Lisp (if a string actually contains \
| followed by n, it will print as "\\n"), and I can't imagine why someone
| would type "\n" intentionally, given that it's equivalent to just typing
| "n".

for Lisp programmers and Lisp data, your analysis is correct. however,
would a Lisp programmer seriously want to pollute his code with \n and
its ilk? (that's rhetorical, I know the answer is Emacs Lisp. :) the
problem is therefore not so much that it would not occur in regular Lisp,
as it is that expectations get distorted once this is made available.
  (e.g., confusions between Emacs Lisp and other Lisps.)

in particular, if \n is used as an embedded newline in some strings, it
will cause parsing to fail at random when the readtable is standard. no
matter how you nuke your own value of *READTABLE* or its contents, there
will be a time when this will not be what the data expects it to be,
unless, of course, *READTABLE* is bound with the data so read. which was
my point.

| If the user wants to be able to type \n interactively, he'll need to
| assign *READTABLE* rather than just bind it around LOAD or COMPILE-FILE.

well, I do this with @ for time objects and #" for symbol-names, but
these do not conflict with the standard reader macros, and it's fairly
obvious when things break. reading "foo\nbar" as "foonbar" is a very
quiet error.

#:Erik

Kent M Pitman

Apr 20, 1999, 3:00:00 AM
Barry Margolin <bar...@bbnplanet.com> writes:

> I wouldn't guarantee that *nothing* will break, but I'm willing to wager
> that it would be extremely rare.

Though the same quotation rule in symbols would break a lot of things, since
some lisp systems use \a\b\c instead of |abc|. Having \m\n\o and |mno| and
"\m\n\o" not share a similar rule of backslashing would be awful.

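A small sketch of the shared rule under standard syntax: a backslash quotes the next character in the same way in symbols and in strings. The function name backslash-consistency-check is hypothetical, and the forms assume they are read with the standard readtable:

```lisp
;; Sketch only: BACKSLASH-CONSISTENCY-CHECK is a hypothetical name.
;; Assumes this code is itself read with the standard readtable.
(defun backslash-consistency-check ()
  "Return three generalized booleans; each is true under standard syntax."
  (list
   ;; Single escapes and multiple escapes name the same symbol.
   (eq '\a\b\c '|abc|)
   ;; The same holds for the letters m, n, o: no escape is special.
   (string= (symbol-name '\m\n\o) "mno")
   ;; In a string, each backslash likewise just quotes the next character.
   (string= "\m\n\o" "mno")))
```
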
Kent M Pitman

Apr 20, 1999, 3:00:00 AM
Erik Naggum <er...@naggum.no> writes:

> * Stig Hemmer <st...@pvv.ntnu.no>
> | Quick poll: Does anybody know of existing code that will break if an
> | implementation READs \n and friends in strings like C does?
>
> quick poll: is anyone willing to bet real money that nothing will break
> if an implementation does this across the board? :)

I bet I have seen real code that will break.

Also, this COMPLETELY breaks the clean conceptual model Lisp has of what
backslash escaping does, which C is completely and permanently broken on.
I can think of few other single-character changes you can make to Lisp
that I would consider would break it so completely. Certainly I would
object much LESS violently if you changed "(" and ")" to "<" and ">",
or if you changed "\" to "/" and vice versa, since those would be only
changes in "choice of character" not in "philosophy of quotation".

Kent M Pitman

Apr 20, 1999, 3:00:00 AM
Erik Naggum <er...@naggum.no> writes:

> reading "foo\nbar" as "foonbar" is a very quiet error.

ASSUMING it is in fact an error at all. It HAS a semantics.
You assume that semantics is not used, and this is an error.
If it is not an error then it is a very quiet success, as should
be all successes that contain no I/O statements...

Note that the price of changing this implicitly is that you can't use
\ to escape an arbitrary character that you aren't sure of. Instead,
you have to also change \000 to mean read an octal code (or \xNN for
hex if you prefer) because the simple rule in the printer that says
that if the character has funny syntax it needs a slash before it will
no longer work. And that, in turn, makes it uglier to ever set the syntax
of anything to a strange syntax because it doesn't just cause
  font
to print as
  fo\nt
when the read syntax for n is unknown but instead causes it to print as
  fo\x6e
which you can't even read any more with the human eye. That's a big
price to pay. Of course, you can say symbols and strings don't have to
have the same escape syntax, but that's a big price to pay, too.

I repeat: Languages are ecologies. You cannot kill the mosquitos for your
personal comfort and expect not to kill whatever feeds on them. In the
end, you may see odd effects you do not mean to happen because even the
inconveniences may be there for a reason.

Erik Naggum

Apr 20, 1999, 3:00:00 AM
* Erik Naggum <er...@naggum.no>

| reading "foo\nbar" as "foonbar" is a very quiet error.

* Kent M Pitman <pit...@world.std.com>


| ASSUMING it is in fact an error at all. It HAS a semantics. You assume
| that semantics is not used, and this is an error.

I think you misread me here. assuming that a programmer decides to use a
C-style backslashing convention and sets up his system so that it prints
and is assumed to read such strings, "foo\nbar" is evidence of the intent
to store a string whose fourth character when read back is a newline.
given this assumption, it is a very _quiet_ error to return a string
whose fourth character is the letter n. and this is precisely what will
happen if the string is read back with standard syntax.

the whole point of my example was to show that while the changes to the
readtable that I have made (destructively) in my system will result in
loud errors when parsed with standard syntax, there is, as you point out,
_standard_ semantics for the choice of string syntax that produced the
string.

| I repeat: Languages are ecologies.

thanks for repeating this for our new viewers, but I think I have that
truth down pat.

#:Erik

Kent M Pitman

Apr 21, 1999, 3:00:00 AM
Erik Naggum <er...@naggum.no> writes:

> the whole point of my example was to show that while the changes to the
> readtable that I have made (destructively) in my system will result in
> loud errors when parsed with standard syntax, there is, as you point out,
> _standard_ semantics for the choice of string syntax that produced the
> string.

I wasn't confused on this point, but mostly since you're a "sophisticated
user" I was talking past you (since I figured you knew) to the space of
people who might read what you wrote and think it was a "no brainer" to
change over. I agree that the "loud/quiet" thing is double-edged because
people who don't know the language rules might be lulled into doing the wrong
thing. Part of me does have sympathy in spite of the "caveat emptor" rule
that I think should dominate. Probably a friendly environment could
offer a warning. This is a good case for a non-fatal error, btw. I
could imagine the reader doing a
 (signal 'questionable-escaped-char :subchar #\n)
which would "quietly" return NIL if unhandled, but which the interactive
reader could have something in it to do
 (handler-bind ((questionable-escaped-char
                  #'(lambda (condition)
                      (window-system:display-visibly-but-nonfatally-in-another-window
                        (princ-to-string condition))
                      nil))) ;lie and say condition not handled
   (read))
but nothing would get messed up in the running computation.
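
A runnable sketch of that idea, under some assumptions: the condition is defined here with a :subchar slot, the function read-escaped-char stands in for the place where the reader would signal, and call-noting-escapes collects the reports in a list instead of calling a window-system function. All three definitions are hypothetical:

```lisp
;; Sketch only: the condition's slot and report, READ-ESCAPED-CHAR and
;; CALL-NOTING-ESCAPES are all hypothetical.
(define-condition questionable-escaped-char (condition)
  ((subchar :initarg :subchar :reader questionable-subchar))
  (:report (lambda (condition stream)
             (format stream "Questionable escape: \\~A"
                     (questionable-subchar condition)))))

(defun read-escaped-char (subchar)
  "Stand-in for the reader: signal, then keep the standard meaning."
  ;; SIGNAL returns NIL when no handler transfers control.
  (signal 'questionable-escaped-char :subchar subchar)
  subchar)

(defun call-noting-escapes (thunk)
  "Call THUNK, collecting one report per signal without unwinding."
  (let ((notes '()))
    (handler-bind ((questionable-escaped-char
                     (lambda (condition)
                       (push (princ-to-string condition) notes)
                       nil)))           ; returning normally declines to handle
      (let ((result (funcall thunk)))
        (values result (nreverse notes))))))
```

The handler returns normally, so the signal is declined and the computation inside the thunk continues as if nothing had been signaled.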


> | I repeat: Languages are ecologies.
>
> thanks for repeating this for our new viewers, but I think I have that
> truth down pat.

Again, mostly to those looking over your shoulder. I doubt barmar
is unaware of this either. But neither of you was mentioning the
linguistic consistency argument, and I wanted a place to squeeze that in.
