Unrecognized escape sequences in string literals

Douglas Alan

Aug 9, 2009, 3:26:54 PM

A friend of mine is just learning Python, and he's a bit tweaked about
how unrecognized escape sequences are treated in Python. This is from
the Python 3.0 reference manual:

Unlike Standard C, all unrecognized escape sequences are left in the
string unchanged, i.e., the backslash is left in the string. (This
behavior is useful when debugging: if an escape sequence is mistyped,
the resulting output is more easily recognized as broken.) It is also
important to note that the escape sequences only recognized in string
literals fall into the category of unrecognized escapes for bytes
literals.
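
A minimal sketch of the rule quoted above, not part of the original post. The helper name `evaluate_literal` is hypothetical. It evaluates the literal from source text and silences warnings, on the assumption that Python releases newer than the 3.0 manual may warn about unrecognized escapes:

```python
import warnings


def evaluate_literal(source):
    """Evaluate a string literal given as source text, silencing escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)


unrecognized = evaluate_literal(r'"abc\d"')  # \d is not a recognized escape
recognized = evaluate_literal(r'"abc\n"')    # \n is a recognized escape

print(len(unrecognized))  # 5: a, b, c, backslash, d
print(len(recognized))    # 4: a, b, c, newline
```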

My friend begs to differ with the above. It would be much better for
debugging if Python generated a parsing error for unrecognized escape
sequences, rather than leaving them unchanged. g++ outputs a warning
for such escape sequences, for instance. This is what I would consider
to be the correct behavior. (Actually, I think it should just generate
a fatal parsing error, but a warning is okay too.)

In any case, I think my friend should mellow out a bit, but we both
consider this something of a wart. He's just more wart-phobic than I
am. Is there any way that this behavior can be considered anything
other than a wart? Other than the unconvincing claim that you can use
this "feature" to save you a bit of typing sometimes when you actually
want a backslash to be in your string?

|>ouglas

Steven D'Aprano

Aug 9, 2009, 8:06:18 PM

On Sun, 09 Aug 2009 12:26:54 -0700, Douglas Alan wrote:

> A friend of mine is just learning Python, and he's a bit tweaked about
> how unrecognized escape sequences are treated in Python.

...

> In any case, I think my friend should mellow out a bit, but we both
> consider this something of a wart. He's just more wart-phobic than I am.
> Is there any way that this behavior can be considered anything other
> than a wart? Other than the unconvincing claim that you can use this
> "feature" to save you a bit of typing sometimes when you actually want a
> backslash to be in your string?

I'd put it this way: a backslash is just an ordinary character, except
when it needs to be special. So Python's behaviour is "treat backslash as
a normal character, except for these exceptions" while the behaviour your
friend wants is "treat a backslash as an error, except for these
exceptions".

Why should a backslash in a string literal be an error?

--
Steven

Douglas Alan

Aug 9, 2009, 8:56:55 PM

Steven D'Aprano wrote:

> Why should a backslash in a string literal be an error?

Because in Python, if my friend sees the string "foo\xbar\n", he has
no idea whether the "\x" is an escape sequence, or if it is just the
characters "\x", unless he looks it up in the manual, or tries it out
in the REPL, or what have you. My friend is adamant that it would be
better if he could just look at the string literal and know. He
doesn't want to be bothered to have to store stuff like that in his
head. He wants to be able to figure out programs just by looking at
them, to the maximum degree that that is feasible.

In comparison to Python, in C++, he can just look at "foo\xbar\n" and
know that "\x" is a special character. (As long as it compiles without
warnings under g++.)

He's particularly annoyed too, that if he types "foo\xbar" at the
REPL, it echoes back as "foo\\xbar". He finds that to be some sort of
annoying DWIM feature, and if Python is going to have DWIM features,
then it should, for example, figure out what he means by "\" and not
bother him with a syntax error in that case.

Another reason that Python should not behave the way that it does, is
that it pegs Python into a corner where it can't add new escape
sequences in the future, as doing so will break existing code.
Generating a syntax error instead for unknown escape sequences would
allow for future extensions.
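
For readers who want the stricter behavior today, here is a hedged sketch; it is an illustration, not something proposed in the thread. It assumes a Python release that warns about unrecognized escapes (DeprecationWarning from 3.6, SyntaxWarning from 3.12) and promotes that warning to an error while compiling. The helper name `is_strictly_valid` is hypothetical:

```python
import warnings


def is_strictly_valid(literal_source):
    """Return False if compiling the literal raises a warning or a syntax error."""
    with warnings.catch_warnings():
        # Promote every warning to an exception for the duration of the check.
        warnings.simplefilter("error")
        try:
            compile(literal_source, "<literal>", "eval")
        except (SyntaxError, Warning):
            return False
    return True


print(is_strictly_valid(r'"foo\n"'))  # recognized escape
print(is_strictly_valid(r'"foo\z"'))  # unrecognized escape
```

On interpreters that do not warn about unrecognized escapes at all, both calls would report the literal as valid.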

Now not to pick on Python unfairly, most other languages have similar
issues with escape sequences. (Except for the Bourne Shell and bash,
where "\x" always just means "x", no matter what character "x" happens
to be.) But I've been telling my friend for years to switch to Python
because of how wonderful and consistent Python is in comparison to
most other languages, and now he seems disappointed and seems to think
that Python is just more of the same.

Of course I think that he's overreacting a bit. My point of view is
that every language has *some* warts; Python just has a bit fewer than
most. It would have been nice, I should think, if this wart had been
"fixed" in Python 3, as I do consider it to be a minor wart.

|>ouglas

Carl Banks

Aug 9, 2009, 9:34:14 PM

On Aug 9, 5:06 pm, Steven D'Aprano <st...@REMOVE-THIS-

Because the behavior of \ in a string is context-dependent, which
means a reader can't know if \ is a literal character or escape
character without knowing the context, and it means an innocuous
change in context can cause a rather significant change in \.

IOW it's an error-prone mess. It would be better if Python (like C)
treated \ consistently as an escape character. (And in raw strings,
consistently as a literal.)

It's kind of a minor issue in terms of overall real-world importance,
but in terms of raw unPythonicness this might be the worst offense the
language makes.

Carl Banks

Douglas Alan

Aug 9, 2009, 10:42:56 PM

On Aug 9, 8:06 pm, Steven D'Aprano wrote:

> while the behaviour your
> friend wants is "treat a backslash as an error, except for these
> exceptions".

Besides, can't all error situations be described as, "treat the error
situation as an error, except for the exception of when the situation
isn't an error"???

The behavior my friend wants isn't any more exceptional than that!

|>ouglas

John Nagle

Aug 10, 2009, 2:03:14 AM

Carl Banks wrote:
> IOW it's an error-prone mess. It would be better if Python (like C)
> treated \ consistently as an escape character. (And in raw strings,
> consistently as a literal.)

Agreed. For one thing, if another escape character ever has to be
added to the language, that may change the semantics of previously
correct strings. If "\" followed by a non-special character is treated
as an error, that doesn't happen.

John Nagle

Steven D'Aprano

Aug 10, 2009, 2:03:05 AM

On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:

> Steven D'Aprano wrote:
>
>> Why should a backslash in a string literal be an error?
>
> Because in Python, if my friend sees the string "foo\xbar\n", he has no
> idea whether the "\x" is an escape sequence, or if it is just the
> characters "\x", unless he looks it up in the manual, or tries it out in
> the REPL, or what have you.

Fair enough, but isn't that just another way of saying that if you look
at a piece of code and don't know what it does, you don't know what it
does unless you look it up or try it out?

> My friend is adamant that it would be better
> if he could just look at the string literal and know. He doesn't want to
> be bothered to have to store stuff like that in his head. He wants to be
> able to figure out programs just by looking at them, to the maximum
> degree that that is feasible.

I actually sympathize strongly with that attitude. But, honestly, your
friend is a programmer (or at least pretends to be one *wink*). You can't
be a programmer without memorizing stuff: syntax, function calls, modules
to import, quoting rules, blah blah blah. Take C as an example -- there's
absolutely nothing about () that says "group expressions or call a
function" and {} that says "group a code block". You just have to
memorize it. If you don't know what a backslash escape is going to do,
why would you use it? I'm sure your friend isn't in the habit of randomly
adding backslashes to strings just to see whether it will still compile.

This is especially important when reading (as opposed to writing) code.
You read somebody else's code, and see "foo\xbar\n". Let's say you know
it compiles without warning. Big deal -- you don't know what the escape
codes do unless you've memorized them. What does \n resolve to? chr(13)
or chr(97) or chr(0)? Who knows?

Unless you know the rules, you have no idea what is in the string.
Allowing \y to resolve to a literal backslash followed by y doesn't
change that. All it means is that some \c combinations return a single
character, and some return two.

> In comparison to Python, in C++, he can just look at "foo\xbar\n" and know
> that "\x" is a special character. (As long as it compiles without
> warnings under g++.)

So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT
USING g++, and know whether or not \x is a special character.

[sarcasm] Gosh. That's an enormous difference from Python, where you have
to print the string at the REPL to know what it does. [/sarcasm]

Aside:
\x isn't a special character:

>>> "\x"
ValueError: invalid \x escape

However, \xba is:

>>> "\xba"
'\xba'
>>> len("\xba")
1
>>> ord("\xba")
186
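
A small sketch (added for illustration) of how the literal under discussion breaks down. Because `\xba` consumes the two hex digits that follow `\x`, the string holds six characters, not the nine that appear between the quotes:

```python
text = "foo\xbar\n"

# f, o, o, chr(0xBA), r, newline
print(len(text))                    # 6
print([hex(ord(c)) for c in text])
```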

> He's particularly annoyed too, that if he types "foo\xbar" at the REPL,
> it echoes back as "foo\\xbar". He finds that to be some sort of annoying
> DWIM feature, and if Python is going to have DWIM features, then it
> should, for example, figure out what he means by "\" and not bother him
> with a syntax error in that case.

Now your friend is confused. This is a good thing. Any backslash you see
in Python's default string output is *always* an escape:

>>> "a string with a 'proper' escape \t (tab)"
"a string with a 'proper' escape \t (tab)"
>>> "a string with an 'improper' escape \y (backslash-y)"
"a string with an 'improper' escape \\y (backslash-y)"

The REPL is actually doing him a favour. It always escapes backslashes,
so there is no ambiguity. A backslash is displayed as \\, any other \c is
a special character.
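
A short sketch of the distinction being drawn here, added for illustration: the REPL echoes the `repr` of a string, which doubles every real backslash, while `print` shows the characters themselves.

```python
text = "path\\yes"  # the literal spells one backslash as \\

print(len(text))    # 8: the string holds a single backslash
print(repr(text))   # the repr shows that backslash doubled
print(text)         # print shows the single backslash
```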

> Of course I think that he's overreacting a bit.

:)

> My point of view is that
> every language has *some* warts; Python just has a bit fewer than most.
> It would have been nice, I should think, if this wart had been "fixed"
> in Python 3, as I do consider it to be a minor wart.

And if anyone had cared enough to raise it a couple of years back, it
possibly might have been.

--
Steven

Steven D'Aprano

Aug 10, 2009, 2:10:53 AM

On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:

>> Why should a backslash in a string literal be an error?
>
> Because the behavior of \ in a string is context-dependent, which means
> a reader can't know if \ is a literal character or escape character
> without knowing the context, and it means an innocuous change in context
> can cause a rather significant change in \.

*Any* change in context is significant with escapes.

"this \nhas two lines"

If you change the \n to a \t you get a significant difference. If you
change the \n to a \y you get a significant difference. Why is the first
one acceptable but the second not?

> IOW it's an error-prone mess.

I've never had any errors caused by this. I've never seen anyone write to
this newsgroup confused over escape behaviour, or asking for help with an
error caused by it, and until this thread, never seen anyone complain
about it either.

Excuse my cynicism, but I believe that you are using "error-prone" to
mean "I don't like this behaviour" rather than "it causes lots of errors".

--
Steven

Steven D'Aprano

Aug 10, 2009, 2:23:20 AM

On Sun, 09 Aug 2009 23:03:14 -0700, John Nagle wrote:

> if another escape character ever has to be
> added to the language, that may change the semantics of previously
> correct strings.

And that's the only argument in favour of prohibiting non-special
backslash sequences I've seen yet that is even close to convincing.

--
Steven

Douglas Alan

Aug 10, 2009, 3:32:30 AM

On Aug 10, 2:03 am, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:

> On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:

> > Because in Python, if my friend sees the string "foo\xbar\n", he has no
> > idea whether the "\x" is an escape sequence, or if it is just the
> > characters "\x", unless he looks it up in the manual, or tries it out in
> > the REPL, or what have you.
>
> Fair enough, but isn't that just another way of saying that if you look
> at a piece of code and don't know what it does, you don't know what it
> does unless you look it up or try it out?

Not really. It's more like saying that easy things should be easy, and
hard things should be possible. But in this case, Python is making
something that should be really easy, a bit harder and more error
prone than it should be.

In C++, if I know that the code I'm looking at compiles, then I never
need worry that I've misinterpreted what a string literal means. At
least not if it doesn't have any escape characters in it that I'm not
familiar with. But in Python, if I see, "\f\o\o\b\a\z", I'm not really
sure what I'm seeing, as I surely don't have committed to memory some
of the more obscure escape sequences. If I saw this in C++, and I knew
that it was in code that compiled, then I'd at least know that there
are some strange escape codes that I have to look up. Unlike with
Python, it would never be the case in C++ code that the programmer who
wrote the code was just too lazy to type in "\\f\\o\\o\\b\\a\\z"
instead.

> > My friend is adamant that it would be better
> > if he could just look at the string literal and know. He doesn't want to
> > be bothered to have to store stuff like that in his head. He wants to be
> > able to figure out programs just by looking at them, to the maximum
> > degree that that is feasible.
>
> I actually sympathize strongly with that attitude. But, honestly, your
> friend is a programmer (or at least pretends to be one *wink*).

Actually, he's probably written more code than you, me, and ten other
random decent programmers put together. As he can slap out massive
amounts of code very quickly, he'd prefer not to have crap getting in
his way. In the time it takes him to look something up, he might have
written another page of code.

He's perfectly capable of dealing with crap, as years of writing large
programs in Perl and PHP quickly proves, but his whole reason for
learning Python, I take it, is so that he will be bothered with less
crap and therefore write code even faster.

> You can't be a programmer without memorizing stuff: syntax, function
> calls, modules to import, quoting rules, blah blah blah. Take C as
> an example -- there's absolutely nothing about () that says "group
> expressions or call a function" and {} that says "group a code
> block".

I don't really think that this is a good analogy. It's like the
difference between remembering rules of grammar and remembering
English spelling. As a kid, I was the best in my school at grammar,
and one of the worst at speling.

> You just have to memorize it. If you don't know what a backslash
> escape is going to do, why would you use it?

(1) You're looking at code that someone else wrote, or (2) you forget
to type "\\" instead of "\" in your code (or get lazy sometimes), as
that is okay most of the time, and you inadvertently get a subtle bug.

> This is especially important when reading (as opposed to writing) code.
> You read somebody else's code, and see "foo\xbar\n". Let's say you know
> it compiles without warning. Big deal -- you don't know what the escape
> codes do unless you've memorized them. What does \n resolve to? chr(13)
> or chr(97) or chr(0)? Who knows?

It *is* a big deal. Or at least a non-trivial deal. It means that you
can tell just by looking at the code that there are funny characters
in the string, and not just backslashes. You don't have to go
running for the manual every time you see code with backslashes, where
the upshot might be that the programmer was merely saving themselves
some typing.

> > In comparison to Python, in C++, he can just look at "foo\xbar\n" and know
> > that "\x" is a special character. (As long as it compiles without
> > warnings under g++.)
>
> So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT
> USING g++, and know whether or not \x is a special character.

I'm not sure that your comments are paying due diligence to full
life-cycle software development issues that involve multiple
programmers (or even just your own program that you wrote a year ago,
and you don't remember all the details of what you did) combined with
maintaining and modifying existing code, etc.

> Aside:
> \x isn't a special character:
>
> >>> "\x"
>
> ValueError: invalid \x escape

I think that this all just goes to prove my friend's point! Here I've
been programming in Python for more than a decade (not full time, mind
you, as I also program in other languages, like C++), and even I
didn't know that "\xba" was an escape sequence, and I inadvertently
introduced a subtle bug into my argument because it just so happens
that the first two characters of "bar" are legal hexadecimal! If I did
the very same thing in a real program, it might take me a lot of time
to track down the bug.

Also, it seems that Python is being inconsistent here. Python knows
that the string "\x" doesn't contain a full escape sequence, so why
doesn't it treat the string "\x" the same way that it treats the
string "\z"?
After all, if you're a Python programmer, you should know that "\x"
doesn't contain a complete escape sequence, and therefore, you would
not be surprised if Python were so kind as to just leave it alone,
rather than raising a ValueError.

I.e., "\z" is not a legal escape sequence, so it gets left as
"\\z". "\x" is not a legal escape sequence. Shouldn't it also get left
as "\\x"?

> > He's particularly annoyed too, that if he types "foo\xbar" at the REPL,
> > it echoes back as "foo\\xbar". He finds that to be some sort of annoying
> > DWIM feature, and if Python is going to have DWIM features, then it
> > should, for example, figure out what he means by "\" and not bother him
> > with a syntax error in that case.
>
> Now your friend is confused. This is a good thing. Any backslash you see
> in Python's default string output is *always* an escape:

Well, I think he's more annoyed that if Python is going to be so
helpful as to put in the missing "\" for you in "foo\zbar", then it
should put in the missing "\" for you in "\". He considers this to be an
inconsistency.

Me, I'd never, ever, EVER want a language to special-case something at
the end of a string, but I can see that from his new-to-Python
perspective, Python seems to be DWIMing in one place and not the
other, and he thinks that it should either do no DWIMing at all, or
consistently DWIM. To not be consistent in this regard is "inelegant",
says he.

And I can see his point that allowing "foo\zbar" and "foo\\zbar" to be
synonymous is a form of DWIMing.

> > My point of view is that every language has *some* warts; Python
> > just has a bit fewer than most. It would have been nice, I should
> > think, if this wart had been "fixed" in Python 3, as I do consider
> > it to be a minor wart.

> And if anyone had cared enough to raise it a couple of years back, it
> possibly might have been.

So, now if only my friend had learned Python years ago, when I told
him to, he possibly might be happy with Python by now!

|>ouglas

Carl Banks

Aug 10, 2009, 3:37:33 AM

On Aug 9, 11:10 pm, Steven D'Aprano

<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
> >> Why should a backslash in a string literal be an error?
>
> > Because the behavior of \ in a string is context-dependent, which means
> > a reader can't know if \ is a literal character or escape character
> > without knowing the context, and it means an innocuous change in context
> > can cause a rather significant change in \.
>
> *Any* change in context is significant with escapes.
>
> "this \nhas two lines"
>
> If you change the \n to a \t you get a significant difference. If you
> change the \n to a \y you get a significant difference. Why is the first
> one acceptable but the second not?

Because when you change \n to \t, you haven't changed the meaning
of the \ character; but when you change \n to \y, you have, and you
did so without even touching the backslash.

> > IOW it's an error-prone mess.
>
> I've never had any errors caused by this.

Thank you for your anecdotal evidence. Here's mine: This has gotten
me at least twice, and a compiler complaint would have reduced my bug-
hunting time from tens of minutes to ones of seconds. [Aside: it was
when I was using Python on Windows for the first time]

> I've never seen anyone write to
> this newsgroup confused over escape behaviour, or asking for help with an
> error caused by it, and until this thread, never seen anyone complain
> about it either.

More anecdotal evidence. Here's mine: I have.

> Excuse my cynicism, but I believe that you are using "error-prone" to
> mean "I don't like this behaviour" rather than "it causes lots of errors".

No, I'm using error-prone to mean error-prone.

Someone (obviously not you because you have perfect knowledge of
the language and 100% situation awareness at all times) might have a
string like "abcd\stuv" and change it to "abcd\tuvw" without even
thinking about the fact that the s comes after the backslash.

Worst of all: they might not even notice the error, because the repr
of this string is:

'abcd\tuvw'

They might not notice that the backslash is single, because (unlike
you) mortal fallible human beings don't always register tiny details
like a backslash being single when it should be double.
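
The edit described above can be sketched as follows; this is an illustration, not code from the thread. The hypothetical helper `evaluate_literal` evaluates a literal from source text with warnings silenced, since newer Python releases may warn about `\s`:

```python
import warnings


def evaluate_literal(source):
    """Evaluate a string literal given as source text, silencing escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)


before = evaluate_literal(r'"abcd\stuv"')  # \s is unrecognized: backslash kept
after = evaluate_literal(r'"abcd\tuvw"')   # \t is recognized: becomes a tab

print(len(before), len(after))  # 9 8
print(repr(before))
print(repr(after))
```

The backslash survives in the first string and disappears in the second, even though only the letters after it were edited.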

Point is, this is a very bad inconsistency. It makes the behavior of
\ impossible to learn by analogy; instead, you have to memorize a list
of situations where it behaves one way or another.

Carl Banks

Douglas Alan

Aug 10, 2009, 3:57:18 AM

On Aug 10, 2:10 am, Steven D'Aprano

> I've never had any errors caused by this.

But you've seen an error caused by this, in this very discussion.
I.e., "foo\xbar".

"\xba" isn't an escape sequence in any other language that I've used,
which is one reason I made this error... Oh, wait a minute -- it *is*
an escape sequence in JavaScript. But in JavaScript, while "\xba" is a
special character, "\xb" is synonymous with "xb".

The fact that every language seems to treat these things similarly but
differently, is yet another reason why they should just be treated
utterly consistently by all of the languages: I.e., escape sequences
that don't have a special meaning should be an error!

> I've never seen anyone write to
> this newsgroup confused over escape behaviour,

My friend objects strongly to the claim that he is "confused" by it, so I
guess you are right that no one is confused. He just thinks that it
violates the beautiful sense of aesthetics that he was assured, over and
over again, Python has.

But aesthetics is a non-negligible issue with practical ramifications.
(Not that anything can be done about this wart at this point,
however.)

> or asking for help with an error caused by it, and until
> this thread, never seen anyone complain about it either.

Oh, this bothered me too when I first learned Python, and I thought it
was stupid. It just didn't bother me enough to complain publicly.

Besides, the vast majority of Python noobs don't come here, despite
appearances sometimes, and by the time most people get here, they've
probably got bigger fish to fry.

|>ouglas

Steven D'Aprano

Aug 10, 2009, 4:37:48 AM

On Mon, 10 Aug 2009 00:37:33 -0700, Carl Banks wrote:

> On Aug 9, 11:10 pm, Steven D'Aprano
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
>> >> Why should a backslash in a string literal be an error?
>>
>> > Because the behavior of \ in a string is context-dependent, which
>> > means a reader can't know if \ is a literal character or escape
>> > character without knowing the context, and it means an innocuous
>> > change in context can cause a rather significant change in \.
>>
>> *Any* change in context is significant with escapes.
>>
>> "this \nhas two lines"
>>
>> If you change the \n to a \t you get a significant difference. If you
>> change the \n to a \y you get a significant difference. Why is the
>> first one acceptable but the second not?
>
> Because when you change \n to \t, you haven't changed the meaning of
> the \ character;

I assume you mean the \ character in the literal, not the (non-existent)
\ character in the string.

> but when you change \n to \y, you have, and you did so
> without even touching the backslash.

Not at all.

'\n' maps to the string chr(10).
'\y' maps to the string chr(92) + chr(121).

In both cases the backslash in the literal has the same meaning: grab
the next token (usually a single character, but not always), look it up
in a mapping somewhere, and insert the result in the string object being
built.

(I don't know if the *implementation* is precisely as described, but
that's irrelevant. It's still functionally a mapping.)
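
The mapping view described above can be sketched as a toy decoder. This is an illustration under stated assumptions, not CPython's implementation: the names `SIMPLE_ESCAPES` and `decode_simple_escapes` are hypothetical, and only single-character escapes are handled (no `\x`, octal, `\u` or `\N` forms).

```python
# Lookup table for the single-character escapes; anything absent
# from the table falls back to "backslash followed by the character".
SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "a": "\a", "b": "\b",
    "f": "\f", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}


def decode_simple_escapes(raw):
    """Decode backslash pairs in raw text using the mapping above."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Same lookup for every pair; only the result differs.
            out.append(SIMPLE_ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(decode_simple_escapes(r"\n") == chr(10))             # True
print(decode_simple_escapes(r"\y") == chr(92) + chr(121))  # True
```

Under the stricter rule argued for elsewhere in the thread, the fallback in `.get()` would raise an error instead of returning the pair unchanged.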

>> > IOW it's an error-prone mess.
>>
>> I've never had any errors caused by this.
>
> Thank you for your anecdotal evidence. Here's mine: This has gotten me
> at least twice, and a compiler complaint would have reduced my bug-
> hunting time from tens of minutes to ones of seconds. [Aside: it was
> when I was using Python on Windows for the first time]

Okay, that's twice in, how many years have you been programming?

I've mistyped "xrange" as "xrnage" two or three times. Does that make
xrange() "an error-prone mess" too? Probably not. Why is my mistake my
mistake, but your mistake the language's fault?

[...]

Oh, wait, no, I tell a lie -- I *have* seen people reporting "bugs" here
caused by backslashes. They're invariably Windows programmers writing
pathnames using backslashes, so I'll give you that one: if you don't know
that Python treats backslashes as special in string literals, you will
screw up your Windows pathnames.

Interestingly, the problem there is not that \y resolves to literal
backslash followed by y, but that \t DOESN'T resolve to the expected
backslash-t. So it seems to me that the problem for Windows coders is not
that \y doesn't raise an error, but the mere existence of backslash
escapes.
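
A brief sketch of the Windows pathname trap, added for illustration. The path is made up; the point is that `\t` and `\f` are recognized escapes, so they silently become a tab and a form feed unless the backslashes are doubled or a raw string is used:

```python
broken = "C:\\Documents\today\foo"       # \t and \f are decoded
escaped = "C:\\Documents\\today\\foo"    # every backslash doubled
raw = r"C:\Documents\today\foo"          # raw string: no escapes processed

print(repr(broken))
print(len(broken), len(raw))  # 20 22
print(escaped == raw)         # True
```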

> Someone (obviously not you because you have perfect knowledge of the
> language and 100% situation awareness at all times) might have a string
> like "abcd\stuv" and change it to "abcd\tuvw" without even thinking
> about the fact that the s comes after the backslash.

Deary me. And they might type "4+15" instead of "4*51", and now
arithmetic is an "error-prone mess" too. If you know of a programming
language which can prevent you making semantic errors, please let us all
know what it is.

If you edit code without thinking, you will be burnt, and you get *zero*
sympathy from me.

> Worst of all: they might not even notice the error, because the repr of
> this string is:
>
> 'abcd\tuvw'
>
> They might not notice that the backslash is single, because (unlike you)
> mortal fallible human beings don't always register tiny details like a
> backslash being single when it should be double.

"Help help, 123145 looks too similar to 1231145, and now I calculated my
taxes wrong and will go to jail!!!"

> Point is, this is a very bad inconsistency. It makes the behavior of \
> impossible to learn by analogy; instead, you have to memorize a list of
> situations where it behaves one way or another.

No, you don't "have" to memorize anything, you can go right ahead and
escape every backslash, as I did for years. Your code will still work
fine.

You already have to memorize what escape codes return special characters.
The only difference is whether you learn "...and everything else raises
an exception" or "...and everything else is returned unchanged".

There is at least one good reason for preferring an error, namely that it
allows Python to introduce new escape codes without going through a long,
slow process. But the rest of these complaints are terribly unconvincing.

--
Steven

Steven D'Aprano

Aug 10, 2009, 4:49:20 AM

On Mon, 10 Aug 2009 00:57:18 -0700, Douglas Alan wrote:

> On Aug 10, 2:10 am, Steven D'Aprano
>
>> I've never had any errors caused by this.
>
> But you've seen an error caused by this, in this very discussion. I.e.,
> "foo\xbar".

Your complaint is that "invalid" escapes like \y resolve to a literal
backslash-y instead of raising an error. But \xbar doesn't contain an
invalid escape, it contains a valid hex escape. Your ignorance that \xHH
is a valid hex escape (for suitable hex digits) isn't an example of an
error caused by "invalid" escapes like \y.

> "\xba" isn't an escape sequence in any other language that I've used,
> which is one reason I made this error... Oh, wait a minute -- it *is* an
> escape sequence in JavaScript. But in JavaScript, while "\xba" is a
> special character, "\xb" is synonymous with "xb".
>
> The fact that every language seems to treat these things similarly but
> differently, is yet another reason why they should just be treated
> utterly consistently by all of the languages: I.e., escape sequences
> that don't have a special meaning should be an error!

Perhaps all the other languages should follow Python's lead instead?

Or perhaps they should follow bash's lead, and map \C to C for every
character. If there were no special escapes at all, Windows programmers
wouldn't keep getting burnt when they write "C:\\Documents\today\foo" and
end up with something completely unexpected.

Oh wait, no, that still wouldn't work, because they'd end up with
C:\Documentstodayfoo. So copying bash doesn't work.

But copying C will upset the bash coders, because they'll write
"some\ file\ with\ spaces" and suddenly their code won't even compile!!!

Seems like no matter what you do, you're going to upset *somebody*.

>> I've never seen anyone write to
>> this newsgroup confused over escape behaviour,
>
> My friend objects strongly to the claim that he is "confused" by it, so I
> guess you are right that no one is confused. He just thinks that it
> violates the beautiful sense of aesthetics that he was assured, over and
> over again, Python has.

Fair enough.

--
Steven

Duncan Booth

Aug 10, 2009, 5:34:38 AM

Douglas Alan <darkw...@gmail.com> wrote:

> "\xba" isn't an escape sequence in any other language that I've used,
> which is one reason I made this error... Oh, wait a minute -- it *is*
> an escape sequence in JavaScript. But in JavaScript, while "\xba" is a
> special character, "\xb" is synonymous with "xb".
>

"\xba" is an escape sequence in c, c++, c#, python, javascript, perl and
probably many others.

"\xb" is an escape sequence in c, c++, c# but not in Python, Javascript, or
Perl. Python will throw ValueError if you try to use "\xb" in a string,
Javascript simply ignores the backslash.
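
The Python half of this comparison can be checked with a small sketch (an illustration; the helper name `literal_compiles` is hypothetical). The exception type depends on the interpreter version: the thread shows ValueError, while Python 3 reports a truncated `\x` escape as a SyntaxError, so the sketch catches both:

```python
def literal_compiles(literal_source):
    """Return True if the string literal, given as source text, compiles."""
    try:
        compile(literal_source, "<literal>", "eval")
    except (SyntaxError, ValueError):
        return False
    return True


print(literal_compiles(r'"\xba"'))  # True: complete two-digit hex escape
print(literal_compiles(r'"\xb"'))   # False: truncated hex escape
```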

> The fact that every language seems to treat these things similarly but
> differently, is yet another reason why they should just be treated
> utterly consistently by all of the languages: I.e., escape sequences
> that don't have a special meaning should be an error!

It would be nice if these things were treated consistently, but they aren't
and it seems unlikely to change.

--
Duncan Booth http://kupuguy.blogspot.com

Steven D'Aprano

Aug 10, 2009, 5:40:24 AM

On Mon, 10 Aug 2009 00:32:30 -0700, Douglas Alan wrote:

> In C++, if I know that the code I'm looking at compiles, then I never
> need worry that I've misinterpreted what a string literal means.

If you don't know what your string literals are, you don't know what your
program does. You can't expect the compiler to save you from semantic
errors. Adding escape codes into the string literal doesn't change this
basic truth.

Semantics matters, and unlike syntax, the compiler can't check it.
There's a difference between a program that does the equivalent of:

os.system("cp myfile myfile~")

and one which does this

os.system("rm myfile myfile~")

The compiler can't save you from typing 1234 instead of 11234, or 31.45
instead of 3.145, or "My darling Ho" instead of "My darling Jo", so why
do you expect it to save you from typing "abc\d" instead of "abc\\d"?

Perhaps it can catch *some* errors of that type, but only at the cost of
extra effort required to defeat the compiler (forcing the programmer to
type \\d to prevent the compiler complaining about \d). I don't think the
benefit is worth the cost. You and your friend do. Who is to say you're
right?

> At
> least not if it doesn't have any escape characters in it that I'm not
> familiar with. But in Python, if I see, "\f\o\o\b\a\z", I'm not really
> sure what I'm seeing, as I surely don't have committed to memory some of
> the more obscure escape sequences. If I saw this in C++, and I knew that
> it was in code that compiled, then I'd at least know that there are some
> strange escape codes that I have to look up.

And if you saw that in Python, you'd also know that there are some
strange escape codes that you have to look up. Fortunately, in Python,
that's really simple:

>>> "\f\o\o\b\a\z"
'\x0c\\o\\o\x08\x07\\z'

Immediately you can see that the \o and \z sequences resolve to
themselves, and the \f \b and \a don't.
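The same check can be scripted rather than read off the REPL echo. This is only a sketch: `decode_escape` is a hypothetical helper, and it silences warnings because newer Python versions warn about escapes they do not recognize (it assumes they still accept them).

```python
import ast
import warnings

def decode_escape(letter):
    """Return the value of a one-letter backslash escape inside a string literal."""
    source = '"\\' + letter + '"'
    with warnings.catch_warnings():
        # Newer Pythons warn about unrecognized escapes; silence that here.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)

for letter in "fobaz":
    value = decode_escape(letter)
    kind = "special" if len(value) == 1 else "left as backslash + letter"
    print("\\" + letter, repr(value), kind)
```

A one-character result means the escape was translated; a two-character result means the backslash stayed in the string.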

> Unlike with Python, it
> would never be the case in C++ code that the programmer who wrote the
> code was just too lazy to type in "\\f\\o\\o\\b\\a\\z" instead.

But if you see "abc\n", you can't be sure whether the lazy programmer
intended "abc"+newline, or "abc"+backslash+"n". Either way, the compiler
won't complain.

>> You just have to memorize it. If you don't know what a backslash escape
>> is going to do, why would you use it?
>
> (1) You're looking at code that someone else wrote, or (2) you forget to
> type "\\" instead of "\" in your code (or get lazy sometimes), as that
> is okay most of the time, and you inadvertently get a subtle bug.

The same error can occur in C++, if you intend \\n but type \n by
mistake. Or vice versa. The compiler won't save you from that.

>> This is especially important when reading (as opposed to writing) code.
>> You read somebody else's code, and see "foo\xbar\n". Let's say you know
>> it compiles without warning. Big deal -- you don't know what the escape
>> codes do unless you've memorized them. What does \n resolve to? chr(13)
>> or chr(97) or chr(0)? Who knows?
>
> It *is* a big deal. Or at least a non-trivial deal. It means that you
> can tell just by looking at the code that there are funny characters in
> the string, and not just backslashes.

I'm not entirely sure why you think that's a big deal. Strictly speaking,
there are no "funny characters", not even \0, in Python. They're all just
characters. Perhaps the closest is newline (which is pretty obvious).

> You don't have to go running for
> the manual every time you see code with backslashes, where the upshot
> might be that the programmer was merely saving themselves some typing.

Why do you care if there are "funny characters"?

In C++, if you see an escape you don't recognize, do you care? Do you go
running for the manual? If the answer is No, then why do it in Python?

And if the answer is Yes, then how is Python worse than C++?

[...]

> Also, it seems that Python is being inconsistent here. Python knows that
> the string "\x" doesn't contain a full escape sequence, so why doesn't
> it treat the string "\x" the same way that it treats the string "\z"?

[...]

> I.e., "\z" is not a legal escape sequence, so it gets left as "\\z".

No. \z *is* a legal escape sequence, it just happens to map to \z.

If you stop thinking of \z as an illegal escape sequence that Python
refuses to raise an error for, the problem goes away. It's a legal escape
sequence that maps to backslash + z.

> "\x" is not a legal escape sequence. Shouldn't it also get left as
> "\\x"?

No, because it actually is an illegal escape sequence.

>> > He's particularly annoyed too, that if he types "foo\xbar" at the
>> > REPL, it echoes back as "foo\\xbar". He finds that to be some sort of
>> > annoying DWIM feature, and if Python is going to have DWIM features,
>> > then it should, for example, figure out what he means by "\" and not
>> > bother him with a syntax error in that case.
>>
>> Now your friend is confused. This is a good thing. Any backslash you
>> see in Python's default string output is *always* an escape:
>
> Well, I think he's more annoyed that if Python is going to be so helpful
> as to put in the missing "\" for you in "foo\zbar", then it should put
> in the missing "\" for you in "\". He considers this to be an
> inconsistency.

(1) There is no missing \ in "foo\zbar".

(2) The problem with "\" isn't a missing backslash, but a missing end-
quote.

> Me, I'd never, ever, EVER want a language to special-case something at
> the end of a string, but I can see that from his new-to-Python
> perspective, Python seems to be DWIMing in one place and not the other,
> and he thinks that it should either do no DWIMing at all, or
> consistently DWIM. To not be consistent in this regard is "inelegant",
> says he.

Python isn't DWIMing here. The rules are simple and straightforward,
there's no mind-reading or guessing required. There is no heuristic
trying to predict what the user intends. It's a simple rule:

When parsing a string literal (apart from raw strings), if you see a
backslash, then grab the next token (usually a single character, but for
\x and \0 it could be multiple characters). If there is a mapping
available for that token, insert that in the string being built, and if
not, insert the backslash and the token.

(As I said earlier, this may not be precisely how it is implemented, but
functionally, it is what Python does.)
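The rule as stated can be written out as a toy function. This is a sketch under loud assumptions, not how CPython implements it: `apply_escape_rule` and `SIMPLE_ESCAPES` are invented names, the table covers only single-character escapes, and multi-character forms (`\x`, octal, `\N`, `\u`, backslash-newline) are deliberately left out.

```python
# Single-character escapes only; this table is an illustrative subset.
SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "a": "\a", "b": "\b",
    "f": "\f", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}

def apply_escape_rule(body):
    """Apply the rule to the text between the quotes of a literal."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            token = body[i + 1]
            # Known token: insert its mapping. Unknown: keep backslash + token.
            out.append(SIMPLE_ESCAPES.get(token, "\\" + token))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

print(repr(apply_escape_rule(r"foo\zbar")))    # unknown token
print(repr(apply_escape_rule(r"foo\\zbar")))   # escaped backslash
print(repr(apply_escape_rule(r"foo\tbar")))    # known token
```

The first two calls produce the same string, which is the behaviour under discussion: there is a table lookup with a fallback and no guessing step.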

> And I can see his point that allowing "foo\zbar" and "foo\\zbar" to be
> synonymous is a form of DWIMing.

Is it "a form of DWIMing" to consider 1.234e1 and 12.34 synonymous?

What about 68 and 0x44? Is that DWIMing?

I'm sure both you and your friend are excellent programmers, but you're
tossing around DWIM as a meaningless term of opprobrium without any
apparent understanding of what DWIM actually is.

--
Steven

Duncan Booth

Aug 10, 2009, 5:41:52 AM

Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:

> Or perhaps they should follow bash's lead, and map \C to C for every
> character. If there were no special escapes at all, Windows
> programmers wouldn't keep getting burnt when they write
> "C:\\Documents\today\foo" and end up with something completely
> unexpected.
>
> Oh wait, no, that still wouldn't work, because they'd end up with
> C:\Documentstodayfoo. So copying bash doesn't work.
>

There is of course no problem at all so long as you stick to writing your
paths as MS intended them to be written: 8.3 and UPPERCASE

>>> "C:\DOCUME~1\TODAY\FOO"
'C:\\DOCUME~1\\TODAY\\FOO'

:^)

MRAB

Aug 10, 2009, 7:41:08 AM

to pytho...@python.org

Steven D'Aprano wrote:
> On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
>

[snip]

>> My point of view is that
>> every language has *some* warts; Python just has a bit fewer than most.
>> It would have been nice, I should think, if this wart had been "fixed"
>> in Python 3, as I do consider it to be a minor wart.
>
> And if anyone had cared enough to raise it a couple of years back, it
> possibly might have been.
>

My preference would've been that a backslash followed by A-Z, a-z, or
0-9 is special, but a backslash followed by any other character is just
the character, except for backslash followed by a newline, which
suppresses the newline.

I would also have preferred a backslash in a raw string to always be a
literal.

Ah well, something for Python 4.x. :-)

Douglas Alan

Aug 10, 2009, 10:11:33 AM

On Aug 10, 4:37 am, Steven D'Aprano

> There is at least one good reason for preferring an error, namely that it
> allows Python to introduce new escape codes without going through a long,
> slow process. But the rest of these complaints are terribly unconvincing.

What about:

o Beautiful is better than ugly
o Explicit is better than implicit
o Simple is better than complex
o Readability counts
o Special cases aren't special enough to break the rules
o Errors should never pass silently

?

And most importantly:

o In the face of ambiguity, refuse the temptation to guess.
o There should be one -- and preferably only one -- obvious way to
do it.

?

So, what's the one obvious right way to express "foo\zbar"? Is it

"foo\zbar"

or

"foo\\zbar"

And if it's the latter, what possible benefit is there in allowing the
former? And if it's the former, why does Python echo the latter?

|>ouglas

Scott David Daniels

Aug 10, 2009, 10:58:47 AM

Douglas Alan wrote:
> So, what's the one obvious right way to express "foo\zbar"? Is it
> "foo\zbar"
> or
> "foo\\zbar"
> And if it's the latter, what possible benefit is there in allowing the
> former? And if it's the former, why does Python echo the latter?

Actually, if we were designing from fresh (with no C behind us), I might
advocate for "\s" to be the escape sequence for a backslash. I don't
particularly like that it is hard to see if the following string
contains a tab: "abc\\\\\\\\\table". The string rules reflect C's
rules, and I see little excuse for trying to change them now.
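Whether such a string contains a tab comes down to the parity of the run of backslashes: an even run pairs off into literal backslashes, and an odd run leaves one backslash over to combine with the "t". A sketch of checking that mechanically, using a made-up helper `literal_has_tab` that builds the literal's source text instead of typing the backslashes by hand:

```python
import ast

def literal_has_tab(backslashes):
    """Build "abc" + N backslashes + "table" as source text and evaluate it."""
    source = '"abc' + "\\" * backslashes + 'table"'
    return "\t" in ast.literal_eval(source)

for count in (8, 9):
    print(count, "backslashes -> contains a tab:", literal_has_tab(count))
```

Counting by eye is error-prone, which is the complaint being made; evaluating the literal settles it.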

--Scott David Daniels
Scott....@Acm.Org

Douglas Alan

Aug 10, 2009, 11:21:03 AM

On Aug 10, 10:58 am, Scott David Daniels <Scott.Dani...@Acm.Org>
wrote:

> The string rules reflect C's rules, and I see little
> excuse for trying to change them now.

No they don't. Or at least not C++'s rules. C++ behaves exactly as I
should like.

(Or at least g++ does. Or rather *almost* as I would like, as by
default it generates a warning for "foo\zbar", while I think that an
error would be somewhat preferable.)

But you're right, it's too late to change this now.

|>ouglas

Carl Banks

Aug 10, 2009, 1:29:27 PM

On Aug 10, 4:41 am, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Steven D'Aprano wrote:
> > On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
>
> [snip]
> >> My point of view is that
> >> every language has *some* warts; Python just has a bit fewer than most.
> >> It would have been nice, I should think, if this wart had been "fixed"
> >> in Python 3, as I do consider it to be a minor wart.
>
> > And if anyone had cared enough to raise it a couple of years back, it
> > possibly might have been.
>
> My preference would've been that a backslash followed by A-Z, a-z, or
> 0-9 is special, but a backslash followed by any other character is just
> the character, except for backslash followed by a newline, which
> suppresses the newline.

That would be reasonable; it'd match the behavior of regexps.

Carl Banks

Aug 10, 2009, 1:43:08 PM

On Aug 10, 1:37 am, Steven D'Aprano

That is a ridiculous rationalization. Nobody sees "\y" in a string
and thinks "it's an escape sequence that returns the bytes '\y'".

[snip rest, because an argument in favor of inconsistent, context-
dependent behavior doesn't need any further refutation than to point
out that it is an argument in favor of inconsistent, context-dependent
behavior]

Carl Banks

Douglas Alan

Aug 10, 2009, 1:52:05 PM

I wrote:

> But you're right, it's too late to change this now.

P.S. But if it weren't too late, I think that your idea to have "\s"
be the escape sequence for a backslash instead of "\\" might be a good
one.

|>ouglas

Douglas Alan

Aug 10, 2009, 6:17:24 PM

From: Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:

> On Mon, 10 Aug 2009 00:32:30 -0700, Douglas Alan wrote:

> > In C++, if I know that the code I'm looking at compiles,
> > then I never need worry that I've misinterpreted what a
> > string literal means.

> If you don't know what your string literals are, you don't
> know what your program does. You can't expect the compiler
> to save you from semantic errors. Adding escape codes into
> the string literal doesn't change this basic truth.

I grow weary of these semantic debates. The bottom line is
that C++'s strategy here catches bugs early on that Python's
approach doesn't. It does so at no additional cost.

From a purely practical point of view, why would any
language not want to adopt a zero-cost approach to catching
bugs, even if they are relatively rare, as early as
possible?

(Other than the reason that adopting it *now* is sadly too
late.)

Furthermore, Python's strategy here is SPECIFICALLY
DESIGNED, according to the reference manual, to catch bugs.
I.e., from the original posting on this issue:

Unlike Standard C, all unrecognized escape sequences
are left in the string unchanged, i.e., the backslash
is left in the string. (This behavior is useful when
debugging: if an escape sequence is mistyped, the
resulting output is more easily recognized as broken.)

If this "feature" is designed to catch bugs, why be
half-assed about it? Especially since there seems to be
little valid use case for allowing programmers to be lazy in
their typing here.

> The compiler can't save you from typing 1234 instead of
> 11234, or 31.45 instead of 3.145, or "My darling Ho"
> instead of "My darling Jo", so why do you expect it to
> save you from typing "abc\d" instead of "abc\\d"?

Because in the former cases it can't catch the bug, and
in the latter case, it can.

> Perhaps it can catch *some* errors of that type, but only
> at the cost of extra effort required to defeat the
> compiler (forcing the programmer to type \\d to prevent
> the compiler complaining about \d). I don't think the
> benefit is worth the cost. You and your friend do. Who is
> to say you're right?

Well, Bjarne Stroustrup, for one.

All of these are value judgments, of course, but I truly
doubt that anyone would have been bothered if Python from
day one had behaved the way that C++ does. Additionally, I
expect that if Python had always behaved the way that C++
does, and then today someone came along and proposed the
behavior that Python currently implements, so that the
programmer could sometimes get away with typing a bit less,
such a person would be chided for not understanding the Zen
of Python.

> > You don't have to go running for the manual every time
> > you see code with backslashes, where the upshot might be
> > that the programmer was merely saving themselves some
> > typing.

> Why do you care if there are "funny characters"?

Because, of course, "funny characters" often have
interesting consequences when output. Furthermore, their
consequences aren't always immediately obvious from looking
at the source code, unless you are intimately familiar with
the function of the special characters in question.

For instance, sometimes in the wrong combination, they wedge
your xterm. Etc.

I'm surprised that this needs to be spelled out.

> In C++, if you see an escape you don't recognize, do you
> care?

Yes, of course I do. If I need to know what the program
does.

> Do you go running for the manual? If the answer is No,
> then why do it in Python?

The answer is that I do in both cases.

> No. \z *is* a legal escape sequence, it just happens to map to \z.

> If you stop thinking of \z as an illegal escape sequence
> that Python refuses to raise an error for, the problem
> goes away. It's a legal escape sequence that maps to
> backslash + z.

(1) I already used that argument on my friend, and he wasn't
buying it. (Personally, I find the argument technically
valid, but commonsensically invalid. It's a language-lawyer
kind of argument, rather than one that appeals to any notion
of real aesthetics.)

(2) That argument disagrees with the Python reference
manual, which explicitly states that "unrecognized escape
sequences are left in the string unchanged", and that the
purpose for doing so is because it "is useful when
debugging".

> > "\x" is not a legal escape sequence. Shouldn't it also
> > get left as "\\x"?
>
> No, because it actually is an illegal escape sequence.

What makes it "illegal"? As far as I can tell, it's just
another "unrecognized escape sequence". JavaScript treats it
that way. Are you going to be the one to tell all the
JavaScript programmers that their language can't tell a
legal escape sequence from an illegal one?

> > Well, I think he's more annoyed that if Python is going
> > to be so helpful as to put in the missing "\" for you in
> > "foo\zbar", then it should put in the missing "\" for
> > you in "\". He considers this to be an inconsistency.
>
> (1) There is no missing \ in "foo\zbar".
>
> (2) The problem with "\" isn't a missing backslash, but a
> missing end-quote.

Says who? All of this really depends on your point of
view. The whole morass goes away completely if one adopts
C++'s approach here.

> Python isn't DWIMing here. The rules are simple and straightforward,
> there's no mind-reading or guessing required.

It may not be a complex form of DWIMing, but it's still
DWIMing a bit. Python is figuring that if I typed "\z", then
either I must have really meant to type "\\z", or that I
want to see the backslash when I'm debugging because I made
a mistake, or that I'm just too lazy to type "\\z".

> Is it "a form of DWIMing" to consider 1.234e1 and 12.34
> synonymous?

That's a very different issue, as (1) there are very
significant use cases for both kinds of numerical
representations, and (2) there's often only one obvious way
that the number should be entered, depending on the
coding situation.

> What about 68 and 0x44? Is that DWIMing?

See previous comment.

> I'm sure both you and your friend are excellent
> programmers, but you're tossing around DWIM as a
> meaningless term of opprobrium without any apparent
> understanding of what DWIM actually is.

I don't know if my friend even knows the term DWIM, other
than me paraphrasing him, but I certainly understand all
about the term. It comes from InterLisp. When DWIM was
enabled, your program would run until it hit an error, and
for certain kinds of errors, it would wait a few seconds for
the user to notice the error message, and if the user didn't
tell the program to stop, it would try to figure out what
the user most likely meant, and then continue running using
the computer-generated "fix".

I.e., more or less like continuing on in the face of what
the Python Reference manual refers to as an "unrecognized
escape sequence".

|>ouglas

Steven D'Aprano

Aug 10, 2009, 11:27:59 PM

On Mon, 10 Aug 2009 08:21:03 -0700, Douglas Alan wrote:

> But you're right, it's too late to change this now.

Not really. There is a procedure for making non-backwards compatible
changes. If you care deeply enough about this, you could agitate for
Python 3.2 to raise a PendingDepreciation warning for "unexpected" escape
sequences like \z, Python 3.3 to raise a Depreciation warning, and Python
3.4 to treat it as an error.

It may even be possible to skip the PendingDepreciation warning and go
straight for Depreciation warning in 3.2.
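For readers on later Python versions: the language did eventually move this way. To the best of available knowledge, unrecognized escapes began producing a DeprecationWarning in Python 3.6 and a SyntaxWarning in 3.12. A sketch for checking what a given interpreter does; `escape_diagnostic` is a hypothetical helper, and it also allows for a future version that rejects such literals outright.

```python
import warnings

def escape_diagnostic(source):
    """Compile source text and name the diagnostic raised for it, if any."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, "<demo>", "eval")
        except SyntaxError:
            return "SyntaxError"
    return caught[0].category.__name__ if caught else None

print(escape_diagnostic(r'"foo\zbar"'))    # unrecognized escape
print(escape_diagnostic(r'"foo\\zbar"'))   # explicit double backslash
```

The doubled-backslash spelling compiles silently on every version, so it is the spelling that does not depend on which release is in use.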

--
Steven

Steven D'Aprano

Aug 11, 2009, 3:07:13 AM

On Mon, 10 Aug 2009 15:17:24 -0700, Douglas Alan wrote:

> From: Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>
>> On Mon, 10 Aug 2009 00:32:30 -0700, Douglas Alan wrote:
>
>> > In C++, if I know that the code I'm looking at compiles, then I never
>> > need worry that I've misinterpreted what a string literal means.
>
>> If you don't know what your string literals are, you don't know what
>> your program does. You can't expect the compiler to save you from
>> semantic errors. Adding escape codes into the string literal doesn't
>> change this basic truth.
>
> I grow weary of these semantic debates. The bottom line is that C++'s
> strategy here catches bugs early on that Python's approach doesn't. It
> does so at no additional cost.
>
> From a purely practical point of view, why would any language not want
> to adopt a zero-cost approach to catching bugs, even if they are
> relatively rare, as early as possible?

Because the cost isn't zero. Needing to write \\ in a string literal when
you want \ is a cost, and having to read \\ in source code and mentally
translate that to \ is also a cost. By all means argue that it's a cost
that is worth paying, but please stop pretending that it's not a cost.

Having to remember that \n is a "special" escape and \y isn't is also a
cost, but that's a cost you pay in C++ too, if you want your code to
compile.

By the way, you've stated repeatedly that \y will compile with a warning
in g++. So what precisely do you get if you ignore the warning? What do
other C++ compilers do? Apart from the lack of warning, what actually is
the difference between Python's behaviour and C++'s behaviour?

> (Other than the reason that adopting it *now* is sadly too late.)
>
> Furthermore, Python's strategy here is SPECIFICALLY DESIGNED, according
> to the reference manual, to catch bugs. I.e., from the original posting
> on this issue:
>
> Unlike Standard C, all unrecognized escape sequences are left in
> the string unchanged, i.e., the backslash is left in the string.
> (This behavior is useful when debugging: if an escape sequence is
> mistyped, the resulting output is more easily recognized as
> broken.)

You need to work on your reading comprehension. It doesn't say anything
about the motivation for this behaviour, let alone that it was
"SPECIFICALLY DESIGNED" to catch bugs. It says it is useful for
debugging. My shoe is useful for squashing poisonous spiders, but it
wasn't designed as a poisonous-spider squashing device.

>> The compiler can't save you from typing 1234 instead of 11234, or 31.45
>> instead of 3.145, or "My darling Ho" instead of "My darling Jo", so why
>> do you expect it to save you from typing "abc\d" instead of "abc\\d"?
>
> Because in the former cases it can't catch the bug, and in the
> latter case, it can.

I'm not convinced this is a bug that needs catching, but if you think it
is, then that's a reasonable argument.

>> Perhaps it can catch *some* errors of that type, but only at the cost
>> of extra effort required to defeat the compiler (forcing the programmer
>> to type \\d to prevent the compiler complaining about \d). I don't
>> think the benefit is worth the cost. You and your friend do. Who is to
>> say you're right?
>
> Well, Bjarne Stroustrup, for one.

Then let him design his own language *wink*

> All of these are value judgments, of course, but I truly doubt that
> anyone would have been bothered if Python from day one had behaved the
> way that C++ does.

If I'm reading this page correctly, Python does behave as C++ does. Or at
least as Larch/C++ does:

http://www.cs.ucf.edu/~leavens/larchc++manual/lcpp_47.html

>> In C++, if you see an escape you don't recognize, do you care?
>
> Yes, of course I do. If I need to know what the program does.

Precisely the same as in Python.

>> Do you go running for the manual? If the answer is No, then why do it
>> in Python?
>
> The answer is that I do in both cases.

You deleted without answer my next question:

"And if the answer is Yes, then how is Python worse than C++?"

Seems to me that the answer is "It's not worse than C++, it's the same"
-- in both cases, you have to memorize the "special" escape sequences,
and in both cases, if you see an escape you don't recognize, you need to
look it up.

>> No. \z *is* a legal escape sequence, it just happens to map to \z.
>
>> If you stop thinking of \z as an illegal escape sequence that Python
>> refuses to raise an error for, the problem goes away. It's a legal
>> escape sequence that maps to backslash + z.
>
> (1) I already used that argument on my friend, and he wasn't buying it.
> (Personally, I find the argument technically valid, but commonsensically
> invalid. It's a language-lawyer kind of argument, rather than one that
> appeals to any notion of real aesthetics.)

I disagree with your sense of aesthetics. I think that having to write
\\y when I want \y just to satisfy a bondage-and-discipline compiler is
ugly. That's not to deny that B&D is useful on occasion, but in this
case I believe the benefit is negligible, and so even a tiny cost is not
worth the pain.

The sweet sweet pain... oh wait, sorry, wrong newsgroup...

> (2) That argument disagrees with the Python reference manual, which
> explicitly states that "unrecognized escape sequences are left in the
> string unchanged", and that the purpose for doing so is because it "is
> useful when debugging".

How does it disagree? \y in the source code mapping to \y in the string
object is the sequence being left unchanged. And the usefulness of doing
so is hardly a disagreement over the fact that it does so.

>> > "\x" is not a legal escape sequence. Shouldn't it also get left as
>> > "\\x"?
>>
>> No, because it actually is an illegal escape sequence.
>
> What makes it "illegal"? As far as I can tell, it's just another
> "unrecognized escape sequence".

No, it's recognized, because \x is the prefix for a hexadecimal escape
code. And it's illegal, because it's missing the actual hexadecimal
digits.

> JavaScript treats it that way. Are you
> going to be the one to tell all the JavaScript programmers that their
> language can't tell a legal escape sequence from an illegal one?

Well, it is Javascript...

All joking aside, syntax varies from one language to another. What counts
as a legal escape sequence in Javascript and what counts as a legal
escape sequence in Python are different. What makes you think I'm talking
about Javascript?

>> > Well, I think he's more annoyed that if Python is going to be so
>> > helpful as to put in the missing "\" for you in "foo\zbar", then it
>> > should put in the missing "\" for you in "\". He considers this to be
>> > an inconsistency.
>>
>> (1) There is no missing \ in "foo\zbar".
>>
>> (2) The problem with "\" isn't a missing backslash, but a missing end-
>> quote.
>
> Says who? All of this really depends on your point of view. The whole
> morass goes away completely if one adopts C++'s approach here.

But the morass only exists in the first place because you have adopted
C++'s approach instead of Python's approach -- and (possibly) not even a
standard part of the C++ approach, but a non-standard warning provided by
one compiler out of many.

Even if you disagree about (1), it's easy enough to prove that (2) is
correct:

>>> "\"
  File "<stdin>", line 1
    "\"
      ^
SyntaxError: EOL while scanning single-quoted string

This is the exact same error you get here:

>>> "a
  File "<stdin>", line 1
    "a
     ^
SyntaxError: EOL while scanning single-quoted string
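The comparison can be reproduced without a REPL. In this sketch `syntax_error_message` is an invented helper, and the exact wording of the message is an assumption that varies by Python version; what it shows is only that both inputs fail as strings that are never closed.

```python
def syntax_error_message(source):
    """Return the SyntaxError message for source text, or None if it compiles."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None

lone_backslash = '"' + "\\" + '"'   # the three characters "\"
missing_quote = '"a'                # an opening quote that is never closed
print(syntax_error_message(lone_backslash))
print(syntax_error_message(missing_quote))
```

On the versions where this has been observed, the two messages are typically identical, since the backslash escapes the closing quote and the literal never ends.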

>> Python isn't DWIMing here. The rules are simple and straightforward,
>> there's no mind-reading or guessing required.
>
> It may not be a complex form of DWIMing, but it's still DWIMing a bit.
> Python is figuring that if I typed "\z", then either I must have really
> meant to type "\\z",

Nope, not in the least. Python NEVER EVER EVER tries to guess what you
mean.

If you type "xyz", it assumes you want "xyz".

If you type "xyz\n", it assumes you want "xyz\n".

If you type "xyz\\n", it assumes you want "xyz\\n".

If you type "xyz\y", it assumes you want "xyz\y".

If you type "xyz\\y", it assumes you want "xyz\\y".

This is *exactly* like C++, except that in Python the semantics of \y and
\\y are identical. Python doesn't guess what you mean, it *imposes* a
meaning on the escape sequence. You just don't like that meaning.

> or that I want to see the backslash when I'm
> debugging because I made a mistake, or that I'm just too lazy to type
> "\\z".

Oh jeez, if you're going to define DWIM so broadly, then *everything* is
DWIM. "If I type '1+2', then the C++ compiler figures out that I must
have wanted to add 1 and 2..."

> I don't know if my friend even knows the term DWIM, other than me
> paraphrasing him, but I certainly understand all about the term. It
> comes from InterLisp. When DWIM was enabled, your program would run
> until it hit an error, and for certain kinds of errors, it would wait a
> few seconds for the user to notice the error message, and if the user
> didn't tell the program to stop, it would try to figure out what the
> user most likely meant, and then continue running using the
> computer-generated "fix".

Right. And Python isn't doing anything even remotely similar to that.

> I.e., more or less like continuing on in the face of what the Python
> Reference manual refers to as an "unrecognized escape sequence".

The wording could be better, I accept. It would be better to talk about
"special escapes" (e.g. \n) and "any non-special escape" (e.g. \y).

--
Steven

Piet van Oostrum

Aug 11, 2009, 9:50:01 AM

>>>>> Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> (SD) wrote:

>SD> If I'm reading this page correctly, Python does behave as C++ does. Or at
>SD> least as Larch/C++ does:

>SD> http://www.cs.ucf.edu/~leavens/larchc++manual/lcpp_47.html

They call them `non-standard escape sequences' for a reason: that they
are not in standard C++.

test.cpp:
char* temp = "abc\yz";

TEMP> g++ -c test.cpp
test.cpp:1:1: warning: unknown escape sequence '\y'

--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org

Ethan Furman

Aug 11, 2009, 10:35:54 AM

to pytho...@python.org

And once it's fully depreciated you have to stop writing it off on your
taxes. *wink*

~Ethan~

Steven D'Aprano

Aug 11, 2009, 2:00:08 PM

On Tue, 11 Aug 2009 15:50:01 +0200, Piet van Oostrum wrote:

>>>>>> Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> (SD) wrote:
>
>>SD> If I'm reading this page correctly, Python does behave as C++ does.
>>SD> Or at least as Larch/C++ does:
>
>>SD> http://www.cs.ucf.edu/~leavens/larchc++manual/lcpp_47.html
>
> They call them `non-standard escape sequences' for a reason: that they
> are not in standard C++.
>
> test.cpp:
> char* temp = "abc\yz";
>
> TEMP> g++ -c test.cpp
> test.cpp:1:1: warning: unknown escape sequence '\y'

Isn't that a warning, not a fatal error? So what does temp contain?

--
Steven

Douglas Alan

Aug 11, 2009, 4:20:52 PM

On Aug 11, 2:00 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> > test.cpp:1:1: warning: unknown escape sequence '\y'
>
> Isn't that a warning, not a fatal error? So what does temp contain?

My "Annotated C++ Reference Manual" is packed, and surprisingly in
Stroustrup's Third Edition, there is no mention of the issue in the
entire 1,000 pages. But Microsoft to the rescue:

If you want a backslash character to appear within a string,
you must type two backslashes (\\)

(From http://msdn.microsoft.com/en-us/library/69ze775t.aspx)

The question of what any specific C++ does if you ignore the warning
is irrelevant, as such behavior in C++ is almost *always* undefined.
Hence the warning.

|>ouglas

Ethan Furman

Aug 11, 2009, 4:38:37 PM

to pytho...@python.org

Almost always undefined? Whereas with Python, and some memorization or
a small table/list nearby, you can easily *know* what you will get.

Mind you, I'm not really vested in how Python *should* handle
backslashes one way or the other, but I am glad it has rules that it
follows for consistent results, and I don't have to break out a byte-code
editor to find out what's in my string literal.

~Ethan~

Douglas Alan

Aug 11, 2009, 5:29:43 PM

Steven D'Aprano wrote:

> Because the cost isn't zero. Needing to write \\ in a string
> literal when you want \ is a cost,

I need to preface this entire post with the fact that I've
already used ALL of the arguments that you've provided on my
friend before I ever even came here with the topic, and my
own arguments on why Python can be considered to be doing
the right thing on this issue didn't even convince ME, much
less him. When I can't even convince myself with an argument
I'm making, then you know there's a problem with it!

Now back to our regularly scheduled debate:

I think that the total cost of all of that extra typing for
all the Python programmers in the entire world is now
significantly less than the time it took to have this
debate. Which would have never happened if Python did things
the right way on this issue to begin with. Meaning that
we're now at LESS than zero cost for doing things right!

And we haven't even yet included all the useless heat that
is going to be generated during code reviews and in-house coding
standard debates.

That's why I stand by Python's motto:

THERE SHOULD BE ONE-- AND PREFERABLY ONLY ONE --OBVIOUS
WAY TO DO IT.

> and having to read \\ in source code and mentally
> translate that to \ is also a cost.

For me that has no mental cost. What does have a mental cost
is remembering whether "\b" is an "unrecognized escape
sequence" or not.

> By all means argue that it's a cost that is worth paying,
> but please stop pretending that it's not a cost.

I'm not "pretending". I'm pwning you with logic and common
sense!

> Having to remember that \n is a "special" escape and \y
> isn't is also a cost, but that's a cost you pay in C++ too,
> if you want your code to compile.

Ummm, no I don't! I just always use "\\" when I want a
backslash to appear, and I only think about the more obscure
escape sequences if I actually need them, or some code that
I am reading has used them.

> By the way, you've stated repeatedly that \y will compile
> with a warning in g++. So what precisely do you get if you
> ignore the warning?

A program with undefined behavior. That's typically what a
warning means from a C++ compiler. (Sometimes it means
use of a deprecated feature, though.)

> What do other C++ compilers do?

The Microsoft compilers also consider it to be incorrect
code, as I documented in a different post.

> Apart from the lack of warning, what actually is the
> difference between Python's behavior and C++'s behavior?

That question makes just about as much sense as, "Apart
from the lack of a fatal error, what actually is the
difference between Python's behavior and C++'s?"

Sure, warnings aren't fatal errors, but if you ignore them,
then you are almost always doing something very
wrong. (Unless you're building legacy code.)

> > Furthermore, Python's strategy here is SPECIFICALLY
> > DESIGNED, according to the reference manual to catch
> > bugs. I.e., from the original posting on this issue:
>
> > Unlike Standard C, all unrecognized escape sequences
> > are left in the string unchanged, i.e., the backslash
> > is left in the string. (This behavior is useful when
> > debugging: if an escape sequence is mistyped, the
> > resulting output is more easily recognized as
> > broken.)
>
> You need to work on your reading comprehension. It doesn't
> say anything about the motivation for this behaviour, let
> alone that it was "SPECIFICALLY DESIGNED" to catch bugs. It
> says it is useful for debugging. My shoe is useful for
> squashing poisonous spiders, but it wasn't designed as a
> poisonous-spider squashing device.

As I have a BS from MIT in BS-ology, I can readily set aside
your aspersions to my intellect, and point out the gross
errors of your ways: Natural language does not work the way
you claim. It is much more practical, implicit, and
elliptical.

More specifically, if your shoe came with a reference manual
claiming that it was useful for squashing poisonous spiders,
then you may now validly assume poisonous spider squashing
was a design requirement of the shoe. (Or at least it has
become one, even if ipso facto.) Furthermore, if it turns out
that the shoe is deficient at poisonous spider squashing,
and consequently causes you to get bitten by a poisonous
spider, then you now have grounds for a lawsuit.

> > Because in the former cases it can't catch the bug,
> > and in the latter case, it can.
>
> I'm not convinced this is a bug that needs catching, but if
> you think it is, then that's a reasonable argument.

All my arguments are reasonable.

> >> Perhaps it can catch *some* errors of that type, but
> >> only at the cost of extra effort required to defeat the
> >> compiler (forcing the programmer to type \\d to prevent
> >> the compiler complaining about \d). I don't think the
> >> benefit is worth the cost. You and your friend do. Who
> >> is to say you're right?
>
> > Well, Bjarne Stroustrup, for one.
>
> Then let him design his own language *wink*

Oh, I'm not sure that's such a good idea. He might come up
with a language as crazy as C++.

> >> In C++, if you see an escape you don't recognize, do you
> >> care?
>
> > Yes, of course I do. If I need to know what the program
> > does.
>
> Precisely the same as in Python.

Not so at all!

In C++ I have to run for the manual only when someone
actually puts a *real* escape sequence in their code. With
Python, I have to run for the manual (or at least the REPL),
every time some lame-brained person who thinks they should be
allowed near a keyboard programs using "unrecognized escape
sequences" because they can't be bothered to hit the "\" key
twice.

> Seems to me that the answer is "It's not worse than C++,
> it's the same" -- in both cases, you have to memorize the
> "special" escape sequences, and in both cases, if you see
> an escape you don't recognize, you need to look it up.

The answer is that in this particular case, C++ causes me
far fewer woes! And if C++ is causing me fewer woes than
Language X, then you've got to know that Language X has a
problem.

> I disagree with your sense of aesthetics. I think that
> having to write \\y when I want \y just to satisfy a
> bondage-and-discipline compiler is ugly. That's not to deny
> that B&D isn't useful on occasion, but in this case I
> believe the benefit is negligible, and so even a tiny cost
> is not worth the pain.

EXPLICIT IS BETTER THAN IMPLICIT.

> > (2) That argument disagrees with the Python reference
> > manual, which explicitly states that "unrecognized escape
> > sequences are left in the string unchanged", and that the
> > purpose for doing so is because it "is useful when
> > debugging".
>
> How does it disagree? \y in the source code mapping to \y in
> the string object is the sequence being left unchanged. And
> the usefulness of doing so is hardly a disagreement over the
> fact that it does so.

Because you've stated that "\y" is a legal escape sequence,
while the Python Reference Manual explicitly states that it
is an "unrecognized escape sequence", and that such
"unrecognized escape sequences" are sources of bugs.

> > What makes it "illegal"? As far as I can tell, it's just
> > another "unrecognized escape sequence".
>
> No, it's recognized, because \x is the prefix for a
> hexadecimal escape code. And it's illegal, because it's
> missing the actual hexadecimal digits.

So? Why does that make it "illegal" rather than merely
"unrecognized?"

SIMPLE IS BETTER THAN COMPLEX.

> All joking aside, syntax varies from one language to
> another. What counts as a legal escape sequence in
> Javascript and what counts as a legal escape sequence in
> Python are different. What makes you think I'm talking
> about Javascript?

Because anyone with common sense will agree that "\y" is an
illegal escape sequence. The only disagreement should then
be how illegal escape sequences should be handled. Python is
not currently handling them in a way that makes the most
sense.

ERRORS SHOULD NEVER PASS SILENTLY.

> But the morass only exists in the first place because you
> have adopted C++'s approach instead of Python's approach --
> and (possibly) not even a standard part of the C++ approach,
> but a non-standard warning provided by one compiler out of
> many.

Them's fighting words! I rarely adopt the C++ approach to
anything! In this case, (1) C++ just coincidentally happens
to be right, and (2) as far as I can tell, g++ implements
the C++ standard correctly here.

> > It may not be a complex form of DWIMing, but it's still
> > DWIMing a bit. Python is figuring that if I typed "\z",
> > then either I must have really meant to type "\\z",
>
> Nope, not in the least. Python NEVER EVER EVER tries to
> guess what you mean.

Neither does Perl. That doesn't mean that Perl isn't often
DWIMy.

> This is *exactly* like C++, except that in Python the
> semantics of \y and \\y are identical. Python doesn't
> guess what you mean, it *imposes* a meaning on the escape
> sequence. You just don't like that meaning.

That's because I don't like things that are ill-conceived.

> > I.e., more or less like continuing on in the face of
> > what the Python Reference manual refers to as an
> > "unrecognized escape sequence".

> The wording could be better, I accept. It would be better
> to talk about "special escapes" (e.g. \n) and "any
> non-special escape" (e.g. \y).

Or maybe the wording is just fine, and it's the treatment of
unrecognized escape sequences that could be better.

|>ouglas

Douglas Alan

Aug 11, 2009, 5:39:30 PM

to

On Aug 10, 11:27 pm, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Mon, 10 Aug 2009 08:21:03 -0700, Douglas Alan wrote:
> > But you're right, it's too late to change this now.
>
> Not really. There is a procedure for making non-backwards compatible
> changes. If you care deeply enough about this, you could agitate for
> Python 3.2 to raise a PendingDeprecationWarning for "unexpected" escape
> sequences like \z,

How does one do this?

Not that I necessarily think that it is important enough a nit to
break a lot of existing code.

Also, if I "agitate for change", then in the future people might
actually accurately accuse me of agitating for change, when typically
I just come here for a good argument, and I provide a connected series
of statements intended to establish a proposition, but in return I
receive merely the automatic gainsaying of any statement I make.

|>ouglas

Douglas Alan

Aug 11, 2009, 5:48:24 PM

to

On Aug 11, 4:38 pm, Ethan Furman <et...@stoneleaf.us> wrote:

> Mind you, I'm not really vested in how Python *should* handle
> backslashes one way or the other, but I am glad it has rules that it
> follows for consistent results, and I don't have to break out a byte-code
> editor to find out what's in my string literal.

I don't understand your comment. C++ generates a warning if you use an
undefined escape sequence, which indicates that your program should be
fixed. If the escape sequence isn't undefined, then C++ does the same
thing as Python.

It would be *even* better if C++ generated a fatal error in this
situation. (g++ probably has an option to make warnings fatal, but I
don't happen to know what that option is.) g++ might not generate an
error so that you can compile legacy C code with it.

In any case, my argument has consistently been that Python should have
treated undefined escape sequences consistently as fatal errors, not
as warnings.
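
A rough sketch of the strict policy argued for here, written in Python. The names RECOGNIZED, find_unrecognized, and strict_check are made up for illustration, and this is not how any Python implementation actually parses literals; it only scans the text between the quotes of an ordinary (non-raw) str literal and complains about any backslash that does not start an escape listed in the reference manual.

```python
import re

# Characters that may follow a backslash in a Python 3 str literal:
# newline, backslash, quotes, the single-letter escapes, octal digits,
# and the prefixes \x, \N, \u, \U.
RECOGNIZED = set("\n\\'\"abfnrtvxNuU01234567")

def find_unrecognized(literal_body):
    """Return the unrecognized escapes in the text between the quotes."""
    pairs = re.finditer(r"\\(.)", literal_body, flags=re.DOTALL)
    return [m.group(0) for m in pairs if m.group(1) not in RECOGNIZED]

def strict_check(literal_body):
    """Hypothetical strict mode: refuse the literal instead of guessing."""
    bad = find_unrecognized(literal_body)
    if bad:
        raise SyntaxError("unrecognized escape sequence(s): " + ", ".join(bad))
    return literal_body

print(find_unrecognized(r"ab\cd"))    # the \c is flagged
print(find_unrecognized(r"ab\\cd"))   # an explicit double backslash is fine
```

Because the pattern consumes a backslash together with the character after it, a doubled backslash is treated as one recognized escape and the letter that follows is left alone.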

|>ouglas

Steven D'Aprano

Aug 12, 2009, 3:08:06 AM

to

On Tue, 11 Aug 2009 14:48:24 -0700, Douglas Alan wrote:

> In any case, my argument has consistently been that Python should have
> treated undefined escape sequences consistently as fatal errors,

A reasonable position to take. I disagree with it, but it is certainly
reasonable.

> not as warnings.

I don't know what language you're talking about here, because non-special
escape sequences in Python aren't either errors or warnings:

>>> print "ab\cd"
ab\cd

No warning is made, because it's not considered an error that requires a
warning. This matches the behaviour of other languages, including C and
bash.
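
For anyone following along on Python 3, the same check can be sketched as below. The helper name literal_value is made up for illustration. It evaluates the literal from a raw string so that the example file itself contains no unrecognized escapes, and it silences the warning that newer Python releases issue for such literals (an assumption about the interpreter version you run it on; the resulting string value is the same either way in the versions I know of).

```python
import warnings

def literal_value(source):
    """Evaluate a string literal given as source text (hypothetical helper)."""
    with warnings.catch_warnings():
        # Newer Python versions warn about unrecognized escapes; ignore that here.
        warnings.simplefilter("ignore")
        return eval(source)

s = literal_value(r'"ab\cd"')
print(s)        # the backslash is still in the string
print(len(s))   # five characters: a, b, backslash, c, d
```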

--
Steven

Steven D'Aprano

Aug 12, 2009, 3:36:56 AM

to

On Tue, 11 Aug 2009 13:20:52 -0700, Douglas Alan wrote:

> On Aug 11, 2:00 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>
>> > test.cpp:1:1: warning: unknown escape sequence '\y'
>>
>> Isn't that a warning, not a fatal error? So what does temp contain?
>
> My "Annotated C++ Reference Manual" is packed, and surprisingly in
> Stroustrup's Third Edition, there is no mention of the issue in the
> entire 1,000 pages. But Microsoft to the rescue:
>
> If you want a backslash character to appear within a string, you
> must type two backslashes (\\)
>
> (From http://msdn.microsoft.com/en-us/library/69ze775t.aspx)

Should I assume that Microsoft's C++ compiler treats it as an error, not
a warning? Or is this *still* undefined behaviour, and MS C++ compiler
will happily compile "ab\cd" to whatever it feels like?

> The question of what any specific C++ does if you ignore the warning is
> irrelevant, as such behavior in C++ is almost *always* undefined. Hence
> the warning.

So a C++ compiler which follows Python's behaviour would be behaving
within the language specifications.

I note that the bash shell, which claims to follow C semantics, also does
what Python does:

$ echo $'a s\trin\g with escapes'
a s rin\g with escapes

Explain to me again why we're treating underspecified C++ semantics,
which may or may not do *exactly* what Python does, as if it were the One
True Way of treating escape sequences?

--
Steven

Steven D'Aprano

Aug 12, 2009, 5:32:01 AM

to

On Tue, 11 Aug 2009 14:29:43 -0700, Douglas Alan wrote:

> I need to preface this entire post with the fact that I've already used
> ALL of the arguments that you've provided on my friend before I ever
> even came here with the topic, and my own arguments on why Python can be
> considered to be doing the right thing on this issue didn't even
> convince ME, much less him. When I can't even convince myself with an
> argument I'm making, then you know there's a problem with it!

I hear all your arguments, and to play Devil's Advocate I repeat them,
and they don't convince me either. So by your logic, there's obviously a
problem with your arguments as well!

That problem basically boils down to a deep-seated philosophical
disagreement over which philosophy a language should follow in regard to
backslash escapes:

"Anything not explicitly permitted is forbidden"

versus

"Anything not explicitly forbidden is permitted"

Python explicitly permits all escape sequences, with well-defined
behaviour, with the only ones forbidden being those explicitly forbidden:

* hex escapes with invalid hex digits;

* oct escapes with invalid oct digits;

* Unicode named escapes with unknown names;

* 16- and 32-bit Unicode escapes with invalid hex digits.
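
A small sketch of how the forbidden cases can be observed on Python 3. The helper name is_rejected and the sample literals are my own choices for illustration. The octal case is left out because I don't have a literal I am sure is rejected.

```python
def is_rejected(source):
    """Return True if compiling the literal raises SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False

forbidden = [
    r'"\xZZ"',              # hex escape with invalid hex digits
    r'"\N{NO SUCH NAME}"',  # Unicode named escape with an unknown name
    r'"\u12G4"',            # 16-bit Unicode escape with an invalid hex digit
    r'"\U0000ZZZZ"',        # 32-bit Unicode escape with invalid hex digits
]
for src in forbidden:
    print(src, is_rejected(src))

print(r'"\n"', is_rejected(r'"\n"'))   # an ordinary escape compiles fine
```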

C++ apparently forbids all escape sequences, with unspecified behaviour
if you use a forbidden sequence, except for a handful of explicitly
permitted sequences.

That's not better, it's merely different.

Actually, that's not true -- that the C++ standard forbids a thing, but
leaves the consequences of doing that thing unspecified, is clearly a Bad
Thing.

[...]

>> Apart from the lack of warning, what actually is the difference between
>> Python's behavior and C++'s behavior?
>
> That question makes just about as much sense as, "Apart from the lack of
> a fatal error, what actually is the difference between Python's behavior
> and C++'s?"

This is what I get:

[steve ~]$ cat test.cc
#include <iostream>
int main(int argc, char* argv[])
{
std::cout << "x\yz" << std::endl;
return 0;
}
[steve ~]$ g++ test.cc -o test
test.cc:4:14: warning: unknown escape sequence '\y'
[steve@soy ~]$ ./test
xyz

So on at least one machine in the world, C++ simply strips out
backslashes that it doesn't recognise, leaving the suffix. Unfortunately,
we can't rely on that, because C++ is underspecified. Fortunately this is
not a problem with Python, which does completely specify the behaviour of
escape sequences so there are no surprises.

[...]

>> I disagree with your sense of aesthetics. I think that having to write
>> \\y when I want \y just to satisfy a bondage-and-discipline compiler is
>> ugly. That's not to deny that B&D isn't useful on occasion, but in this
>> case I believe the benefit is negligible, and so even a tiny cost is
>> not worth the pain.
>
> EXPLICIT IS BETTER THAN IMPLICIT.

Quoting the Zen without understanding (especially shouting) doesn't
impress anyone. There's nothing implicit about escape sequences. \y is
perfectly explicit. Look Ma, there's a backslash, and a y, it gives a
backslash and a y!

Implicit has an actual meaning. You shouldn't use it as a mere term of
opprobrium for anything you don't like.

>> > (2) That argument disagrees with the Python reference manual, which
>> > explicitly states that "unrecognized escape sequences are left in the
>> > string unchanged", and that the purpose for doing so is because it
>> > "is useful when debugging".
>>
>> How does it disagree? \y in the source code mapping to \y in the string
>> object is the sequence being left unchanged. And the usefulness of
>> doing so is hardly a disagreement over the fact that it does so.
>
> Because you've stated that "\y" is a legal escape sequence, while the
> Python Reference Manual explicitly states that it is an "unrecognized
> escape sequence", and that such "unrecognized escape sequences" are
> sources of bugs.

There's that reading comprehension problem again.

Unrecognised != illegal.

"Useful for debugging" != "source of bugs". If they were equal, we could
fix an awful lot of bugs by throwing away our debugging tools.

Here's the URL to the relevant page:
http://www.python.org/doc/2.5.2/ref/strings.html

It seems to me that the behaviour the Python designers were looking to
avoid was the case where the coder accidentally inserted a backslash in
the wrong place, and the language stripped the backslash out, e.g.:

Wanted "a\bcd" but accidentally typed "ab\cd" instead, and got "abcd".

(This is what Bash does by design, and at least some C/C++ compilers do,
perhaps by accident, perhaps by design.)

In that case, with no obvious backslash, the user may not even be aware
that there was a problem:

s = "ab\cd" # assume the backslash is silently discarded
assert len(s) == 4
assert s[2] == 'c'
assert '\\' not in s

All of these tests would wrongly pass, but with Python's behaviour of
leaving the backslash in, they would all fail, and the string is visually
distinctive (it has an obvious backslash in it).
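
The same three checks can be run against what Python 3 actually does. This is only a sketch: the literal is evaluated from a raw string and the warning that newer releases emit for \c is silenced, and index 2 is used because that is where 'c' would sit if the backslash had been dropped.

```python
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer Python releases warn about \c
    s = eval(r'"ab\cd"')             # the literal under discussion

# What a test suite would conclude if the backslash had been dropped.
looks_like_abcd = [len(s) == 4, s[2] == 'c', '\\' not in s]
print(looks_like_abcd)   # [False, False, False]
```

Every check comes out False, so a test written with "abcd" in mind notices the stray backslash.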

Now, if you consider that \c should be an error, then obviously it would
be even better if "ab\cd" would raise a SyntaxError. But why consider \c
to be an error?

[invalid hex escape sequences]

>> > What makes it "illegal"? As far as I can tell, it's just another
>> > "unrecognized escape sequence".
>>
>> No, it's recognized, because \x is the prefix for a hexadecimal escape
>> code. And it's illegal, because it's missing the actual hexadecimal
>> digits.
>
> So? Why does that make it "illegal" rather than merely "unrecognized?"

Because the empty string is not a legal pair of hex digits.

In '\y', the suffix y is a legal character, but it isn't recognized as a
"special" character.

In '\x', the suffix '' is not a pair of hex digits. Since hex-escapes are
documented as requiring a pair of hex digits, this is an error.

[...]

> Because anyone with common sense will agree that "\y" is an illegal
> escape sequence.

"No True Scotsman would design a language that behaves like that!!!!"

Why should it be illegal? It seems like a perfectly valid escape sequence
to me, so long as the semantics are specified explicitly.

[...]

>> > It may not be a complex form of DWIMing, but it's still DWIMing a
>> > bit. Python is figuring that if I typed "\z", then either I must
>> > have really meant to type "\\z",
>>
>> Nope, not in the least. Python NEVER EVER EVER tries to guess what you
>> mean.
>
> Neither does Perl. That doesn't mean that Perl isn't often DWIMy.

Fine, but we're not discussing Perl, we're discussing Python. Perl's DWIM-
iness is irrelevant.

>> This is *exactly* like C++, except that in Python the semantics of \y
>> and \\y are identical. Python doesn't guess what you mean, it *imposes*
>> a meaning on the escape sequence. You just don't like that meaning.
>
> That's because I don't like things that are ill-conceived.

And yet you like C++... go figure *wink*

--
Steven

Douglas Alan

Aug 12, 2009, 1:23:50 PM

to

On Aug 12, 3:08 am, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:

> On Tue, 11 Aug 2009 14:48:24 -0700, Douglas Alan wrote:
> > In any case, my argument has consistently been that Python should have
> > treated undefined escape sequences consistently as fatal errors,
>
> A reasonable position to take. I disagree with it, but it is certainly
> reasonable.
>
> > not as warnings.
>
> I don't know what language you're talking about here, because non-special
> escape sequences in Python aren't either errors or warnings:
>
> >>> print "ab\cd"
> ab\cd

I was talking about C++, whose compilers tend to generate warnings for
this usage. I think that the C++ compilers I've used take the right
approach, only ideally they should be *even* more emphatic, and
elevate the problem from a warning to an error.

I assume, however, that the warning is a middle ground between doing
the completely right thing, and, I assume, maintaining backward
compatibility with common C implementations. As Python never had to
worry about backward compatibility with C, Python didn't have to walk
such a middle ground.

On the other hand, *now* it has to worry about backward compatibility
with itself.

|>ouglas

Douglas Alan

Aug 12, 2009, 1:47:55 PM

to

On Aug 12, 3:36 am, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:

> On Tue, 11 Aug 2009 13:20:52 -0700, Douglas Alan wrote:

> > My "Annotated C++ Reference Manual" is packed, and surprisingly in
> > Stroustrup's Third Edition, there is no mention of the issue in the
> > entire 1,000 pages. But Microsoft to the rescue:
>
> > If you want a backslash character to appear within a string, you
> > must type two backslashes (\\)
>
> > (From http://msdn.microsoft.com/en-us/library/69ze775t.aspx)
>
> Should I assume that Microsoft's C++ compiler treats it as an error, not
> a warning?

In my experience, C++ compilers generally generate warnings for such
situations, where they can. (Clearly, they often can't generate
warnings for running off the end of an array, which is also undefined,
though a really smart C++ compiler might be able to generate a warning
in certain such circumstances.)

> Or is this *still* undefined behaviour, and MS C++ compiler
> will happily compile "ab\cd" to whatever it feels like?

If it's a decent compiler, it will generate a warning. Who can say
with Microsoft, however. It's clearly documented as illegal code,
however.

> > The question of what any specific C++ does if you ignore the warning is
> > irrelevant, as such behavior in C++ is almost *always* undefined. Hence
> > the warning.
>
> So a C++ compiler which follows Python's behaviour would be behaving
> within the language specifications.

It might be, but there are also *recommendations* in the C++ standard
about what to do in such situations, and the recommendations say, I am
pretty sure, not to do that, unless the particular compiler in
question has to meet some very specific backward compatibility needs.

> I note that the bash shell, which claims to follow C semantics, also does
> what Python does:
>
> $ echo $'a s\trin\g with escapes'
> a s rin\g with escapes

Really? Not on my computers. (One is a Mac, and the other is a Fedora
Core Linux box.) On my computers, bash doesn't seem to have *any*
escape sequences, other than \\, \", \$, and \`. It seems to treat
unknown escape sequences the same as Python does, but as there are
only four known escape sequences, and they are all meant merely to
guard against string interpolation, and the like, it's pretty darn
easy to keep straight.

> Explain to me again why we're treating underspecified C++ semantics,
> which may or may not do *exactly* what Python does, as if it were the One
> True Way of treating escape sequences?

I'm not saying that C++ does it right for Python. The right thing for
Python to do is to generate an error, as Python doesn't have to deal
with all the crazy complexities that C++ has to.

|>ouglas

Douglas Alan

Aug 12, 2009, 5:21:34 PM

to Dark Water

On Aug 12, 5:32 am, Steven D'Aprano
<ste...@REMOVE.THIS.cybersource.com.au> wrote:

> That problem basically boils down to a deep-seated
> philosophical disagreement over which philosophy a
> language should follow in regard to backslash escapes:
>
> "Anything not explicitly permitted is forbidden"
>
> versus
>
> "Anything not explicitly forbidden is permitted"

No, it doesn't. It boils down to whether a language should:

(1) Try its best to detect errors as early as possible,
especially when the cost of doing so is low.

(2) Make code as readable as possible, in part by making
code as self-evident as possible by mere inspection and by
reducing the amount of stuff that you have to memorize. Perl
fails miserably in this regard, for instance.

(3) To quote Einstein, make everything as simple as
possible, and no simpler.

(4) Take innately ambiguous things and not force them to be
unambiguous by mere fiat.

Allowing a programmer to program using a completely
arbitrary resolution of "unrecognized escape sequences"
violates all of the above principles.

The fact that the meanings of unrecognized escape sequences
are ambiguous is proved by the fact that every language
seems to treat them somewhat differently, demonstrating that
there is no natural intuitive meaning for them.

Furthermore, allowing programmers to use "unrecognized escape
sequences" without raising an error violates:

(1) Explicit is better than implicit:

Python provides a way to explicitly specify that you want a
backslash. Every programmer should be encouraged to use
Python's explicit mechanism here.

(2) Simple is better than complex:

Python currently has two classes of ambiguously
interpretable escape sequences: "unrecognized ones", and
"illegal" ones. Making a single class (i.e. just illegal
ones) is simpler.

Also, not having to memorize escape sequences that you
rarely have need to use is simpler.

(3) Readability counts:

See above comments on readability.

(4) Errors should never pass silently:

Even the Python Reference Manual indicates that unrecognized
escape sequences are a source of bugs. (See more comments on
this below.)

(5) In the face of ambiguity, refuse the temptation to
guess.

Every language, other than C++, is taking a guess at what
the programmer would find to be the most useful expansion for
unrecognized escape sequences, and each of the languages is
guessing differently. This temptation should be refused!

You can argue that once it is in the Reference Manual it is
no longer a guess, but that is patently specious, as Perl
proves. For instance, the fact that Perl will quietly convert
an array into a scalar for you, if you assign the array to a
scalar variable is certainly a "guess" of the sort that this
Python koan is referring to. Likewise for an arbitrary
interpretation of unrecognized escape sequences.

(6) There should be one-- and preferably only one --obvious
way to do it.

What is the one obvious way to express "\\y"? Is it "\\y" or
"\y"?

Python can easily make one of these ways the "one obvious
way" by making the other one raise an error.

(7) Namespaces are one honking great idea -- let's do more
of those!

Allowing "\y" to self-expand is intruding into the namespace
for special characters that require an escape sequence.

> C++ apparently forbids all escape sequences, with
> unspecified behaviour if you use a forbidden sequence,
> except for a handful of explicitly permitted sequences.
>
> That's not better, it's merely different.

It *is* better, as it catches errors early on at little
cost, and for all the other reasons listed above.

> Actually, that's not true -- that the C++ standard forbids
> a thing, but leaves the consequences of doing that thing
> unspecified, is clearly a Bad Thing.

Indeed. But C++ has backward compatibility issues that make
any that Python has to deal with pale in comparison. The
recommended behavior for a C++ compiler, however, is to flag
the problem as an error or as a warning.

> So on at least one machine in the world, C++ simply strips
> out backslashes that it doesn't recognize, leaving the
> suffix. Unfortunately, we can't rely on that, because C++
> is underspecified.

No, *fortunately* you can't rely on it, forcing you to go
fix your code.

> Fortunately this is not a problem with
> Python, which does completely specify the behaviour of
> escape sequences so there are no surprises.

It's not a surprise when the C++ compiler issues a warning to
you. If you ignore the warning, then you have no one to
blame but yourself.

> Implicit has an actual meaning. You shouldn't use it as a
> mere term of opprobrium for anything you don't like.

Pardon me, but I'm using "implicit" to mean "implicit", and
nothing more.

Python's behavior here is "implicit" in the very same way
that Perl implicitly converts an array into a scalar for
you. (Though that particular Perl behavior is a far bigger
wart than Python's behavior is here!)

> > Because you've stated that "\y" is a legal escape
> > sequence, while the Python Reference Manual explicitly
> > states that it is an "unrecognized escape sequence", and
> > that such "unrecognized escape sequences" are sources of
> > bugs.
>
> There's that reading comprehension problem again.
>
> Unrecognised != illegal.

This is reasoning that only a lawyer could love.

The right thing for a programming language to do, when
handed something that is syntactically "unrecognized" is to
raise an error.

> It seems to me that the behaviour the Python designers
> were looking to avoid was the case where the coder
> accidentally inserted a backslash in the wrong place, and
> the language stripped the backslash out, e.g.:
>
> Wanted "a\bcd" but accidentally typed "ab\cd" instead, and
> got "abcd".

The moral of the story is that *any* arbitrary
interpretation of unrecognized escape sequences is a
potential source of bugs. In Python, you just end up with a
converse issue, where one might understandably assume that
"foo\bar" has a backslash in it, because "foo\yar" and
*most* other similar strings do. But then it doesn't.
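
The converse issue is easy to see on Python 3 with a short sketch. The literals are evaluated from raw strings so the example itself has no unrecognized escapes, and the warning newer releases give for \y is silenced.

```python
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # newer Python releases warn about \y
    yar = eval(r'"foo\yar"')          # \y is unrecognized, backslash kept
    bar = eval(r'"foo\bar"')          # \b is the backspace escape

print(len(yar), '\\' in yar)   # 7 True
print(len(bar), '\\' in bar)   # 6 False
print(repr(bar))               # 'foo\x08ar'
```

Two literals that look alike on the page end up with different lengths, and only one of them contains a backslash.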

> >> This is *exactly* like C++, except that in Python the
> >> semantics of \y and \\y are identical. Python doesn't
> >> guess what you mean, it *imposes* a meaning on the
> >> escape sequence. You just don't like that meaning.

> > That's because I don't like things that are
> > ill-conceived.

> And yet you like C++... go figure *wink*

Now that's a bold assertion!

I think that "tolerate C++" is more like it. But C++ does
have its moments.

|>ouglas

Steven D'Aprano

Aug 12, 2009, 7:19:20 PM

to

On Wed, 12 Aug 2009 14:21:34 -0700, Douglas Alan wrote:

> On Aug 12, 5:32 am, Steven D'Aprano
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>
>> That problem basically boils down to a deep-seated philosophical
>> disagreement over which philosophy a language should follow in regard
>> to backslash escapes:
>>
>> "Anything not explicitly permitted is forbidden"
>>
>> versus
>>
>> "Anything not explicitly forbidden is permitted"
>
> No, it doesn't. It boils down to whether a language should:
>
> (1) Try its best to detect errors as early as possible, especially when
> the cost of doing so is low.

You are making an unjustified assumption: \y is not an error. It is only
an error if you think that anything not explicitly permitted is forbidden.

While I'm amused that you've made my own point for me, I'm less amused
that you seem to be totally incapable of seeing past your parochial
language assumptions, even when those assumptions are explicitly pointed
out to you. Am I wasting my time engaging you in discussion?

There's a lot more I could say, but time is short, so let me just
summarise:

I disagree with nearly everything you say in this post. I think that a
few points you make have some validity, but the vast majority are based
on a superficial and confused understanding of language design
principles. (I won't justify that claim now, perhaps later, time
permitting.) Nevertheless, I think that your ultimate wish -- for \y etc
to be considered an error -- is a reasonable design choice, given your
assumptions. But it's not the only reasonable design choice, and Bash has
made a different choice, and Python has made yet a third reasonable
choice, and Pascal made yet a fourth reasonable choice.

These are all reasonable choices, all have some good points and some bad
points, but ultimately the differences between them are mostly arbitrary
personal preference, like the colour of a car. Disagreements over
preferences I can live with. One party insisting that red is the only
logical colour for a car, and that anybody who prefers white or black or
blue is illogical, is unacceptable.

--
Steven

MRAB

Aug 12, 2009, 7:40:03 PM

to pytho...@python.org

IMHO, it would've been simpler in the long run to say that backslash
followed by one of [0-9A-Za-z] is an escape sequence, backslash followed
by newline is ignored, and backslash followed by anything else is that
something. That way there would be a way to introduce additional escape
sequences without breaking existing code.
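
To make the proposed rule concrete, here is a rough sketch of a decoder that follows it. The names `KNOWN_ESCAPES` and `decode` are invented for illustration, the table of known escapes is deliberately tiny, and rejecting not-yet-defined alphanumeric escapes is one assumed reading of "is an escape sequence"; none of this is how Python itself behaves.

```python
import re

# Hypothetical, deliberately small table of defined escapes.
KNOWN_ESCAPES = {"n": "\n", "t": "\t", "0": "\0"}

_ESCAPE = re.compile(r"\\(.)", re.DOTALL)   # a backslash plus the next character
_ALNUM = re.compile(r"[0-9A-Za-z]")


def decode(text):
    """Decode backslash escapes under the rule sketched above."""
    def replace(match):
        ch = match.group(1)
        if ch == "\n":
            return ""                        # backslash-newline is ignored
        if _ALNUM.fullmatch(ch):
            if ch not in KNOWN_ESCAPES:
                # Reserved for future use, so reject it instead of guessing.
                raise ValueError(f"reserved escape sequence: \\{ch}")
            return KNOWN_ESCAPES[ch]
        return ch                            # backslash + anything else: that thing
    return _ESCAPE.sub(replace, text)


print(decode(r"C:\\dir"))       # C:\dir
print(decode(r"say \"hi\""))    # say "hi"
```

Because every unassigned letter or digit is reserved up front, a new escape could later be added to the table without changing the meaning of any string that already decoded successfully.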

Douglas Alan

Aug 13, 2009, 3:37:42 AM

to

On Aug 12, 7:19 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> You are making an unjustified assumption: \y is not an error.

You are making an unjustified assumption that I ever made such an
assumption!

My claim is and has always been NOT that \y is innately an error, but
rather that treating unrecognized escape sequences as legal escape
sequences is error PRONE.

> While I'm amused that you've made my own point for me, I'm less
> amused that you seem to be totally incapable of seeing past your
> parochial language assumptions,

Where do you get the notion that my assumptions are in any sense
"parochial"? They come from (1) a great deal of experience programming
very reliable software, and (2) having learned at least two dozen
different programming languages in my life.

> I disagree with nearly everything you say in this post. I think
> that a few points you make have some validity, but the vast
> majority are based on a superficial and confused understanding
> of language design principles.

Whatever. I've taken two graduate level classes at MIT on programming
language design, and got an A in both classes, and designed my own
programming language as a final project, and received an A+. But I
guess I don't really know anything about the topic at all.

> But it's not the only reasonable design choice, and Bash has
> made a different choice, and Python has made yet a third
> reasonable choice, and Pascal made yet a fourth reasonable choice.

And so did Perl and PHP, and whatever other programming language you
happen to mention. In fact, all programming languages are equally
good, so we might as well just freeze all language design as it is
now. Clearly we can do no better.

> One party insisting that red is the only logical colour for a
> car, and that anybody who prefers white or black or blue is
> illogical, is unacceptable.

If having all cars be red saved a lot of lives, or increased gas
mileage significantly, then it might very well be the best color for a
car. But of course, that is not the case. With programming languages,
there is much more likely to be an actual fact of the matter on which
sorts of language design decisions make programmers more productive on
average, and which ones result in more reliable software.

I will certainly admit that obtaining objective data on such things is
very difficult, but it's a completely different thing than one's color
preference for their car.

|>ouglas

Aahz

Aug 14, 2009, 10:07:31 AM

to

In article <6e13754c-1fa6-4d1b...@h30g2000vbr.googlegroups.com>,
Douglas Alan <darkw...@gmail.com> wrote:
>
>My friend begs to differ with the above. It would be much better for
>debugging if Python generated a parsing error for unrecognized escape
>sequences, rather than leaving them unchanged. g++ outputs a warning
>for such escape sequences, for instance. This is what I would consider
>to be the correct behavior. (Actually, I think it should just generate
>a fatal parsing error, but a warning is okay too.)

Well, then, the usual response applies: create a patch, discuss it on
python-ideas, and see what happens.

(That is, nobody has previously complained so vociferously IIRC, and
adding a warning is certainly within the bounds of what's theoretically
acceptable.)
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"I saw `cout' being shifted "Hello world" times to the left and stopped
right there." --Steve Gonedes

Steven D'Aprano

Aug 14, 2009, 12:11:52 PM

to

On Fri, 14 Aug 2009 07:07:31 -0700, Aahz wrote:

> "I saw `cout' being shifted "Hello world" times to the left and stopped
> right there." --Steve Gonedes

Assuming that's something real, and not invented for humour, I presume
that's describing something possible in C++. Am I correct? What the hell
would it actually do???

--
Steven

Grant Edwards

Aug 14, 2009, 12:17:36 PM

to

On 2009-08-14, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Fri, 14 Aug 2009 07:07:31 -0700, Aahz wrote:
>
>> "I saw `cout' being shifted "Hello world" times to the left and stopped
>> right there." --Steve Gonedes
>
> Assuming that's something real, and not invented for humour, I presume
> that's describing something possible in C++. Am I correct?

Yes. In C++, the "<<" operator is overloaded. Judging by the
context in which I've seen it used, it does something like
write strings to a stream.

> What the hell
> would it actually do???

IIRC in C++,

cout << "Hello world";

is equivalent to this in C:

printf("Hellow world");

or this in Python:

print "hellow world"

--
Grant Edwards (grante at visi.com)    Yow! Bo Derek ruined my life!

MRAB

Aug 14, 2009, 12:40:04 PM

to pytho...@python.org

Grant Edwards wrote:
> On 2009-08-14, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
>> On Fri, 14 Aug 2009 07:07:31 -0700, Aahz wrote:
>>
>>> "I saw `cout' being shifted "Hello world" times to the left and stopped
>>> right there." --Steve Gonedes
>> Assuming that's something real, and not invented for humour, I presume
>> that's describing something possible in C++. Am I correct?
>
> Yes. In C++, the "<<" operator is overloaded. Judging by the
> context in which I've seen it used, it does something like
> write strings to a stream.
>
>> What the hell
>> would it actually do???
>
> IIRC in C++,
>
> cout << "Hello world";
>

It also returns cout, so you can chain them:

cout << "Hello, " << name << '\n';
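
For Python readers, the chaining trick can be imitated with `__lshift__`. This is only an analogy, assuming a made-up `Stream` class; it is not how C++ iostreams are implemented.

```python
import io


class Stream:
    """Toy output stream (hypothetical): `<<` writes a value and returns self."""

    def __init__(self, sink):
        self.sink = sink

    def __lshift__(self, value):
        self.sink.write(str(value))
        return self          # returning the stream is what makes chaining work


buffer = io.StringIO()
out = Stream(buffer)
name = "world"

# Evaluated left to right: ((out << "Hello, ") << name) << "\n"
out << "Hello, " << name << "\n"
print(repr(buffer.getvalue()))   # 'Hello, world\n'
```

If `__lshift__` returned None instead, the second `<<` in the chain would fail with a TypeError.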

Douglas Alan

Aug 14, 2009, 12:42:15 PM

to

On Aug 14, 12:17 pm, Grant Edwards <invalid@invalid> wrote:

> On 2009-08-14, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:

> > On Fri, 14 Aug 2009 07:07:31 -0700, Aahz wrote:

> >> "I saw `cout' being shifted "Hello world" times to the left and stopped
> >> right there." --Steve Gonedes
>
> > Assuming that's something real, and not invented for humour, I presume
> > that's describing something possible in C++. Am I correct?
>
> Yes. In C++, the "<<" operator is overloaded. Judging by the
> context in which I've seen it used, it does something like
> write strings to a stream.

There's a persistent rumor that it is *this* very "abuse" of
overloading that caused Java to avoid operator overloading
altogether.

But then Java went and used "+" as the string concatenation
operator. Go figure!

|>ouglas

P.S. Overloading "left shift" to mean "output" does indeed seem a bit
sketchy, but in 15 years of C++ programming, I've never seen it cause
any confusion or bugs.

Steven D'Aprano

Aug 14, 2009, 1:55:59 PM

to

I think I've spent enough time on this discussion, so I won't be directly
responding to any of your recent points -- it's clear that I'm not
persuading you that there's any justification for any behaviour for
escape sequences other than the way C++ deals with them. That's your
prerogative, of course, but I've done enough tilting at windmills for
this week, so I'll just make one final comment and then withdraw from an
unproductive argument. (I will make an effort to read any final comments
you wish to make, so feel free to reply. Just don't expect an answer to
any questions.)

Douglas, you and I clearly have a difference of opinion on this. Neither
of us have provided even the tiniest amount of objective, replicable,
reliable data on the error-proneness of the C++ approach versus that of
Python. The supposed superiority of the C++ approach is entirely
subjective and based on personal opinion instead of quantitative facts.

I prefer languages that permit anything that isn't explicitly forbidden,
so I'm happy that Python treats non-special escape sequences as valid,
and your attempts to convince me that this goes against the Zen have
entirely failed to convince me. As I've done before, I will admit that
one consequence of this design is that it makes it hard to introduce new
escape sequences to Python. Given that it's vanishingly rare to want to
do so, and that wanting to add backslashes to strings is common, I think
that's a reasonable tradeoff. Other languages may make different
tradeoffs, and that's fine by me.

--
Steven

Erik Max Francis

Aug 14, 2009, 4:10:57 PM

to

Well, plus or minus newlines.

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis
It's hard to say what I want my legacy to be when I'm long gone.
-- Aaliyah

Grant Edwards

Aug 14, 2009, 4:13:31 PM

to

On 2009-08-14, Erik Max Francis <m...@alcyone.com> wrote:
> Grant Edwards wrote:
>> On 2009-08-14, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
>>> What the hell
>>> would it actually do???
>>
>> IIRC in C++,
>>
>> cout << "Hello world";
>>
>> is equivalent to this in C:
>>
>> printf("Hellow world");
>>
>> or this in Python:
>>
>> print "hellow world"
>
> Well, plus or minus newlines.

And a few miscellaneous typos...

--
Grant Edwards (grante at visi.com)    Yow! I don't understand the
                                      HUMOUR of the THREE STOOGES!!

Erik Max Francis

Aug 14, 2009, 4:18:04 PM

to

Grant Edwards wrote:
> On 2009-08-14, Erik Max Francis <m...@alcyone.com> wrote:
>> Grant Edwards wrote:
>>> On 2009-08-14, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
>>>> What the hell
>>>> would it actually do???
>>> IIRC in C++,
>>>
>>> cout << "Hello world";
>>>
>>> is equivalent to this in C:
>>>
>>> printf("Hellow world");
>>>
>>> or this in Python:
>>>
>>> print "hellow world"
>> Well, plus or minus newlines.
>
> And a few miscellaneous typos...

... and includes and namespaces :-).

Dave Angel

Aug 14, 2009, 10:25:18 PM

to Benjamin Kaplan, pytho...@python.org

Benjamin Kaplan wrote:

> On Fri, Aug 14, 2009 at 12:42 PM, Douglas Alan <darkw...@gmail.com>wrote:
>
>
>> P.S. Overloading "left shift" to mean "output" does indeed seem a bit
>> sketchy, but in 15 years of C++ programming, I've never seen it cause
>> any confusion or bugs.
>>
>
>
>

> The only reason it hasn't is because people use it in "Hello World". I bet
> some newbie C++ programmers get confused the first time they see << used to
> shift.
>
>
Actually, I've seen it cause confusion, because of operator precedence.
The logical shift operators have a fairly high level priority, so
sometimes you need parentheses that aren't obvious. Fortunately, most
of those cases make compile errors.

C++ has about 17 levels of precedence, plus some confusing associative
rules. And operator overloading does *NOT* change precedence.
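
The same fixed-precedence behaviour can be seen in Python, whose parser also binds `<<` tighter than `&` no matter what the operands are. A small sketch using the standard `ast` module; the expression and the names `out` and `flags` are placeholders and are never evaluated.

```python
import ast

# Parse only; `out` and `flags` are placeholder names.
tree = ast.parse("out << flags & 1", mode="eval")
top = tree.body

# The outermost operation is the `&`, so the expression groups as
# (out << flags) & 1, not out << (flags & 1).
print(type(top.op).__name__)        # BitAnd
print(type(top.left.op).__name__)   # LShift
```

To get the other grouping you have to write the parentheses yourself, exactly as in the C++ case.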

DaveA

Hendrik van Rooyen

Aug 15, 2009, 4:47:22 AM

to pytho...@python.org

It would shift "cout" left "Hello World" times.
It is unclear if the shift wraps around or not.

It is similar to a banana *holding his hands apart about a foot* this colour.

- Hendrik

Chris Rebert

Aug 15, 2009, 4:48:30 AM

to Hendrik van Rooyen, pytho...@python.org

On Sat, Aug 15, 2009 at 4:47 AM, Hendrik van
Rooyen<hen...@microcorp.co.za> wrote:
> On Friday 14 August 2009 18:11:52 Steven D'Aprano wrote:

> It would shift "cout" left "Hello World" times.
> It is unclear if the shift wraps around or not.
>
> It is similar to a banana *holding his hands apart about a foot* this colour.
>
> - Hendrik

I think you managed to successfully dereference the null pointer there...

Cheers,
Chris

Douglas Alan

Aug 15, 2009, 4:01:43 PM

to

On Aug 14, 10:25 pm, Dave Angel <da...@ieee.org> wrote:

> Benjamin Kaplan wrote:

> > On Fri, Aug 14, 2009 at 12:42 PM, Douglas Alan <darkwate...@gmail.com>wrote:

> >> P.S. Overloading "left shift" to mean "output" does indeed seem a bit
> >> sketchy, but in 15 years of C++ programming, I've never seen it cause
> >> any confusion or bugs.

> > The only reason it hasn't is because people use it in "Hello World". I bet
> > some newbie C++ programmers get confused the first time they see << used to
> > shift.

People typically get confused by a *lot* of things when they learn a
new language. I think the better metric is how people fare with a
language feature once they've grown accustomed to the language, and
how long it takes them to acquire this familiarity.

> Actually, I've seen it cause confusion, because of operator precedence.
> The logical shift operators have a fairly high level priority, so
> sometimes you need parentheses that aren't obvious. Fortunately, most
> of those cases make compile errors.

I've been programming in C++ so long that for me, if there's any
confusion, it's the other way around. I see "<<" or ">>" and I think
I/O. I don't immediately think shifting. Fortunately, shifting is a
pretty rare operation to actually use, which is perhaps why C++
reclaimed it for I/O.

On the other hand, you are right that the precedence of "<<" is messed
up for I/O. I've never seen a real-world case where this causes a bug
in C++ code, because the static type-checker always seems to catch the
error. In a dynamically typed language, this would be a much more
serious problem.

|>ouglas

P.S. I find it strange, however, that anyone who is not okay with
"abusing" operator overloading in this manner, wouldn't also take
umbrage at Python's overloading of "+" to work with strings and lists,
etc. Numerical addition and sequence concatenation have entirely
different semantics.

Douglas Alan

Aug 15, 2009, 8:57:28 PM

to

On Aug 14, 1:55 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> Douglas, you and I clearly have a difference of opinion on
> this. Neither of us have provided even the tiniest amount
> of objective, replicable, reliable data on the
> error-proneness of the C++ approach versus that of
> Python. The supposed superiority of the C++ approach is
> entirely subjective and based on personal opinion instead
> of quantitative facts.

Alas, this is true for nearly any engineering methodology or
philosophy, which is why, I suppose, Perl, for instance,
still has its proponents. It's virtually impossible to prove
any thesis, and these things only get decided by endless
debate that rages across decades.

> I prefer languages that permit anything that isn't
> explicitly forbidden, so I'm happy that Python treats
> non-special escape sequences as valid,

I don't really understand what you mean by this. If Python
were to declare that "unrecognized escape sequences" were
forbidden, then they would be "explicitly forbidden". Would
you then be happy?

If not, why are you not upset that Python won't let me do

[3, 4, 5] + 2

Some other programming languages I've used certainly do.
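
For reference, a short sketch of what Python does with that expression, plus the two things a language that accepted it might plausibly mean. The variable names are invented for the example.

```python
numbers = [3, 4, 5]
error = None

try:
    numbers + 2                      # list + int is rejected
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# Two readings another language might choose, spelled out explicitly:
shifted = [n + 2 for n in numbers]   # elementwise addition
extended = numbers + [2]             # concatenation

print(shifted)    # [5, 6, 7]
print(extended)   # [3, 4, 5, 2]
```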

> and your attempts to convince me that this goes against
> the Zen have entirely failed to convince me. As I've done
> before, I will admit that one consequence of this design
> is that it makes it hard to introduce new escape sequences
> to Python. Given that it's vanishingly rare to want to do
> so,

I'm not so convinced of that in the days of Unicode. If I
see a backslash and then some Kanji character, what am I
supposed to make of that? For all I know, that Kanji
character might mean newline, and I'm seeing code for a
version of Python that was tweaked to be friendly to the
Japanese. And in the days where smart hand-held devices are
proliferating like crazy, there might be ever-more demand
for easy-to-use i/o that lets you control various aspects of
those devices.

|>ouglas

Steven D'Aprano

Aug 15, 2009, 10:19:02 PM

to

On Sat, 15 Aug 2009 13:01:43 -0700, Douglas Alan wrote:

> P.S. I find it strange, however, that anyone who is not okay with
> "abusing" operator overloading in this manner, wouldn't also take
> umbrage at Python's overloading of "+" to work with strings and lists,
> etc. Numerical addition and sequence concatenation have entirely
> different semantics.

Not to English speakers, where we frequently use 'add' to mean
concatenate, append, insert, etc.:

"add this to the end of the list"
"add the prefix 'un-' to the beginning of the word to negate it"
"add your voice to the list of those calling for change"
"add your name and address to the visitor's book"

and even in-place modifications:

"after test audiences' luke-warm response, the studio added a completely
different ending to the movie".

Personally, I would have preferred & for string and list concatenation,
but that's entirely for subjective reasons.

--
Steven

Douglas Alan

Aug 15, 2009, 11:00:23 PM

to

On Aug 15, 10:19 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> On Sat, 15 Aug 2009 13:01:43 -0700, Douglas Alan wrote:
> > P.S. I find it strange, however, that anyone who is not okay with
> > "abusing" operator overloading in this manner, wouldn't also take
> > umbrage at Python's overloading of "+" to work with strings and lists,
> > etc. Numerical addition and sequence concatenation have entirely
> > different semantics.
>
> Not to English speakers, where we frequently use 'add' to mean
> concatenate, append, insert, etc.:

That is certainly true, but the "+" symbol (pronounced "plus" not
"add") isn't exactly synonymous with the English word "add" and is
usually used, in technical circles, to refer to a function that at
least meets the properties of an abelian group operator.
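
One property an abelian group operation must have is commutativity. A tiny check, with throwaway variable names, shows that Python's "+" keeps it for numbers but not for strings:

```python
a, b = "ab", "cd"

strings_commute = (a + b == b + a)   # "abcd" versus "cdab"
numbers_commute = (3 + 4 == 4 + 3)   # 7 versus 7

print(strings_commute, numbers_commute)   # False True
```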

Also, programming languages (other than Perl) should be more precise
than English. English words often have many, many meanings, but when
we are talking about types and operations on types, the operations
should generally have more specific semantics.

In any case, let's say we grant that operators should be allowed to be
as sloppy as English. Then we should have no problem with C++'s use of
"<<" for i/o. Pseudo-code has a long heritage of using "<-" to
indicate assignment, and there are a number of programming languages
(e.g., APL) that use assignment to the output terminal to indicate
writing to the terminal. C++'s usage of "<<" for output is clearly
designed to be reminiscent of this, and therefore intuitive.

And intuitive it is, given the aforementioned background, at least.

So, as far as I can tell, Python has no real authority to throw stones
at C++ on this little tiny particular issue.

|>ouglas

Steven D'Aprano

Aug 16, 2009, 1:05:01 AM

to

On Sat, 15 Aug 2009 20:00:23 -0700, Douglas Alan wrote:

> So, as far as I can tell, Python has no real authority to throw stones
> at C++ on this little tiny particular issue.

I think you're being a tad over-defensive. I asked a genuine question
about a quote in somebody's signature. That's a quote which can be found
all over the Internet, and the poster using it has (as far as I know) no
official capacity to speak for "Python" -- while Aahz is a high-profile,
well-respected Pythonista, he's not Guido.

Now that I understand what the semantics of cout << "Hello world" are, I
don't have any problem with it either. It is a bit weird,
"Hello world" >> cout would probably be better, but it's hardly the
strangest design in any programming language, and it's probably
influenced by input redirection using < in various shells.

--
Steven

Douglas Alan

Aug 16, 2009, 1:51:07 AM

to Dark Water

On Aug 16, 1:05 am, Steven D'Aprano <st...@REMOVE-THIS-

cybersource.com.au> wrote:
> On Sat, 15 Aug 2009 20:00:23 -0700, Douglas Alan wrote:
> > So, as far as I can tell, Python has no real authority to throw stones
> > at C++ on this little tiny particular issue.

> I think you're being a tad over-defensive.

Defensive? Personally, I prefer Python over C++ by about a factor of
100X. I just find it a bit amusing when someone claims that some
programming language has a particular fatal flaw, when their own
apparently favorite language has the very same issue in an only
slightly different form.

> the poster using it has (as far as I know) no official capacity to speak
> for "Python"

I never thought he did. I wasn't speaking literally, as I'm not of
the opinion that any programming language has any literal authority or
any literal ability to throw stones.

> Now that I understand what the semantics of cout << "Hello world" are, I
> don't have any problem with it either. It is a bit weird, "Hello world" >> cout
> would probably be better, but it's hardly the strangest design in
> any programming language, and it's probably influenced by input
> redirection using < in various shells.

C++ also allows for reading from stdin like so:

cin >> myVar;

I think the direction of the arrows probably derives from languages
like APL, which had notation something like so:

myVar <- 3
[] <- myVar

"<-" was really a little arrow symbol (APL didn't use ascii), and the
first line above would assign the value 3 to myVar. In the second
line, the "[]" was really a little box symbol and represented the
terminal. Assigning to the box would cause the output to be printed
on the terminal, so the above would output "3". If you did this:

[] -> myVar

It would read a value into myVar from the terminal.

APL predates Unix by quite a few years.

|>ouglas

Hendrik van Rooyen

Aug 16, 2009, 3:24:36 AM

to pytho...@python.org

>"Steven D'Aprano" <steve@REMOVE-THIS-c...e.com.au> wrote:

>Now that I understand what the semantics of cout << "Hello world" are, I
>don't have any problem with it either. It is a bit weird,
>"Hello world" >> cout would probably be better, but it's hardly the
>strangest design in any programming language, and it's probably
>influenced by input redirection using < in various shells.

I find it strange that you would prefer:

"Hello world" >> cout
over:
cout << "Hello world"

The latter seems to me to be more in line with normal assignment: -
Take what is on the right and make the left the same.
I suppose it is because we read from left to right that the first one seems
better to you.
Another instance of how different we all are.

It goes down to the assembler - there are two schools:

mov a,b - for Intel like languages, this means move b to a
mov a,b - for Motorola like languages, this means move a to b

Gets confusing sometimes.

- Hendrik

Steven D'Aprano

Aug 16, 2009, 4:22:58 AM

to

On Sun, 16 Aug 2009 09:24:36 +0200, Hendrik van Rooyen wrote:

>>"Steven D'Aprano" <steve@REMOVE-THIS-c...e.com.au> wrote:
>
>>Now that I understand what the semantics of cout << "Hello world" are, I
>>don't have any problem with it either. It is a bit weird,
>>"Hello world" >> cout would probably be better, but it's hardly the
>>strangest design in any programming language, and it's probably
>>influenced by input redirection using < in various shells.
>
> I find it strange that you would prefer:
>
> "Hello world" >> cout
> over:
> cout << "Hello world"
>
> The latter seems to me to be more in line with normal assignment: - Take
> what is on the right and make the left the same.

I don't like normal assignment. After nearly four decades of mathematics
and programming, I'm used to it, but I don't think it is especially good.
It confuses beginners to programming: they get one set of behaviour
drilled into them in maths class, and then in programming class we use
the same notation for something which is almost, but not quite, the same.
Consider the difference between:

y = 3 + x
x = z

as a pair of mathematics expressions versus as a pair of assignments.
What conclusion can you draw about y and z?
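
Read as Python assignments, with starting values picked arbitrarily for the sake of the example, the pair behaves like this:

```python
# Arbitrary starting values, chosen only for illustration.
x = 1
z = 10

y = 3 + x   # y is bound to 4, computed from the current value of x
x = z       # rebinding x afterwards does not touch y

print(x, y, z)       # 10 4 10
print(y == 3 + z)    # False
```

Read as simultaneous equations, the same two lines would force y = 3 + z, which is exactly the conclusion the assignment reading does not support.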

Even though it looks funny due to unfamiliarity, I'd love to see the
results of a teaching language that used notation like:

3 + x -> y
len(alist) -> n
Widget(1, 2, 3).magic -> obj
etc.

for assignment. My prediction is that it would be easier to learn, and
just as good for experienced coders. The only downside (apart from
unfamiliarity) is that it would be a little bit harder to find the
definition of a variable by visually skimming lines of code: your eyes
have to zig-zag back and forth to find the end of the line, instead of
running straight down the left margin looking for "myvar = ...". But it
should be easy enough to search for "-> myvar".

> I suppose it is because
> we read from left to right that the first one seems better to you.

Probably.

--
Steven

Douglas Alan

Aug 16, 2009, 4:41:41 AM

to

On Aug 16, 4:22 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> I don't like normal assignment. After nearly four decades of mathematics
> and programming, I'm used to it, but I don't think it is especially good.
> It confuses beginners to programming: they get one set of behaviour
> drilled into them in maths class, and then in programming class we use
> the same notation for something which is almost, but not quite, the same.
> Consider the difference between:
>
> y = 3 + x
> x = z
>
> as a pair of mathematics expressions versus as a pair of assignments.
> What conclusion can you draw about y and z?

Yeah, the syntax most commonly used for assignment today sucks. In the
past, it was common to see languages with syntaxes like

y <- y + 1

or

y := y + 1

or

let y = y + 1

But these languages have mostly fallen out of favor. The popular
statistical programming language R still uses the

y <- y + 1

syntax, though.

Personally, my favorite is Lisp, which looks like

(set! y (+ y 1))

or

(let ((x 3)
      (y 4))
  (foo x y))

I like to be able to read everything from left to right, and Lisp does
that more than any other programming language.

I would definitely not like a language that obscures assignment by
moving it over to the right side of lines.

|>ouglas

Erik Max Francis

Aug 16, 2009, 4:46:11 AM

to

Steven D'Aprano wrote:
> I don't like normal assignment. After nearly four decades of mathematics
> and programming, I'm used to it, but I don't think it is especially good.
> It confuses beginners to programming: they get one set of behaviour
> drilled into them in maths class, and then in programming class we use
> the same notation for something which is almost, but not quite, the same.
> Consider the difference between:
>
> y = 3 + x
> x = z
>
> as a pair of mathematics expressions versus as a pair of assignments.
> What conclusion can you draw about y and z?

What you're saying is true, but it's still a matter of terminology. The
symbol "=" means different things in different contexts, and mathematics
and programming are very different ones indeed. The problem is
compounded with early languages which lazily confused the two in
different contexts, such as (but not exclusive to) BASIC using = for both
assignment and equality testing in what are in essence totally
unrelated contexts.

> Even though it looks funny due to unfamiliarity, I'd love to see the
> results of a teaching language that used notation like:
>
> 3 + x -> y
> len(alist) -> n
> Widget(1, 2, 3).magic -> obj
> etc.
>
> for assignment. My prediction is that it would be easier to learn, and
> just as good for experienced coders.

This really isn't new at all. Reverse the arrow and the relationship to
get::

y <- x + 3

(and use a real arrow rather than ASCII) and that's assignment in APL
and a common representation in pseudocode ever since. Change it to :=
and that's what Pascal used, as well as quite a few mathematical papers
dealing with iterative computations, I might add.

Once you get past the point of realizing that you really need to make a
distinction between assignment and equality testing, then it's just a
matter of choosing two different operators for the job. Whether it's
<-/= or :=/= or =/== or ->/= (with reversed behavior for assignment) is
really academic and a matter of taste at that point.

Given the history of programming languages, it doesn't really look like
the to-be-assigned variable being at the end of the expression is going to
get much play, since not a single major one I'm familiar with does it
that way, and a lot of them have come up with the same convention
independently and haven't seen a need to change.

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis

Get there first with the most men.
-- Gen. Nathan Bedford Forrest, 1821-1877

Erik Max Francis

Aug 16, 2009, 4:48:15 AM

to

Douglas Alan wrote:
> Personally, my favorite is Lisp, which looks like
>
> (set! y (+ y 1))

For varying values of "Lisp." `set!` is Scheme.

Douglas Alan

Aug 16, 2009, 5:31:55 AM

to

On Aug 16, 4:48 am, Erik Max Francis <m...@alcyone.com> wrote:
> Douglas Alan wrote:
> > Personally, my favorite is Lisp, which looks like
>
> > (set! y (+ y 1))
>
> For varying values of "Lisp." `set!` is Scheme.

Yes, I'm well aware!

There are probably as many different dialects of Lisp as all other
programming languages put together.

|>ouglas

Steven D'Aprano

Aug 16, 2009, 6:18:11 AM

to

On Sun, 16 Aug 2009 01:41:41 -0700, Douglas Alan wrote:

> I like to be able to read everything from left to right, and Lisp does
> that more than any other programming language.
>
> I would definitely not like a language that obscures assignment by
> moving it over to the right side of lines.

One could argue that left-assigned-from-right assignment obscures the
most important part of the assignment, namely *what* you're assigning, in
favour of what you're assigning *to*.

In any case, after half a century of left-from-right assignment, I think
it's worth the experiment in a teaching language or three to try it the
other way. The closest to this I know of is the family of languages
derived from Apple's Hypertalk, where you do assignment with:

put somevalue into name

(Doesn't COBOL do something similar?)

Beginners found that *very* easy to understand, and it didn't seem to
make coding harder for experienced Hypercard developers.

--
Steven

Hendrik van Rooyen

Aug 16, 2009, 7:42:14 AM

to pytho...@python.org

On Sunday 16 August 2009 12:18:11 Steven D'Aprano wrote:

> In any case, after half a century of left-from-right assignment, I think
> it's worth the experiment in a teaching language or three to try it the
> other way. The closest to this I know of is the family of languages
> derived from Apple's Hypertalk, where you do assignment with:
>
> put somevalue into name
>
> (Doesn't COBOL do something similar?)

Yup.

move banana to pineapple.

move accountnum in inrec to accountnum in outrec.

move corresponding inrec to outrec.

It should all be upper case of course...

I cannot quite recall, but I have the feeling that in the second form, "of"
was also allowed instead of "in", but it has been a while now so I am
probably wrong.

The move was powerful - it would do conversions for you based on the types of
the operands - it all "just worked".

- Hendrik

MRAB

Aug 16, 2009, 8:45:22 AM

to pytho...@python.org

Douglas Alan wrote:
[snip]

> C++ also allows for reading from stdin like so:
>
> cin >> myVar;
>
> I think the direction of the arrows probably derives from languages
> like APL, which had notation something like so:
>
> myVar <- 3
> [] <- myVar
>
> "<-" was really a little arrow symbol (APL didn't use ascii), and the
> first line above would assign the value 3 to myVar. In the second
> line, the "[]" was really a little box symbol and represented the
> terminal. Assigning to the box would cause the output to be printed
> on the terminal, so the above would output "3". If you did this:
>
> [] -> myVar
>
> It would read a value into myVar from the terminal.
>
> APL predates Unix by quite a few years.
>

No, APL is strictly right-to-left.

-> x

means "goto x".

Writing to the console is:

[] <- myVar

Reading from the console is:

myVar <- []

Douglas Alan

Aug 16, 2009, 1:46:37 PM

On Aug 16, 8:45 am, MRAB <pyt...@mrabarnett.plus.com> wrote:

> No, APL is strictly right-to-left.
>
> -> x
>
> means "goto x".
>
> Writing to the console is:
>
> [] <- myVar
>
> Reading from the console is:
>
> myVar <- []

Ah, thanks for the correction. It's been 5,000 years since I used APL!

|>ouglas

Douglas Alan

Aug 16, 2009, 2:55:31 PM

On Aug 16, 6:18 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> On Sun, 16 Aug 2009 01:41:41 -0700, Douglas Alan wrote:

> > I would definitely not like a language that obscures assignment by
> > moving it over to the right side of lines.

> One could argue that left-assigned-from-right assignment obscures the
> most important part of the assignment, namely *what* you're assigning, in
> favour of what you're assigning *to*.

The most important things are always the side-effects and the name-
bindings.

In a large program, it can be difficult to figure out where a name is
defined, or which version of a name a particular line of code is
seeing. Consequently languages should always go out of their way to
make tracking this as easy as possible.

Side effects are also a huge issue, and a source of many bugs. This is
one of the reasons that there are many functional languages that
prohibit or discourage side-effects. Side effects should be made as
obvious as is feasible.

This is why, for instance, in Scheme, variable assignment has an
exclamation mark in it. E.g.,

(set! x (+ x 1))

The exclamation mark is to make the fact that a side effect is
happening there stand out and be immediately apparent. And C++
provides the "const" declaration for similar reasons.

> In any case, after half a century of left-from-right assignment, I think
> it's worth the experiment in a teaching language or three to try it the
> other way. The closest to this I know of is the family of languages
> derived from Apple's Hypertalk, where you do assignment with:
>
> put somevalue into name

That's okay with me, but only because the statement begins with "put",
which lets you know at the very beginning of the line that something
very important is happening. You don't have to scan all the way to the
right before you notice.

Still, I would prefer

let name = somevalue

as the "let" gives me the heads up right away, and then immediately
after the "let" is the name that I might want to be able to scan for
quickly.

|>ouglas

Nobody

Aug 16, 2009, 10:21:43 PM

On Sun, 16 Aug 2009 05:05:01 +0000, Steven D'Aprano wrote:

> Now that I understand what the semantics of cout << "Hello world" are, I
> don't have any problem with it either. It is a bit weird,
> "Hello world" >> cout would probably be better,

Placing the stream on the LHS allows the main forms of << to be
implemented as methods of the ostream class. C++ only considers the LHS
operand when attempting to resolve an infix operator as a method.

Also, << and >> are left-associative, and that cannot be changed by
overloading. Having the ostream on the LHS allows the operators to be
chained:

cout << "Hello" << ", " << "world" << endl

equivalent to:

(((cout << "Hello") << ", ") << "world") << endl

[operator<< returns the ostream as its result.]
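
As a rough illustration of the chaining described above, here is a minimal
sketch in Python rather than C++ (this being a Python list). The Stream class
and its parts attribute are made up for the example. The only real feature
relied on is that Python's << is also left-associative and tries __lshift__
on the left operand first.

```python
class Stream:
    """Hypothetical stand-in for an ostream; collects what is written to it."""

    def __init__(self):
        self.parts = []

    def __lshift__(self, value):
        # Record the value, then return the stream itself so that the
        # next << in the chain again has a Stream on its left-hand side.
        self.parts.append(str(value))
        return self


out = Stream()

# Parsed as ((out << "Hello") << ", ") << "world" because << is left-associative.
out << "Hello" << ", " << "world"

print("".join(out.parts))
```

This prints "Hello, world". If __lshift__ returned None instead of self, the
second << in the chain would fail, which is why returning the stream matters.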

Even if you could make >> right-associative, the values would have to be
written right-to-left:

endl >> "world" >> ", " >> "Hello" >> cout

i.e.:

endl >> ("world" >> (", " >> ("Hello" >> cout)))
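
The same point can be seen from the other direction in Python, where >> is
likewise left-associative. This is only a sketch, and the Sink class is
hypothetical. Unlike a C++ member operator, Python's reflected method
__rrshift__ does let the right-hand operand handle `value >> sink`, yet a
chain written left-to-right still fails, because the two leftmost strings
are combined first.

```python
class Sink:
    """Hypothetical output target that sits on the right-hand side of >>."""

    def __init__(self):
        self.parts = []

    def __rrshift__(self, value):
        # Called for `value >> sink` when `value` does not implement >> itself.
        self.parts.append(str(value))
        return self


sink = Sink()

# A single write works: str has no >> of its own, so Sink.__rrshift__ is used.
"Hello" >> sink

try:
    # Parsed as ("world" >> ", ") >> sink, so str >> str is attempted first.
    "world" >> ", " >> sink
    chained_ok = True
except TypeError:
    chained_ok = False

print(sink.parts, chained_ok)
```

This prints ['Hello'] False: the single write succeeds, but the chained form
raises TypeError before the sink is ever consulted.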