A friend of mine is just learning Python, and he's a bit tweaked about how unrecognized escape sequences are treated in Python. This is from the Python 3.0 reference manual:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.
My friend begs to differ with the above. It would be much better for debugging if Python generated a parsing error for unrecognized escape sequences, rather than leaving them unchanged. g++ outputs a warning for such escape sequences, for instance. This is what I would consider to be the correct behavior. (Actually, I think it should just generate a fatal parsing error, but a warning is okay too.)
In any case, I think my friend should mellow out a bit, but we both consider this something of a wart. He's just more wart-phobic than I am. Is there any way that this behavior can be considered anything other than a wart? Other than the unconvincing claim that you can use this "feature" to save you a bit of typing sometimes when you actually want a backslash to be in your string?
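For what it's worth, later Python 3 releases come close to the g++-style behaviour asked for here. Python 3.6 and later warn about unrecognized escapes at compile time (a `DeprecationWarning` up to 3.11, a `SyntaxWarning` from 3.12), and promoting warnings to errors, for example with `python -W error`, turns that warning into a `SyntaxError`. The sketch below does the same thing from code. `has_unrecognized_escape` is a hypothetical helper, and because the warning class and message vary by version, it only checks whether *something* was raised. Any other syntax error also returns True, so feed it otherwise-valid code.

```python
import ast
import warnings


def has_unrecognized_escape(source: str) -> bool:
    """Return True if compiling `source` complains (assumes Python 3.6+).

    Hypothetical helper. With warnings promoted to errors, the compiler
    reports an unrecognized escape as a SyntaxError. Any other syntax
    error in `source` also yields True.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            ast.parse(source)
        except (SyntaxError, SyntaxWarning, DeprecationWarning):
            return True
    return False


print(has_unrecognized_escape(r's = "tab:\t"'))   # False: \t is recognized
print(has_unrecognized_escape(r's = "why:\y"'))   # True: \y is not
print(has_unrecognized_escape(r's = "why:\\y"'))  # False: backslash escaped
```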
On Sun, 09 Aug 2009 12:26:54 -0700, Douglas Alan wrote:
> A friend of mine is just learning Python, and he's a bit tweaked about how unrecognized escape sequences are treated in Python.
> ...
> In any case, I think my friend should mellow out a bit, but we both consider this something of a wart. He's just more wart-phobic than I am. Is there any way that this behavior can be considered anything other than a wart? Other than the unconvincing claim that you can use this "feature" to save you a bit of typing sometimes when you actually want a backslash to be in your string?
I'd put it this way: a backslash is just an ordinary character, except when it needs to be special. So Python's behaviour is "treat backslash as a normal character, except for these exceptions" while the behaviour your friend wants is "treat a backslash as an error, except for these exceptions".
Why should a backslash in a string literal be an error?
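The rule Steven describes, and the bytes-literal caveat from the manual quoted at the top, can be seen by evaluating literal source text with `ast.literal_eval`, so the results don't depend on how an editor or REPL treats the escapes. `eval_literal` is a hypothetical helper for this sketch; warnings are silenced because Python 3.6+ warns about unrecognized escapes, and only the resulting value matters here.

```python
import ast
import warnings


def eval_literal(source: str):
    """Evaluate the Python string or bytes literal written in `source`.

    Hypothetical helper; warnings about unrecognized escapes are ignored.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# A recognized escape: backslash-n becomes a single newline character.
print(repr(eval_literal(r'"a\nb"')))     # 'a\nb'
# An unrecognized escape: the backslash stays in the string.
print(repr(eval_literal(r'"a\yb"')))     # 'a\\yb'
# \u is recognized in str literals...
print(repr(eval_literal(r'"\u0041"')))   # 'A'
# ...but counts as unrecognized in bytes literals, so it is kept.
print(repr(eval_literal(r'b"\u0041"')))  # b'\\u0041'
```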
Steven D'Aprano wrote:
> Why should a backslash in a string literal be an error?
Because in Python, if my friend sees the string "foo\xbar\n", he has no idea whether the "\x" is an escape sequence, or if it is just the characters "\x", unless he looks it up in the manual, or tries it out in the REPL, or what have you. My friend is adamant that it would be better if he could just look at the string literal and know. He doesn't want to be bothered to have to store stuff like that in his head. He wants to be able to figure out programs just by looking at them, to the maximum degree that that is feasible.
In comparison to Python, in C++, he can just look at "foo\xbar\n" and know that "\x" is a special character. (As long as it compiles without warnings under g++.)
He's particularly annoyed too, that if he types "foo\xbar" at the REPL, it echoes back as "foo\\xbar". He finds that to be some sort of annoying DWIM feature, and if Python is going to have DWIM features, then it should, for example, figure out what he means by "\" and not bother him with a syntax error in that case.
Another reason that Python should not behave the way that it does, is that it pegs Python into a corner where it can't add new escape sequences in the future, as doing so will break existing code. Generating a syntax error instead for unknown escape sequences would allow for future extensions.
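Python's own history offers a concrete case of this risk (an illustration added here, not something raised in the thread). In Python 2, a plain string literal such as "C:\Users" kept its backslash, because \U was not a recognized escape in byte strings. In Python 3, plain string literals are Unicode, \U introduces an eight-hex-digit code point, and the same literal no longer compiles. Bytes literals still keep the backslash. The helper name is again hypothetical.

```python
import ast
import warnings


def eval_literal(source: str):
    """Evaluate a literal written as source text (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# In a bytes literal, \U is unrecognized, so the backslash is kept.
print(eval_literal(r'b"C:\Users"'))  # b'C:\\Users'

# In a Python 3 str literal, \U starts a \UXXXXXXXX escape, and "sers"
# is not eight hex digits, so the literal is rejected outright.
try:
    eval_literal(r'"C:\Users"')
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```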
Now not to pick on Python unfairly, most other languages have similar issues with escape sequences. (Except for the Bourne Shell and bash, where "\x" always just means "x", no matter what character "x" happens to be.) But I've been telling my friend for years to switch to Python because of how wonderful and consistent Python is in comparison to most other languages, and now he seems disappointed and seems to think that Python is just more of the same.
Of course I think that he's overreacting a bit. My point of view is that every language has *some* warts; Python just has a bit fewer than most. It would have been nice, I should think, if this wart had been "fixed" in Python 3, as I do consider it to be a minor wart.
cybersource.com.au> wrote:
> On Sun, 09 Aug 2009 12:26:54 -0700, Douglas Alan wrote:
> > A friend of mine is just learning Python, and he's a bit tweaked about how unrecognized escape sequences are treated in Python.
> ...
> > In any case, I think my friend should mellow out a bit, but we both consider this something of a wart. He's just more wart-phobic than I am. Is there any way that this behavior can be considered anything other than a wart? Other than the unconvincing claim that you can use this "feature" to save you a bit of typing sometimes when you actually want a backslash to be in your string?
> I'd put it this way: a backslash is just an ordinary character, except when it needs to be special. So Python's behaviour is "treat backslash as a normal character, except for these exceptions" while the behaviour your friend wants is "treat a backslash as an error, except for these exceptions".
> Why should a backslash in a string literal be an error?
Because the behavior of \ in a string is context-dependent, which means a reader can't know if \ is a literal character or escape character without knowing the context, and it means an innocuous change in context can cause a rather significant change in \.
IOW it's an error-prone mess. It would be better if Python (like C) treated \ consistently as an escape character. (And in raw strings, consistently as a literal.)
It's kind of a minor issue in terms of overall real-world importance, but in terms of raw unPythonicness this might be the worst offense the language makes.
> while the behaviour your friend wants is "treat a backslash as an error, except for these exceptions".
Besides, can't all error situations be described as, "treat the error situation as an error, except for the exception of when the situation isn't an error"???
The behavior my friend wants isn't any more exceptional than that!
Carl Banks wrote:
> IOW it's an error-prone mess. It would be better if Python (like C) treated \ consistently as an escape character. (And in raw strings, consistently as a literal.)
Agreed. For one thing, if another escape character ever has to be added to the language, that may change the semantics of previously correct strings. If "\" followed by a non-special character is treated as an error, that doesn't happen.
On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
> Steven D'Aprano wrote:
>> Why should a backslash in a string literal be an error?
> Because in Python, if my friend sees the string "foo\xbar\n", he has no idea whether the "\x" is an escape sequence, or if it is just the characters "\x", unless he looks it up in the manual, or tries it out in the REPL, or what have you.
Fair enough, but isn't that just another way of saying that if you look at a piece of code and don't know what it does, you don't know what it does unless you look it up or try it out?
> My friend is adamant that it would be better if he could just look at the string literal and know. He doesn't want to be bothered to have to store stuff like that in his head. He wants to be able to figure out programs just by looking at them, to the maximum degree that that is feasible.
I actually sympathize strongly with that attitude. But, honestly, your friend is a programmer (or at least pretends to be one *wink*). You can't be a programmer without memorizing stuff: syntax, function calls, modules to import, quoting rules, blah blah blah. Take C as an example -- there's absolutely nothing about () that says "group expressions or call a function" and {} that says "group a code block". You just have to memorize it. If you don't know what a backslash escape is going to do, why would you use it? I'm sure your friend isn't in the habit of randomly adding backslashes to strings just to see whether it will still compile.
This is especially important when reading (as opposed to writing) code. You read somebody else's code, and see "foo\xbar\n". Let's say you know it compiles without warning. Big deal -- you don't know what the escape codes do unless you've memorized them. What does \n resolve to? chr(13) or chr(97) or chr(0)? Who knows?
Unless you know the rules, you have no idea what is in the string. Allowing \y to resolve to a literal backslash followed by y doesn't change that. All it means is that some \c combinations return a single character, and some return two.
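The "some return one character, some return two" rule can be checked mechanically. This sketch, using a hypothetical helper `classify`, tries every ASCII letter after a backslash in a Python 3 str literal and sorts the letters into three groups: escapes that produce one character, sequences left as two characters, and escapes that are recognized but need more input (`\N{...}`, `\u`, `\U`, `\x`), which therefore fail when written on their own.

```python
import ast
import string
import warnings


def classify(letter: str) -> str:
    """Classify the two-character source sequence backslash + `letter`."""
    source = '"\\' + letter + '"'  # e.g. the source text "\n", with quotes
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            value = ast.literal_eval(source)
        except SyntaxError:
            return "incomplete"  # recognized, but needs more characters
    return "escape" if len(value) == 1 else "kept"


groups = {"escape": [], "incomplete": [], "kept": []}
for letter in string.ascii_letters:
    groups[classify(letter)].append(letter)

print("one character:", "".join(groups["escape"]))      # abfnrtv
print("needs more:   ", "".join(groups["incomplete"]))  # uxNU
print("left as-is:   ", "".join(groups["kept"]))
```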
> In comparison to Python, in C++, he can just look at "foo\xbar\n" and know that "\x" is a special character. (As long as it compiles without warnings under g++.)
So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT USING g++, and know whether or not \x is a special character.
[sarcasm] Gosh. That's an enormous difference from Python, where you have to print the string at the REPL to know what it does. [/sarcasm]
> He's particularly annoyed too, that if he types "foo\xbar" at the REPL, it echoes back as "foo\\xbar". He finds that to be some sort of annoying DWIM feature, and if Python is going to have DWIM features, then it should, for example, figure out what he means by "\" and not bother him with a syntax error in that case.
Now your friend is confused. This is a good thing. Any backslash you see in Python's default string output is *always* an escape:
>>> "a string with a 'proper' escape \t (tab)"
"a string with a 'proper' escape \t (tab)"
>>> "a string with an 'improper' escape \y (backslash-y)"
"a string with an 'improper' escape \\y (backslash-y)"
The REPL is actually doing him a favour. It always escapes backslashes, so there is no ambiguity. A backslash is displayed as \\, any other \c is a special character.
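Steven's point about the REPL can be checked directly: `repr()`, which the interactive prompt uses to echo values, doubles every literal backslash, so each backslash in its output begins an escape, and feeding that output back through `ast.literal_eval` rebuilds the original string. The string below spells the backslash as an explicit `\\y` so the sketch does not itself rely on the pass-through rule.

```python
import ast

s = "an 'improper' escape \\y and a tab \t"  # one real backslash, one tab

shown = repr(s)
print(shown)                          # "an 'improper' escape \\y and a tab \t"
print(ast.literal_eval(shown) == s)   # True: the repr round-trips
```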
> Of course I think that he's overreacting a bit.
:)
> My point of view is that every language has *some* warts; Python just has a bit fewer than most. It would have been nice, I should think, if this wart had been "fixed" in Python 3, as I do consider it to be a minor wart.
And if anyone had cared enough to raise it a couple of years back, it possibly might have been.
On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
>> Why should a backslash in a string literal be an error?
> Because the behavior of \ in a string is context-dependent, which means a reader can't know if \ is a literal character or escape character without knowing the context, and it means an innocuous change in context can cause a rather significant change in \.
*Any* change in context is significant with escapes.
"this \nhas two lines"
If you change the \n to a \t you get a significant difference. If you change the \n to a \y you get a significant difference. Why is the first one acceptable but the second not?
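The three edits Steven lists can be compared side by side in a short sketch. The \y case is written with an explicit double backslash, which is the value Python produces for the \y literal, so the code runs without warnings on newer versions.

```python
variants = {
    r"\n": "this \nhas two lines",
    r"\t": "this \thas two lines",
    r"\y": "this \\yhas two lines",  # what Python builds from the \y literal
}

for escape, text in variants.items():
    # splitlines() shows whether a line break was produced; len() shows
    # whether the escape became one character or stayed as two.
    print(escape, len(text.splitlines()), len(text))
# \n 2 19
# \t 1 19
# \y 1 20
```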
> IOW it's an error-prone mess.
I've never had any errors caused by this. I've never seen anyone write to this newsgroup confused over escape behaviour, or asking for help with an error caused by it, and until this thread, never seen anyone complain about it either.
Excuse my cynicism, but I believe that you are using "error-prone" to mean "I don't like this behaviour" rather than "it causes lots of errors".
On Sun, 09 Aug 2009 23:03:14 -0700, John Nagle wrote:
> if another escape character ever has to be added to the language, that may change the semantics of previously correct strings.
And that's the only argument in favour of prohibiting non-special backslash sequences I've seen yet that is even close to convincing.
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
> > Because in Python, if my friend sees the string "foo\xbar\n", he has no idea whether the "\x" is an escape sequence, or if it is just the characters "\x", unless he looks it up in the manual, or tries it out in the REPL, or what have you.
> Fair enough, but isn't that just another way of saying that if you look at a piece of code and don't know what it does, you don't know what it does unless you look it up or try it out?
Not really. It's more like saying that easy things should be easy, and hard things should be possible. But in this case, Python is making something that should be really easy a bit harder and more error-prone than it should be.
In C++, if I know that the code I'm looking at compiles, then I never need worry that I've misinterpreted what a string literal means. At least not if it doesn't have any escape characters in it that I'm not familiar with. But in Python, if I see, "\f\o\o\b\a\z", I'm not really sure what I'm seeing, as I surely don't have committed to memory some of the more obscure escape sequences. If I saw this in C++, and I knew that it was in code that compiled, then I'd at least know that there are some strange escape codes that I have to look up. Unlike with Python, it would never be the case in C++ code that the programmer who wrote the code was just too lazy to type in "\\f\\o\\o\\b\\a\\z" instead.
> > My friend is adamant that it would be better if he could just look at the string literal and know. He doesn't want to be bothered to have to store stuff like that in his head. He wants to be able to figure out programs just by looking at them, to the maximum degree that that is feasible.
> I actually sympathize strongly with that attitude. But, honestly, your friend is a programmer (or at least pretends to be one *wink*).
Actually, he's probably written more code than you, me, and ten other random decent programmers put together. As he can slap out massive amounts of code very quickly, he'd prefer not to have crap getting in his way. In the time it takes him to look something up, he might have written another page of code.
He's perfectly capable of dealing with crap, as years of writing large programs in Perl and PHP quickly proves, but his whole reason for learning Python, I take it, is so that he will be bothered with less crap and therefore write code even faster.
> You can't be a programmer without memorizing stuff: syntax, function calls, modules to import, quoting rules, blah blah blah. Take C as an example -- there's absolutely nothing about () that says "group expressions or call a function" and {} that says "group a code block".
I don't really think that this is a good analogy. It's like the difference between remembering rules of grammar and remembering English spelling. As a kid, I was the best in my school at grammar, and one of the worst at speling.
> You just have to memorize it. If you don't know what a backslash escape is going to do, why would you use it?
(1) You're looking at code that someone else wrote, or (2) you forget to type "\\" instead of "\" in your code (or get lazy sometimes), as that is okay most of the time, and you inadvertently get a subtle bug.
> This is especially important when reading (as opposed to writing) code. You read somebody else's code, and see "foo\xbar\n". Let's say you know it compiles without warning. Big deal -- you don't know what the escape codes do unless you've memorized them. What does \n resolve to? chr(13) or chr(97) or chr(0)? Who knows?
It *is* a big deal. Or at least a non-trivial deal. It means that you can tell just by looking at the code that there are funny characters in the string, and not just backslashes. You don't have to go running for the manual every time you see code with backslashes, where the upshot might be that the programmer was merely saving themselves some typing.
> > In comparison to Python, in C++, he can just look at "foo\xbar\n" and know that "\x" is a special character. (As long as it compiles without warnings under g++.)
> So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT USING g++, and know whether or not \x is a special character.
I'm not sure that your comments are paying due diligence to full life-cycle software development issues that involve multiple programmers (or even just your own program that you wrote a year ago, and you don't remember all the details of what you did) combined with maintaining and modifying existing code, etc.
> Aside:
> \x isn't a special character:
> >>> "\x"
> ValueError: invalid \x escape
I think that this all just goes to prove my friend's point! Here I've been programming in Python for more than a decade (not full time, mind you, as I also program in other languages, like C++), and even I didn't know that "\xba" was an escape sequence, and I inadvertently introduced a subtle bug into my argument because it just so happens that the first two characters of "bar" are legal hexadecimal! If I did the very same thing in a real program, it might take me a lot of time to track down the bug.
Also, it seems that Python is being inconsistent here. Python knows that the string "\x" doesn't contain a full escape sequence, so why doesn't it treat the string "\x" the same way that it treats the string "\z"? After all, if you're a Python programmer, you should know that "\x" doesn't contain a complete escape sequence, and therefore, you would not be surprised if Python were so kind as to just leave it alone, rather than raising a ValueError.
I.e., "\z" is not a legal escape sequence, so it gets left as "\\z". "\x" is not a legal escape sequence. Shouldn't it also get left as "\\x"?
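The slip is easy to reproduce: `\x` consumes exactly two hex digits, so under Python 3 "foo\xbar" is "foo", the character U+00BA, and "r". (Under Python 2, which the thread was using, the middle item is the byte 0xBA instead.) The sketch compares against a value built without any escapes.

```python
import ast

value = ast.literal_eval(r'"foo\xbar"')  # parse the literal as Python 3 does

print(ascii(value))                        # 'foo\xbar'
print(len(value))                          # 5: f, o, o, U+00BA, r
print(value == "foo" + chr(0xBA) + "r")    # True
```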
> > He's particularly annoyed too, that if he types "foo\xbar" at the REPL, it echoes back as "foo\\xbar". He finds that to be some sort of annoying DWIM feature, and if Python is going to have DWIM features, then it should, for example, figure out what he means by "\" and not bother him with a syntax error in that case.
> Now your friend is confused. This is a good thing. Any backslash you see in Python's default string output is *always* an escape:
Well, I think he's more annoyed that if Python is going to be so helpful as to put in the missing "\" for you in "foo\zbar", then it should put in the missing "\" for you in "\". He considers this to be an inconsistency.
Me, I'd never, ever, EVER want a language to special-case something at the end of a string, but I can see that from his new-to-Python perspective, Python seems to be DWIMing in one place and not the other, and he thinks that it should either do no DWIMing at all, or consistently DWIM. To not be consistent in this regard is "inelegant", says he.
And I can see his point that allowing "foo\zbar" and "foo\\zbar" to be synonymous is a form of DWIMing.
> > My point of view is that every language has *some* warts; Python just has a bit fewer than most. It would have been nice, I should think, if this wart had been "fixed" in Python 3, as I do consider it to be a minor wart.
> And if anyone had cared enough to raise it a couple of years back, it possibly might have been.
So, now if only my friend had learned Python years ago, when I told him to, he possibly might be happy with Python by now!
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
> >> Why should a backslash in a string literal be an error?
> > Because the behavior of \ in a string is context-dependent, which means a reader can't know if \ is a literal character or escape character without knowing the context, and it means an innocuous change in context can cause a rather significant change in \.
> *Any* change in context is significant with escapes.
> "this \nhas two lines"
> If you change the \n to a \t you get a significant difference. If you change the \n to a \y you get a significant difference. Why is the first one acceptable but the second not?
Because when you change \n to \t, you haven't changed the meaning of the \ character; but when you change \n to \y, you have, and you did so without even touching the backslash.
> > IOW it's an error-prone mess.
> I've never had any errors caused by this.
Thank you for your anecdotal evidence. Here's mine: This has gotten me at least twice, and a compiler complaint would have reduced my bug-hunting time from tens of minutes to ones of seconds. [Aside: it was when I was using Python on Windows for the first time]
> I've never seen anyone write to this newsgroup confused over escape behaviour, or asking for help with an error caused by it, and until this thread, never seen anyone complain about it either.
More anecdotal evidence. Here's mine: I have.
> Excuse my cynicism, but I believe that you are using "error-prone" to mean "I don't like this behaviour" rather than "it causes lots of errors".
No, I'm using error-prone to mean error-prone.
Someone (obviously not you because you have perfect knowledge of the language and 100% situation awareness at all times) might have a string like "abcd\stuv" and change it to "abcd\tuvw" without even thinking about the fact that the s comes after the backslash.
Worst of all: they might not even notice the error, because the repr of this string is:
'abcd\tuvw'
They might not notice that the backslash is single, because (unlike you) mortal fallible human beings don't always register tiny details like a backslash being single when it should be double.
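Carl's scenario, sketched in Python 3 (the helper name is hypothetical; warnings are silenced because newer versions flag `\s`): the edit shortens the string by one character and turns a literal backslash into a tab, and the only visible trace in the repr is one backslash instead of two.

```python
import ast
import warnings


def eval_literal(source: str):
    """Evaluate a literal written as source text (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


before = eval_literal(r'"abcd\stuv"')  # \s is unrecognized: backslash kept
after = eval_literal(r'"abcd\tuvw"')   # \t is recognized: becomes a tab

print(repr(before), len(before))  # 'abcd\\stuv' 9
print(repr(after), len(after))    # 'abcd\tuvw' 8
```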
Point is, this is a very bad inconsistency. It makes the behavior of \ impossible to learn by analogy; now you have to memorize a list of situations where it behaves one way or another.
But you've seen an error caused by this, in this very discussion. I.e., "foo\xbar".
"\xba" isn't an escape sequence in any other language that I've used, which is one reason I made this error... Oh, wait a minute -- it *is* an escape sequence in JavaScript. But in JavaScript, while "\xba" is a special character, "\xb" is synonymous with "xb".
The fact that every language seems to treat these things similarly but differently, is yet another reason why they should just be treated utterly consistently by all of the languages: I.e., escape sequences that don't have a special meaning should be an error!
> I've never seen anyone write to this newsgroup confused over escape behaviour,
My friend objects strongly to the claim that he is "confused" by it, so I guess you are right that no one is confused. He just thinks that it violates the beautiful sense of aesthetics that he was promised over and over again that Python has.
But aesthetics is a non-negligible issue with practical ramifications. (Not that anything can be done about this wart at this point, however.)
> or asking for help with an error caused by it, and until this thread, never seen anyone complain about it either.
Oh, this bothered me too when I first learned Python, and I thought it was stupid. It just didn't bother me enough to complain publicly.
Besides, the vast majority of Python noobs don't come here, despite appearances sometimes, and by the time most people get here, they've probably got bigger fish to fry.
On Mon, 10 Aug 2009 00:37:33 -0700, Carl Banks wrote:
> On Aug 9, 11:10 pm, Steven D'Aprano
> <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>> On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
>> >> Why should a backslash in a string literal be an error?
>> > Because the behavior of \ in a string is context-dependent, which means a reader can't know if \ is a literal character or escape character without knowing the context, and it means an innocuous change in context can cause a rather significant change in \.
>> *Any* change in context is significant with escapes.
>> "this \nhas two lines"
>> If you change the \n to a \t you get a significant difference. If you change the \n to a \y you get a significant difference. Why is the first one acceptable but the second not?
> Because when you change \n to \t, you haven't changed the meaning of the \ character;
I assume you mean the \ character in the literal, not the (non-existent) \ character in the string.
> but when you change \n to \y, you have, and you did so without even touching the backslash.
Not at all.
'\n' maps to the string chr(10). '\y' maps to the string chr(92) + chr(121).
In both cases the backslash in the literal has the same meaning: grab the next token (usually a single character, but not always), look it up in a mapping somewhere, and insert the result in the string object being built.
(I don't know if the *implementation* is precisely as described, but that's irrelevant. It's still functionally a mapping.)
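Steven's "functionally a mapping" description can be written out as a toy decoder. This is a simplified sketch, not CPython's implementation: `decode_simple` and `SIMPLE_ESCAPES` are hypothetical names, and the decoder deliberately ignores the multi-character escapes (`\x`, `\u`, `\U`, `\N{...}`, octal) and backslash-newline continuation. The final branch is where the whole disagreement lives: Python falls back to backslash plus character, while the friend's preferred rule would raise instead, shown here as `strict=True`.

```python
# Single-character escapes recognized in Python string literals.
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}


def decode_simple(body: str, strict: bool = False) -> str:
    """Decode backslash escapes in `body` (the text between the quotes).

    strict=False keeps the backslash on unknown escapes, as Python does;
    strict=True raises ValueError instead.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # Ordinary characters (and a lone trailing backslash, which a real
        # literal could not contain) are copied as-is.
        if ch != "\\" or i + 1 == len(body):
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1]  # the token after the backslash (one char here)
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
        elif strict:
            raise ValueError(f"unrecognized escape \\{nxt} at index {i}")
        else:
            out.append("\\" + nxt)  # chr(92) followed by the character
        i += 2
    return "".join(out)


print(repr(decode_simple(r"a\nb")))  # 'a\nb'
print(repr(decode_simple(r"a\yb")))  # 'a\\yb'
```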
>> > IOW it's an error-prone mess.
>> I've never had any errors caused by this.
> Thank you for your anecdotal evidence. Here's mine: This has gotten me at least twice, and a compiler complaint would have reduced my bug-hunting time from tens of minutes to ones of seconds. [Aside: it was when I was using Python on Windows for the first time]
Okay, that's twice in, how many years have you been programming?
I've mistyped "xrange" as "xrnage" two or three times. Does that make xrange() "an error-prone mess" too? Probably not. Why is my mistake my mistake, but your mistake the language's fault?
[...]
Oh, wait, no, I tell a lie -- I *have* seen people reporting "bugs" here caused by backslashes. They're invariably Windows programmers writing pathnames using backslashes, so I'll give you that one: if you don't know that Python treats backslashes as special in string literals, you will screw up your Windows pathnames.
Interestingly, the problem there is not that \y resolves to literal backslash followed by y, but that \t DOESN'T resolve to the expected backslash-t. So it seems to me that the problem for Windows coders is not that \y doesn't raise an error, but the mere existence of backslash escapes.
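For the Windows-pathname case, here is a sketch of the failure and two common fixes: a raw string literal, or forward slashes passed to `pathlib.PureWindowsPath`, which renders them with backslashes. The path itself is only an example.

```python
from pathlib import PureWindowsPath

broken = "C:\temp\new"   # \t and \n are recognized: a tab and a newline
raw = r"C:\temp\new"     # raw literal: backslashes kept as written
via_path = str(PureWindowsPath("C:/temp/new"))

print(repr(broken))      # 'C:\temp\new'
print(repr(raw))         # 'C:\\temp\\new'
print(raw == via_path)   # True
```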
> Someone (obviously not you because you have perfect knowledge of the language and 100% situation awareness at all times) might have a string like "abcd\stuv" and change it to "abcd\tuvw" without even thinking about the fact that the s comes after the backslash.
Deary me. And they might type "4+15" instead of "4*51", and now arithmetic is an "error-prone mess" too. If you know of a programming language which can prevent you making semantic errors, please let us all know what it is.
If you edit code without thinking, you will be burnt, and you get *zero* sympathy from me.
> Worst of all: they might not even notice the error, because the repr of this string is:
> 'abcd\tuvw'
> They might not notice that the backslash is single, because (unlike you) mortal fallible human beings don't always register tiny details like a backslash being single when it should be double.
"Help help, 123145 looks too similar to 1231145, and now I calculated my taxes wrong and will go to jail!!!"
> Point is, this is a very bad inconsistency. It makes the behavior of \ impossible to learn by analogy; now you have to memorize a list of situations where it behaves one way or another.
No, you don't "have" to memorize anything, you can go right ahead and escape every backslash, as I did for years. Your code will still work fine.
You already have to memorize what escape codes return special characters. The only difference is whether you learn "...and everything else raises an exception" or "...and everything else is returned unchanged".
There is at least one good reason for preferring an error, namely that it allows Python to introduce new escape codes without going through a long, slow process. But the rest of these complaints are terribly unconvincing.
On Mon, 10 Aug 2009 00:57:18 -0700, Douglas Alan wrote:
> On Aug 10, 2:10 am, Steven D'Aprano
>> I've never had any errors caused by this.
> But you've seen an error caused by this, in this very discussion. I.e., > "foo\xbar".
Your complaint is that "invalid" escapes like \y resolve to a literal backslash-y instead of raising an error. But \xbar doesn't contain an invalid escape, it contains a valid hex escape. Your ignorance that \xHH is a valid hex escape (for suitable hex digits) isn't an example of an error caused by "invalid" escapes like \y.
> "\xba" isn't an escape sequence in any other language that I've used, which is one reason I made this error... Oh, wait a minute -- it *is* an escape sequence in JavaScript. But in JavaScript, while "\xba" is a special character, "\xb" is synonymous with "xb".
> The fact that every language seems to treat these things similarly but differently, is yet another reason why they should just be treated utterly consistently by all of the languages: I.e., escape sequences that don't have a special meaning should be an error!
Perhaps all the other languages should follow Python's lead instead?
Or perhaps they should follow bash's lead, and map \C to C for every character. If there were no special escapes at all, Windows programmers wouldn't keep getting burnt when they write "C:\\Documents\today\foo" and end up with something completely unexpected.
Oh wait, no, that still wouldn't work, because they'd end up with C:\Documentstodayfoo. So copying bash doesn't work.
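To make the comparison concrete, this sketch applies a bash-style rule (every backslash-C becomes C) and Python's rules to the same source text. `bash_unescape` is a hypothetical helper written as a one-line regular expression; real shell quoting has more cases than this.

```python
import ast
import re


def bash_unescape(text: str) -> str:
    """Drop each backslash and keep the character after it (simplified)."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.DOTALL)


source = r"C:\\Documents\today\foo"  # the characters as typed in the literal

print(bash_unescape(source))                       # C:\Documentstodayfoo
print(repr(ast.literal_eval('"' + source + '"')))  # 'C:\\Documents\today\x0coo'
```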
But copying C will upset the bash coders, because they'll write "some\ file\ with\ spaces" and suddenly their code won't even compile!!!
Seems like no matter what you do, you're going to upset *somebody*.
>> I've never seen anyone write to this newsgroup confused over escape behaviour,
> My friend objects strongly to the claim that he is "confused" by it, so I guess you are right that no one is confused. He just thinks that it violates the beautiful sense of aesthetics that he was promised over and over again that Python has.
Douglas Alan <darkwate...@gmail.com> wrote:
> "\xba" isn't an escape sequence in any other language that I've used, which is one reason I made this error... Oh, wait a minute -- it *is* an escape sequence in JavaScript. But in JavaScript, while "\xba" is a special character, "\xb" is synonymous with "xb".
"\xba" is an escape sequence in c, c++, c#, python, javascript, perl and probably many others.
"\xb" is an escape sequence in c, c++, c# but not in Python, Javascript, or Perl. Python will throw ValueError if you try to use "\xb" in a string, Javascript simply ignores the backslash.
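A quick check of the Python half of this, run under Python 3, where the complaint is now a `SyntaxError` raised when the literal is compiled rather than the `ValueError` Python 2 reported: `\x` requires exactly two hex digits. `parses` is a hypothetical helper.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is an acceptable Python literal."""
    try:
        ast.literal_eval(source)
    except SyntaxError:
        return False
    return True


print(parses(r'"\xba"'))  # True: two hex digits
print(parses(r'"\xb"'))   # False: truncated \xXX escape
print(parses(r'"\x"'))    # False
```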
> The fact that every language seems to treat these things similarly but differently, is yet another reason why they should just be treated utterly consistently by all of the languages: I.e., escape sequences that don't have a special meaning should be an error!
It would be nice if these things were treated consistently, but they aren't and it seems unlikely to change.
On Mon, 10 Aug 2009 00:32:30 -0700, Douglas Alan wrote:
> In C++, if I know that the code I'm looking at compiles, then I never need worry that I've misinterpreted what a string literal means.
If you don't know what your string literals are, you don't know what your program does. You can't expect the compiler to save you from semantic errors. Adding escape codes into the string literal doesn't change this basic truth.
Semantics matters, and unlike syntax, the compiler can't check it. There's a difference between a program that does the equivalent of:
os.system("cp myfile myfile~")
and one which does this
os.system("rm myfile myfile~")
The compiler can't save you from typing 1234 instead of 11234, or 31.45 instead of 3.145, or "My darling Ho" instead of "My darling Jo", so why do you expect it to save you from typing "abc\d" instead of "abc\\d"?
Perhaps it can catch *some* errors of that type, but only at the cost of extra effort required to defeat the compiler (forcing the programmer to type \\d to prevent the compiler complaining about \d). I don't think the benefit is worth the cost. You and your friend do. Who is to say you're right?
> At least not if it doesn't have any escape characters in it that I'm not familiar with. But in Python, if I see, "\f\o\o\b\a\z", I'm not really sure what I'm seeing, as I surely don't have committed to memory some of the more obscure escape sequences. If I saw this in C++, and I knew that it was in code that compiled, then I'd at least know that there are some strange escape codes that I have to look up.
And if you saw that in Python, you'd also know that there are some strange escape codes that you have to look up. Fortunately, in Python, that's really simple:
>>> "\f\o\o\b\a\z"
'\x0c\\o\\o\x08\x07\\z'
Immediately you can see that the \o and \z sequences resolve to themselves, and the \f, \b, and \a sequences don't.
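The lookup can also be done once, for every letter, by asking the interpreter itself. The sketch below assumes Python 3 and uses a made-up helper name. It only tries a backslash followed by an ASCII letter, so digits (octal escapes) and punctuation such as `\\`, `\'` and `\"` are not covered.

```python
import ast
import string
import warnings

def classify_letter_escapes():
    """Sort backslash+letter pairs by how a str literal treats them."""
    mapped, kept, incomplete = [], [], []
    with warnings.catch_warnings():
        # Recent Pythons warn about unrecognized escapes; silence that here.
        warnings.simplefilter("ignore")
        for ch in string.ascii_letters:
            source = '"\\' + ch + '"'      # e.g. the source text "\q", quotes included
            try:
                value = ast.literal_eval(source)
            except SyntaxError:
                incomplete.append(ch)      # \x, \u, \U, \N need more characters
                continue
            if value == "\\" + ch:
                kept.append(ch)            # unrecognized: the backslash stays
            else:
                mapped.append(ch)          # recognized: replaced by another character
    return mapped, kept, incomplete

mapped, kept, incomplete = classify_letter_escapes()
print("recognized:", mapped)
print("need more characters:", incomplete)
```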
> Unlike with Python, it would never be the case in C++ code that the programmer who wrote the code was just too lazy to type in "\\f\\o\\o\\b\\a\\z" instead.
But if you see "abc\n", you can't be sure whether the lazy programmer intended "abc"+newline, or "abc"+backslash+"n". Either way, the compiler won't complain.
>> You just have to memorize it. If you don't know what a backslash escape is going to do, why would you use it?
> (1) You're looking at code that someone else wrote, or (2) you forget to type "\\" instead of "\" in your code (or get lazy sometimes), as that is okay most of the time, and you inadvertently get a subtle bug.
The same error can occur in C++, if you intend \\n but type \n by mistake. Or vice versa. The compiler won't save you from that.
>> This is especially important when reading (as opposed to writing) code. You read somebody else's code, and see "foo\xbar\n". Let's say you know it compiles without warning. Big deal -- you don't know what the escape codes do unless you've memorized them. What does \n resolve to? chr(13) or chr(97) or chr(0)? Who knows?
> It *is* a big deal. Or at least a non-trivial deal. It means that you can tell just by looking at the code that there are funny characters in the string, and not just backslashes.
I'm not entirely sure why you think that's a big deal. Strictly speaking, there are no "funny characters", not even \0, in Python. They're all just characters. Perhaps the closest is newline (which is pretty obvious).
> You don't have to go running for the manual every time you see code with backslashes, where the upshot might be that the programmer was merely saving themselves some typing.
Why do you care if there are "funny characters"?
In C++, if you see an escape you don't recognize, do you care? Do you go running for the manual? If the answer is No, then why do it in Python?
And if the answer is Yes, then how is Python worse than C++?
[...]
> Also, it seems that Python is being inconsistent here. Python knows that the string "\x" doesn't contain a full escape sequence, so why doesn't it treat the string "\x" the same way that it treats the string "\z"? [...] I.e., "\z" is not a legal escape sequence, so it gets left as "\\z".
No. \z *is* a legal escape sequence, it just happens to map to \z.
If you stop thinking of \z as an illegal escape sequence that Python refuses to raise an error for, the problem goes away. It's a legal escape sequence that maps to backslash + z.
> "\x" is not a legal escape sequence. Shouldn't it also get left as "\\x"?
No, because it actually is an illegal escape sequence.
>> > He's particularly annoyed too, that if he types "foo\zbar" at the REPL, it echoes back as "foo\\zbar". He finds that to be some sort of annoying DWIM feature, and if Python is going to have DWIM features, then it should, for example, figure out what he means by "\" and not bother him with a syntax error in that case.
>> Now your friend is confused. This is a good thing. Any backslash you >> see in Python's default string output is *always* an escape:
> Well, I think he's more annoyed that if Python is going to be so helpful as to put in the missing "\" for you in "foo\zbar", then it should put in the missing "\" for you in "\". He considers this to be an inconsistency.
(1) There is no missing \ in "foo\zbar".
(2) The problem with "\" isn't a missing backslash, but a missing end-quote.
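You can see the missing end-quote directly. The tokenizer reads `\"` as an escaped quote, so a literal written as `"\"` never ends. A small sketch, assuming Python 3. `is_complete_literal` is a hypothetical helper.

```python
def is_complete_literal(source):
    """Return True if source compiles as a single Python expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True

print(is_complete_literal('"\\"'))     # the text "\"  : the backslash escapes the closing quote
print(is_complete_literal('"\\\\"'))   # the text "\\" : one literal backslash, properly closed
```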
> Me, I'd never, ever, EVER want a language to special-case something at the end of a string, but I can see that from his new-to-Python perspective, Python seems to be DWIMing in one place and not the other, and he thinks that it should either do no DWIMing at all, or consistently DWIM. To not be consistent in this regard is "inelegant", says he.
Python isn't DWIMing here. The rules are simple and straightforward, there's no mind-reading or guessing required. There is no heuristic trying to predict what the user intends. It's a simple rule:
When parsing a string literal (apart from raw strings), if you see a backslash, then grab the next token (usually a single character, but for \x and \0 it could be multiple characters). If there is a mapping available for that token, insert that in the string being built, and if not, insert the backslash and the token.
(As I said earlier, this may not be precisely how it is implemented, but functionally, it is what Python does.)
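The rule just described can be written out as a small decoder. This is a hedged sketch, not CPython's implementation, and the name `decode_escapes` is made up. It models only the escapes that str and bytes literals share: `\newline`, `\\`, `\'`, `\"`, `\a`, `\b`, `\f`, `\n`, `\r`, `\t`, `\v`, octal `\ooo` and `\xhh`. It leaves out the str-only `\N{...}`, `\u` and `\U`. Alongside the decoded text, it returns the unrecognized escapes it passed through unchanged. A stricter compiler would turn exactly that information into a warning or an error.

```python
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_escapes(body):
    """Decode the text between the quotes of a (non-raw) literal.

    Returns (decoded_text, unrecognized), where unrecognized lists the
    backslash pairs that were copied through unchanged.
    """
    out, unrecognized = [], []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == n:
            # In real source this backslash would escape the closing quote.
            raise SyntaxError("unterminated literal: trailing backslash")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == "\n":
            i += 2                          # backslash-newline: both are dropped
        elif nxt in OCTAL_DIGITS:
            j = i + 1
            while j < n and j < i + 4 and body[j] in OCTAL_DIGITS:
                j += 1                      # take up to three octal digits
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        elif nxt == "x":
            digits = body[i + 2:i + 4]
            if len(digits) < 2 or any(d not in HEX_DIGITS for d in digits):
                raise SyntaxError("truncated \\xXX escape")
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            # Unrecognized: keep the backslash *and* the character.
            out.append("\\" + nxt)
            unrecognized.append("\\" + nxt)
            i += 2
    return "".join(out), unrecognized

print(decode_escapes(r"\f\o\o\b\a\z"))
print(decode_escapes(r"foo\zbar"))
```

With this decoder, "foo\zbar" and "foo\\zbar" decode to the same text. The only difference is whether an entry lands in the `unrecognized` list.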
> And I can see his point that allowing "foo\zbar" and "foo\\zbar" to be synonymous is a form of DWIMing.
Is it "a form of DWIMing" to consider 1.234e1 and 12.34 synonymous?
What about 68 and 0x44? Is that DWIMing?
I'm sure both you and your friend are excellent programmers, but you're tossing around DWIM as a meaningless term of opprobrium without any apparent understanding of what DWIM actually is.
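You can check Steven's comparison directly. Each pair below consists of two spellings that produce equal values. Note that `0x44` is 68 in decimal. The pairs and the helper `same_value` are illustrative choices. Warnings are silenced only because recent Python versions flag the `\z` spelling.

```python
import ast
import warnings

# Pairs of source spellings that evaluate to equal values.
PAIRS = [
    ("1.234e1", "12.34"),
    ("68", "0x44"),
    (r'"foo\zbar"', r'"foo\\zbar"'),
]

def same_value(a, b):
    """Evaluate two literal spellings and compare the resulting objects."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")   # "\z" draws a warning on recent Pythons
        return ast.literal_eval(a) == ast.literal_eval(b)

for a, b in PAIRS:
    print(f"{a:>14}  {b:>14}  {same_value(a, b)}")
```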
Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
> Or perhaps they should follow bash's lead, and map \C to C for every character. If there were no special escapes at all, Windows programmers wouldn't keep getting burnt when they write "C:\\Documents\today\foo" and end up with something completely unexpected.
> Oh wait, no, that still wouldn't work, because they'd end up with C:\Documentstodayfoo. So copying bash doesn't work.
There is of course no problem at all so long as you stick to writing your paths as MS intended them to be written: 8.3 and UPPERCASE
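The path in Steven's example shows the problem concretely. In this small sketch, the first string picks up a tab and a form feed. A raw string or doubled backslashes keep every backslash. Building the path with `pathlib.PureWindowsPath` is an alternative not raised in the thread.

```python
from pathlib import PureWindowsPath

# The literal from the quoted post: \t and \f are recognized escapes.
accidental = "C:\\Documents\today\foo"
print(repr(accidental))

# Two spellings that keep every backslash literal.
raw = r"C:\Documents\today\foo"
doubled = "C:\\Documents\\today\\foo"
print(raw == doubled)

# Or let pathlib join the parts with the Windows separator.
print(PureWindowsPath("C:/", "Documents", "today", "foo"))
```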
Steven D'Aprano wrote:
> On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
[snip]
>> My point of view is that every language has *some* warts; Python just has a bit fewer than most. It would have been nice, I should think, if this wart had been "fixed" in Python 3, as I do consider it to be a minor wart.
> And if anyone had cared enough to raise it a couple of years back, it possibly might have been.
My preference would've been that a backslash followed by A-Z, a-z, or 0-9 is special, but a backslash followed by any other character is just the character, except for backslash followed by a newline, which suppresses the newline.
I would also have preferred a backslash in a raw string to always be a literal.
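The raw-string complaint concerns quotes. Even in a raw string, a backslash still keeps the following quote from closing the literal. The sketch below, assuming Python 3, shows both consequences: `r"\""` is a two-character string, and a raw string cannot end in a single backslash.

```python
# In a raw string the backslash is never removed, but it still stops a
# quote from ending the literal: r"\"" is two characters, not one.
quote_pair = r"\""
print(len(quote_pair), list(quote_pair))

# For the same reason a raw string cannot end with a single backslash.
results = {}
for source in ('r"C:\\temp"', 'r"C:\\temp\\"'):   # the texts r"C:\temp" and r"C:\temp\"
    try:
        compile(source, "<literal>", "eval")
        results[source] = "ok"
    except SyntaxError:
        results[source] = "SyntaxError"
print(results)
```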
> There is at least one good reason for preferring an error, namely that it allows Python to introduce new escape codes without going through a long, slow process. But the rest of these complaints are terribly unconvincing.
What about:
o Beautiful is better than ugly
o Explicit is better than implicit
o Simple is better than complex
o Readability counts
o Special cases aren't special enough to break the rules
o Errors should never pass silently
?
And most importantly:
o In the face of ambiguity, refuse the temptation to guess.
o There should be one -- and preferably only one -- obvious way to do it.
?
So, what's the one obvious right way to express "foo\zbar"? Is it
"foo\zbar"
or
"foo\\zbar"
And if it's the latter, what possible benefit is there in allowing the former? And if it's the former, why does Python echo the latter?
Douglas Alan wrote:
> So, what's the one obvious right way to express "foo\zbar"? Is it
> "foo\zbar"
> or
> "foo\\zbar"
> And if it's the latter, what possible benefit is there in allowing the former? And if it's the former, why does Python echo the latter?
Actually, if we were designing from fresh (with no C behind us), I might advocate for "\s" to be the escape sequence for a backslash. I don't particularly like that it is hard to see if the following string contains a tab: "abc\\\\\\\\\table". The string rules reflect C's rules, and I see little excuse for trying to change them now.
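Scott's example depends on parity. Reading left to right, each pair of backslashes collapses to one, and only an odd leftover combines with the next letter. The hypothetical helper below builds the literal with a chosen number of backslashes and evaluates it, so nobody has to count by eye.

```python
import ast

def literal_with_backslashes(count):
    """Evaluate the literal "abc<count backslashes>table" and describe it."""
    source = '"abc' + "\\" * count + 'table"'
    value = ast.literal_eval(source)
    return value.count("\\"), "\t" in value

for count in (8, 9):
    backslashes, has_tab = literal_with_backslashes(count)
    print(f"{count} backslashes in source -> {backslashes} in value, tab: {has_tab}")
```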
On Aug 10, 10:58 am, Scott David Daniels <Scott.Dani...@Acm.Org> wrote:
> The string rules reflect C's rules, and I see little excuse for trying to change them now.
No they don't. Or at least not C++'s rules. C++ behaves exactly as I should like.
(Or at least g++ does. Or rather *almost* as I would like, as by default it generates a warning for "foo\zbar", while I think that an error would be somewhat preferable.)
But you're right, it's too late to change this now.
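Later Python 3 releases did move partway toward the g++ behaviour. Since 3.6, an unrecognized escape in a str or bytes literal produces a `DeprecationWarning`, and since 3.12 it produces a `SyntaxWarning`. Promoting warnings to errors makes the compiler reject such a literal with a `SyntaxError`. You can do this with the `-W error` command-line switch, or with `warnings.simplefilter("error")` as in this sketch. The sketch assumes Python 3.6 or later, and `strict_check` is a hypothetical helper.

```python
import warnings

def strict_check(source):
    """Compile source with every warning promoted to an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            compile(source, "<literal>", "eval")
        except SyntaxError as exc:
            return "rejected: " + exc.msg
    return "accepted"

for source in (r'"foo\zbar"', r'"foo\\zbar"', r'r"foo\zbar"'):
    print(source, "->", strict_check(source))
```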
> Steven D'Aprano wrote:
> > On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:
> [snip]
> >> My point of view is that every language has *some* warts; Python just has a bit fewer than most. It would have been nice, I should think, if this wart had been "fixed" in Python 3, as I do consider it to be a minor wart.
> > And if anyone had cared enough to raise it a couple of years back, it possibly might have been.
> My preference would've been that a backslash followed by A-Z, a-z, or 0-9 is special, but a backslash followed by any other character is just the character, except for backslash followed by a newline, which suppresses the newline.
That would be reasonable; it'd match the behavior of regexps.
<ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Mon, 10 Aug 2009 00:37:33 -0700, Carl Banks wrote:
> > On Aug 9, 11:10 pm, Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
> >> On Sun, 09 Aug 2009 18:34:14 -0700, Carl Banks wrote:
> >> >> Why should a backslash in a string literal be an error?
> >> > Because the behavior of \ in a string is context-dependent, which means a reader can't know if \ is a literal character or escape character without knowing the context, and it means an innocuous change in context can cause a rather significant change in \.
> >> *Any* change in context is significant with escapes.
> >> "this \nhas two lines"
> >> If you change the \n to a \t you get a significant difference. If you change the \n to a \y you get a significant difference. Why is the first one acceptable but the second not?
> > Because when you change \n to \t, you haven't changed the meaning of the \ character;
> I assume you mean the \ character in the literal, not the (non-existent) \ character in the string.
> > but when you change \n to \y, you have, and you did so without even touching the backslash.
> Not at all.
> '\n' maps to the string chr(10).
> '\y' maps to the string chr(92) + chr(121).
> In both cases the backslash in the literal has the same meaning: grab the next token (usually a single character, but not always), look it up in a mapping somewhere, and insert the result in the string object being built.
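You can check the mapping described in the quoted reply with `ord()`. In this sketch, `code_points` is a hypothetical helper. The warning filter only silences the notice that recent Pythons print for `\y`.

```python
import ast
import warnings

def code_points(literal_source):
    """Evaluate a string literal and return the code points it produces."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")   # recent Pythons warn about "\y"
        return [ord(ch) for ch in ast.literal_eval(literal_source)]

print(code_points(r'"\n"'))   # [10]
print(code_points(r'"\y"'))   # [92, 121]
```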
That is a ridiculous rationalization. Nobody sees "\y" in a string and thinks "it's an escape sequence that returns the bytes '\y'".
[snip rest, because an argument in favor of inconsistent, context-dependent behavior doesn't need any further refutation than to point out that it is an argument in favor of inconsistent, context-dependent behavior]
From: Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
> On Mon, 10 Aug 2009 00:32:30 -0700, Douglas Alan wrote:
> > In C++, if I know that the code I'm looking at compiles, then I never need worry that I've misinterpreted what a string literal means.
> If you don't know what your string literals are, you don't know what your program does. You can't expect the compiler to save you from semantic errors. Adding escape codes into the string literal doesn't change this basic truth.
I grow weary of these semantic debates. The bottom line is that C++'s strategy here catches bugs early on that Python's approach doesn't. It does so at no additional cost.
From a purely practical point of view, why would any language not want to adopt a zero-cost approach to catching bugs, even if they are relatively rare, as early as possible?
(Other than the reason that adopting it *now* is sadly too late.)
Furthermore, Python's strategy here is SPECIFICALLY DESIGNED, according to the reference manual, to catch bugs. I.e., from the original posting on this issue:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)
If this "feature" is designed to catch bugs, why be half-assed about it? Especially since there seems to be little valid use case for allowing programmers to be lazy in their typing here.
> The compiler can't save you from typing 1234 instead of 11234, or 31.45 instead of 3.145, or "My darling Ho" instead of "My darling Jo", so why do you expect it to save you from typing "abc\d" instead of "abc\\d"?
Because in the former cases it can't catch the bug, and in the latter case, it can.
> Perhaps it can catch *some* errors of that type, but only at the cost of extra effort required to defeat the compiler (forcing the programmer to type \\d to prevent the compiler complaining about \d). I don't think the benefit is worth the cost. You and your friend do. Who is to say you're right?
Well, Bjarne Stroustrup, for one.
All of these are value judgments, of course, but I truly doubt that anyone would have been bothered if Python from day one had behaved the way that C++ does. Additionally, I expect that if Python had always behaved the way that C++ does, and then today someone came along and proposed the behavior that Python currently implements, so that the programmer could sometimes get away with typing a bit less, such a person would be chided for not understanding the Zen of Python.
> > You don't have to go running for the manual every time you see code with backslashes, where the upshot might be that the programmer was merely saving themselves some typing.
> Why do you care if there are "funny characters"?
Because, of course, "funny characters" often have interesting consequences when output. Furthermore, their consequences aren't always immediately obvious from looking at the source code, unless you are intimately familiar with the function of the special characters in question.
For instance, sometimes in the wrong combination, they wedge your xterm. Etc.
I'm surprised that this needs to be spelled out.
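One way to address this concern without hunting through the manual is to check a string for control characters before sending it to a terminal. The sketch below uses the standard `unicodedata` module. Treating Unicode category `Cc` as "funny" is an assumption, and it will not catch every character that can confuse a terminal.

```python
import unicodedata

def control_characters(text):
    """Return (index, escaped form) for each control character in text."""
    return [
        (i, ch.encode("unicode_escape").decode("ascii"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc"
    ]

sample = "\f\\o\\o\b\a\\z"   # the value Python gives for "\f\o\o\b\a\z"
print(control_characters(sample))
```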
> In C++, if you see an escape you don't recognize, do you care?
Yes, of course I do. If I need to know what the program does.
> Do you go running for the manual? If the answer is No, then why do it in Python?
The answer is that I do in both cases.
> No. \z *is* a legal escape sequence, it just happens to map to \z.
> If you stop thinking of \z as an illegal escape sequence that Python refuses to raise an error for, the problem goes away. It's a legal escape sequence that maps to backslash + z.
(1) I already used that argument on my friend, and he wasn't buying it. (Personally, I find the argument technically valid, but commonsensically invalid. It's a language-lawyer kind of argument, rather than one that appeals to any notion of real aesthetics.)
(2) That argument disagrees with the Python reference manual, which explicitly states that "unrecognized escape sequences are left in the string unchanged", and that the purpose for doing so is because it "is useful when debugging".
> > "\x" is not a legal escape sequence. Shouldn't it also get left as "\\x"?
> No, because it actually is an illegal escape sequence.
What makes it "illegal"? As far as I can tell, it's just another "unrecognized escape sequence". JavaScript treats it that way. Are you going to be the one to tell all the JavaScript programmers that their language can't tell a legal escape sequence from an illegal one?
> > Well, I think he's more annoyed that if Python is going to be so helpful as to put in the missing "\" for you in "foo\zbar", then it should put in the missing "\" for you in "\". He considers this to be an inconsistency.
> (1) There is no missing \ in "foo\zbar".
> (2) The problem with "\" isn't a missing backslash, but a missing end-quote.
Says who? All of this really depends on your point of view. The whole morass goes away completely if one adopts C++'s approach here.
> Python isn't DWIMing here. The rules are simple and straightforward, there's no mind-reading or guessing required.
It may not be a complex form of DWIMing, but it's still DWIMing a bit. Python is figuring that if I typed "\z", then either I must have really meant to type "\\z", or that I want to see the backslash when I'm debugging because I made a mistake, or that I'm just too lazy to type "\\z".
> Is it "a form of DWIMing" to consider 1.234e1 and 12.34 synonymous?
That's a very different issue, as (1) there are very significant use cases for both kinds of numerical representations, and (2) there's often only one obvious way that the number should be entered, depending on the coding situation.
> What about 68 and 0x44? Is that DWIMing?
See previous comment.
> I'm sure both you and your friend are excellent programmers, but you're tossing around DWIM as a meaningless term of opprobrium without any apparent understanding of what DWIM actually is.
I don't know if my friend even knows the term DWIM, other than me paraphrasing him, but I certainly understand all about the term. It comes from InterLisp. When DWIM was enabled, your program would run until it hit an error, and for certain kinds of errors, it would wait a few seconds for the user to notice the error message, and if the user didn't tell the program to stop, it would try to figure out what the user most likely meant, and then continue running using the computer-generated "fix".
I.e., more or less like continuing on in the face of what the Python Reference manual refers to as an "unrecognized escape sequence".