SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 10:09 AM | It's good to get releases out in a timely fashion. However, if common use cases of major new features are hopelessly broken, and easy fixes are available, I'd argue that the delay is better for PR than the alternative.
In the case of string interpolation, there's a really unhealthy conjunction of bugs in SI-6631 and SI-6559 (closed as duplicate, but although issue was fixed in 2.10.x, it wasn't ported to 2.10.0, and I can't find a ticket for the fix).
Let's suppose we need to do something on Windows. Painful, I know, but some people do. Let's suppose we want to get index.html in a set of directories.
scala> s"c:\foo\$arg\index.html"
java.lang.StringIndexOutOfBoundsException: String index out of range: 7
scala> s"""c:\foo\$arg\index.html"""
java.lang.StringIndexOutOfBoundsException: String index out of range: 7
scala> raw"c:\foo\$arg\index.html"
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\index.html"
scala> raw"""c:\foo\$arg\index.html"""
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\index.html"
Crikey, that went well, didn't it? There's no way to make this work without more pain than the old way. Let's try the old way:
scala> "c:\foo\"+arg+"\index.html"
<console>:1: error: identifier expected but string literal found.
"c:\foo\"+arg+"\index.html"
Yeah, fair enough, needs to be raw, and the error message helps somewhat.
scala> """c:\foo\"""+arg+"""\index.html"""
res8: String = c:\foo\bar\index.html
scala> """c:\foo\%s\index.html""".format(arg)
res9: String = c:\foo\bar\index.html
I advise against -FINALing in this state, unless you're trying to send a "don't use new features in a .0 release" message. One fix is already in and just needs to be backported; I will write the other one if no-one else has time to do it. (I doubt that adding a bounds check is the majority of the difficulty, but hey, I'm happy to.)
--Rex |
Re: SIP-11 too broken to ship as-is, IMO | Paul Phillips | 11/10/12 10:18 AM |
Maybe it's my skewed perspective, but to me the fact that it doesn't work is of small importance compared to the fact that we would be shipping what seem to be pretty disastrous semantics, which we would then have to be compatible with afterward.
"""I have a long string with a bunch of stuff in it, maybe even a \windows\path\to\foo"""
Let's say that string goes on for pages. Oh, I'd like to use string interpolation now.
s"""I have a long string with $stuff in it, maybe even a \windows\path\to\foo"""
Now my string has form feeds and tabs in it. I didn't touch that part! And I have to habitually use triple-quoted strings for everything now, because otherwise I am constantly having to convert to triple quotes anyway, because there's no way to include a single quote in a single quoted interpolated string. Unfortunately at least half my strings have single quotes in them.
|
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 10:42 AM | On Sat, Nov 10, 2012 at 1:18 PM, Paul Phillips <pa...@improving.org> wrote: Well, that's sort of an argument for leaving it broken, because then if we fix the semantics we don't have to worry as much that people actually used it!
«Use raw"""stuff""" not s"""stuff""" for this» is an okay answer IMO. If raw works. Which it doesn't. I still don't like that single-vs-triple quote semantics are nearly meaningless with string interpolation, but at least your concern has a simple fix. --Rex |
SIP-11 too broken to ship as-is, IMO | Paul Phillips | 11/10/12 11:10 AM | It seems like we could achieve a lot more uniformity if we informed the interpolator whether the literal was single or triple quoted.
|
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 11:17 AM | That was exactly my thought also. Then the question is whether it should be an argument to the interpolator (boolean), or whether it should call different methods depending on which was used. From a safety/consistency standpoint I'd favor the latter (it's the strategy used in Dynamic). But it might be too big of a headache; maybe a boolean (which could just be ignored) is the more pragmatic way to go. --Rex |
Re: SIP-11 too broken to ship as-is, IMO | Erik Osheim | 11/10/12 11:20 AM | On Sat, Nov 10, 2012 at 12:10:26PM -0700, Paul Phillips wrote:
I'm probably just confused, but it seems like there's a subtext that we can't just have s"""...""" ignore escapes while doing interpolation. To me, the questions of whether \ and $ have special effects seem orthogonal. Maybe I'm wrong?
"$$foo\tar$duh" (\t is tab)
"""$$foo\tar$duh""" (no special behavior)
s"$$foo\tar$duh" ($$ is literal $, \t is tab, $duh interpolated)
s"""$$foo\tar$duh""" ($$ is literal $, $duh interpolated)
Is there something that makes this behavior inconsistent? -- Erik |
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 11:36 AM | Right now, s, not the lexer/parser, does the parsing of escape characters. So x"\nHi" actually gets sent as a raw string """\nHi""" to the x method (and that includes the s method). At that point, s has no idea whether it was s"""\nHi""" or s"\nHi". This is what Paul is suggesting that we change (and I agree). --Rex |
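The point about the interpolator receiving unprocessed text can be checked with a small sketch. It assumes a current Scala release (2.13 or 3); `show` is a made-up interpolator name and `PartsDemo` a made-up object, neither is part of the standard library. The interpolator just returns the parts it was handed, so the single-quoted and triple-quoted forms can be compared directly.

```scala
object PartsDemo {
  // `show` is a hypothetical interpolator used only for this illustration.
  implicit class ShowInterpolator(private val sc: StringContext) extends AnyVal {
    // Return the literal segments exactly as the compiler passes them in.
    def show(args: Any*): List[String] = sc.parts.toList
  }

  // Both forms reach the interpolator as backslash + 'n' + "Hi";
  // no escape has been processed yet.
  val single: List[String] = show"\nHi"
  val triple: List[String] = show"""\nHi"""
}
```

Both values hold one four-character part, and nothing in them records which quoting style was used. That is the indistinguishability being discussed.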
Re: SIP-11 too broken to ship as-is, IMO | Daniel Sobral | 11/10/12 11:53 AM | IMHO, as I wrote in the ticket right now, SI-6631 is not a bug, except in that a proper exception should be thrown instead of StringIndexOutOfBound, but that string should cause an exception.
Both f and s interpolators do process the strings for escape characters, no matter if the string is multiline or not. That is intentional, and I did make sure of it during the comment phase of the SIP. For that matter, I think it's the best behavior as well, even though it will be the cause of bugs.
Though it was not originally contemplated, the raw interpolator balances everything. When converting code, use s if it was a simple string, and raw if it was multiline. That should go in the docs somewhere, in capital letters, but...
I do wish SI-6559 would get into 2.10.0, but there's a simple alternative at this point: not using multiline string interpolators if you have backslashes. Only, I agree with Rex: it must be noted in the release notes (or, preferably, in the doc itself) that raw is broken.
--
Daniel C. Sobral I travel to the future all the time. |
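A minimal sketch of the "s for simple strings, raw for multiline" rule described above. It assumes a current Scala release (2.13 or 3) in which the raw bugs reported in this thread no longer apply; `EscapeDemo` and its value names are invented for the example.

```scala
object EscapeDemo {
  val arg = "bar"

  // s processes backslash escapes whether the literal is single- or triple-quoted,
  // so both of these contain a real tab character.
  val sSingle: String = s"a\tb"
  val sTriple: String = s"""a\tb"""

  // raw interpolates $arg but leaves every backslash untouched.
  val rawTriple: String = raw"""c:\foo\$arg\index.html"""

  // The pre-interpolation spelling of the same path, for comparison.
  val oldWay: String = """c:\foo\""" + arg + """\index.html"""
}
```

Here `rawTriple` and `oldWay` are the same string, while `sTriple` differs from the plain literal `"""a\tb"""` because the `\t` has become a tab.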
Re: SIP-11 too broken to ship as-is, IMO | Daniel Sobral | 11/10/12 11:58 AM | On Sat, Nov 10, 2012 at 5:36 PM, Rex Kerr <ich...@gmail.com> wrote:
Right now, s, not the lexer/parser, does the parsing of escape characters. So x"\nHi" actually gets sent as a raw string """\nHi""" to the x method (including the s method). At that point, s has no idea whether it was s"""\nHi""" or s"\nHi". This is what Paul is suggesting that we change (and I agree).
Letting s and f know whether it was single line or multiline would work. Escaping before passing would *NOT* work, as it would break use for things that are not strings (such as regular expressions).
I think the present solution works best, but I did raise this point during the SIP -- meaning it was there for anyone to see, and with a comment flagging it down.
|
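The regular-expression case can be sketched as follows, assuming a current Scala release (2.13 or 3). The `re` interpolator and the `RegexDemo` object are hypothetical names chosen for this example; they are not in the standard library. The sketch relies on the parts arriving unescaped: if `\d` had been run through string-escape processing first, it would have been rejected as an invalid escape before the regex engine ever saw it.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object RegexDemo {
  // `re` is a hypothetical interpolator, shown only to illustrate the point.
  implicit class RegexInterpolator(private val sc: StringContext) extends AnyVal {
    // Keep the literal parts raw, quote the spliced arguments so they match
    // literally, and compile the result as a pattern.
    def re(args: Any*): Regex = {
      val quoted = args.map(a => Pattern.quote(a.toString))
      sc.raw(quoted: _*).r
    }
  }

  val digits: Regex = re"\d+"

  val sep = "."
  // The spliced "." is quoted, so it matches a literal dot only.
  val version: Regex = re"\d+$sep\d+"
}
```

For instance, `RegexDemo.version.findFirstIn("scala 2.10 rc")` finds `2.10`, and the same pattern finds nothing in `"2x10"`.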
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 12:03 PM | On Sat, Nov 10, 2012 at 2:58 PM, Daniel Sobral <dcso...@gmail.com> wrote: I don't think you should escape before passing, but I do think you should be able to know whether it was triple-quoted or not so you can mimic the standard " vs """ behavior if you want to. Whether you pass a boolean or a default parser or call different methods or something else is not really the main point; the indistinguishability is. But that can be fixed later in a backwards-compatible way. All I think should be done now is to fix enough bugs so that there is some hope of using string interpolation on Windows paths, as it looks sloppy to have a main new feature trip on such a common use-case (esp. since it admits a trivial fix). --Rex |
Re: SIP-11 too broken to ship as-is, IMO | Jason Zaugg | 11/10/12 12:08 PM | On Sat, Nov 10, 2012 at 9:03 PM, Rex Kerr <ich...@gmail.com> wrote:
Of course, standard unicode escaping *always* happens earlier, which presents another pitfall:
scala> """c:\u1"""
<console>:1: error: error in unicode escape
"""c:\u1"""
^
scala> """c:\v1234\u0078"""
res0: java.lang.String = c:\v1234x
-jason |
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 12:14 PM | Agreed that this is problematic, but those only come up on rare use cases. The most common use case for a Windows path is
raw"c:\path\to\$myfile" which works, but pretty much anything else you're liable to want is broken. --Rex |
Re: SIP-11 too broken to ship as-is, IMO | Paul Phillips | 11/10/12 12:23 PM |
Almost always. (The inability to evade unicode escapes was my biggest reservation with the SIP, but I gave it up after being unable to dream up an acceptable remedy.)
% rcscala -Xno-uescape Welcome to Scala version 2.10.0-RC2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_06).
Type in expressions to have them evaluated. Type :help for more information.
scala> """c:\u1"""
res0: String = c:\u1
scala>
|
Re: SIP-11 too broken to ship as-is, IMO | Paul Phillips | 11/10/12 12:27 PM | On Sat, Nov 10, 2012 at 1:03 PM, Rex Kerr <ich...@gmail.com> wrote:
I don't think you should escape before passing, but I do think you should be able to know whether it was triple-quoted or not so you can mimic the standard " vs """ behavior if you want to. Whether you pass a boolean or a default parser or call different methods or something else is not really the main point; the indistinguishability is. But that can be fixed later in a backwards-compatible way.
Can it? If we ship with the current semantics, we will fix the meaning of a \ in a triple-quoted s-interpolated string to be an escape character, and people will write code which relies on it. I don't see how we will be able to undo that in a backward compatible way.
|
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 12:31 PM | We'll fix the meaning of the s character, yes. We can implement proper semantics on some other letter. I'd pick q to mean "follow the Scala convention for single vs. triple quotes". Not ideal, but I can't see how we have time to agree on a fix before 2.10.0. --Rex |
Re: SIP-11 too broken to ship as-is, IMO | Daniel Sobral | 11/10/12 12:58 PM | String interpolation on Windows paths will be eternally hampered by not being able to represent c:\users. I'd take turning \u off over a single/multiline string flag *and* a working raw (which I can reimplement) any time.
But I'm not disagreeing with anything you said. I just wanted to get that off my chest. :-)
|
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/10/12 5:22 PM | If only one could make that trade.
scala> raw"""c:\users"""
raw"""c:\users"""
Maybe eventually we'll fix everything. In the meantime, """c:\Users"""
--Rex
P.S. This suggests the solution to the triple quoting is
sealed trait StringEscape extends (String => String) {}
case object StandardStringEscape extends StringEscape { def apply(s: String) = ??? }
case object RawStringEscape extends StringEscape { def apply(s: String) = ??? }
def q(esc: StringEscape, args: Any*) = parts.map(esc).zip(args.map(_.toString))...
def literal(esc: StringEscape) = parts.mkString
where the compiler by default does not interpret unicode within """ in string interpolations, and if it finds a method taking no args, it just dumps the entire string there. |
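One way to read the P.S. is sketched below as ordinary library code. This is an assumption-laden model, not the proposal itself: the compiler does not pass an escape strategy to interpolators, so here `q` takes the parts and the strategy explicitly. `StringEscapeSketch`, the extra `parts` parameter, and the sample values are invented for the example, and `StringContext.processEscapes` stands in for the elided `???` of the standard case (it exists in recent standard libraries).

```scala
import scala.util.Try

object StringEscapeSketch {
  sealed trait StringEscape extends (String => String)

  case object StandardStringEscape extends StringEscape {
    // Interpret \n, \t, \\ and so on, as a single-quoted literal would.
    def apply(s: String): String = StringContext.processEscapes(s)
  }

  case object RawStringEscape extends StringEscape {
    // Leave the text exactly as written, as a triple-quoted literal would.
    def apply(s: String): String = s
  }

  // Hypothetical stand-in for the proposed q(esc, args*): apply the chosen
  // escape strategy to each literal part, then interleave the arguments.
  def q(parts: Seq[String], esc: StringEscape, args: Any*): String = {
    require(parts.length == args.length + 1, "need exactly one more part than args")
    val sb = new StringBuilder(esc(parts.head))
    parts.tail.zip(args).foreach { case (part, arg) =>
      sb.append(arg.toString).append(esc(part))
    }
    sb.toString
  }

  val pathParts: Seq[String] = Seq("""c:\to\""", """\new.txt""")

  // Raw strategy: backslashes survive, giving a usable Windows path.
  val asRaw: String = q(pathParts, RawStringEscape, "foo")

  // Standard strategy on ordinary text: \t becomes a tab.
  val asStandard: String = q(Seq("""a\tb """, "!"), StandardStringEscape, 1)

  // Standard strategy on the path: the trailing backslash is an invalid escape.
  val standardOnPath: Try[String] = Try(q(pathParts, StandardStringEscape, "foo"))
}
```

With the strategy chosen by quoting style, a triple-quoted literal would take the `RawStringEscape` route and a single-quoted one the `StandardStringEscape` route, which is the " versus """ convention the thread wants to preserve.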
Re: SIP-11 too broken to ship as-is, IMO | Paul Phillips | 11/10/12 10:12 PM | On Sat, Nov 10, 2012 at 6:22 PM, Rex Kerr <ich...@gmail.com> wrote: Looks OK from here:
scala> raw"""c:\to\fro\user\bippy"""
res0: String = c:\to\fro\user\bippy |
Re: SIP-11 too broken to ship as-is, IMO | Martin | 11/11/12 7:43 AM | On Sat, Nov 10, 2012 at 7:18 PM, Paul Phillips <pa...@improving.org> wrote: We can argue about this, but not now. The time to argue this point was when the SIP was discussed and before it was accepted. I don't have an opinion on the fix it now vs ship it now debate. Can see the validity of both arguments.
Cheers - Martin
Martin Odersky Prof., EPFL and Chairman, Typesafe PSED, 1015 Lausanne, Switzerland Tel. EPFL: +41 21 693 6863 Tel. Typesafe: +41 21 691 4967 |
Re: SIP-11 too broken to ship as-is, IMO | Paul Phillips | 11/11/12 8:00 AM | People know things when they know them. Would you prefer silence? |
Re: SIP-11 too broken to ship as-is, IMO | Martin | 11/11/12 12:32 PM | Of course not. But if the debate is "do we want to have SIP-11 in Scala 2.10" (as the title of the thread implies), it's the wrong debate. If we engaged in that debate now, and did the same for all the other issues that are just as contentious, we would never ship 2.10. |
Re: SIP-11 too broken to ship as-is, IMO | Rex Kerr | 11/11/12 12:43 PM | Since I chose the title, I should point out that I never meant to imply that SIP-11 shouldn't be in Scala 2.10. I think it's one of the best low-entry-barrier additions. Instead, I meant to imply that it was worth fixing because its appeal will likely lead people to rapidly collide with bugs and be grumpy instead of elated (in a platform-dependent manner). Sorry if that wasn't clear. --Rex |
Re: SIP-11 too broken to ship as-is, IMO | Martin | 11/11/12 1:12 PM | It was clear, and I think it's a fair discussion point whether we should fix the bugs you outlined before or after 2.10.0. |
Re: SIP-11 too broken to ship as-is, IMO | Rich Oliver | 11/12/12 2:13 AM | Surely the case is clear: the bugs should be fixed, even if it means taking 2.10 out of RC and doing another milestone. People who are already using 2.10 can carry on using it, and those who prefer stability can continue using 2.9.2. No promises have been made about a date for 2.10.0 final.
|