SIP-11 too broken to ship as-is, IMO


Rex Kerr

Nov 10, 2012, 1:09:33 PM
to scala...@googlegroups.com
It's good to get releases out in a timely fashion.  However, if common use cases of major new features are hopelessly broken, and easy fixes are available, I'd argue that the delay is better for PR than the alternative.

In the case of string interpolation, there's a really unhealthy conjunction of bugs in SI-6631 and SI-6559 (closed as a duplicate; although the issue was fixed in 2.10.x, the fix wasn't ported to 2.10.0, and I can't find a ticket for it).

Let's suppose we need to do something on Windows.  Painful, I know, but some people do.  Let's suppose we want to get index.html in a set of directories.

scala> s"c:\foo\$arg\index.html"
java.lang.StringIndexOutOfBoundsException: String index out of range: 7

scala> s"""c:\foo\$arg\index.html"""
java.lang.StringIndexOutOfBoundsException: String index out of range: 7

scala> raw"c:\foo\$arg\index.html"
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\index.html"

scala> raw"""c:\foo\$arg\index.html"""
scala.StringContext$InvalidEscapeException: invalid escape character at index 0 in "\index.html"

Crikey, that went well, didn't it?  There's no way to make this work without more pain than the old way.

Let's try the old way:

scala> "c:\foo\"+arg+"\index.html"
<console>:1: error: identifier expected but string literal found.
       "c:\foo\"+arg+"\index.html"

Yeah, fair enough, needs to be raw, and the error message helps somewhat.

scala> """c:\foo\"""+arg+"""\index.html"""
res8: String = c:\foo\bar\index.html

scala> """c:\foo\%s\index.html""".format(arg)
res9: String = c:\foo\bar\index.html

I advise against -FINALing in this state, unless you're trying to send a "don't use new features in a .0 release" message.

One fix is already in and just needs to be backported; I will write the other one if no-one else has time to do it.  (I doubt that adding a bounds check is the majority of the difficulty, but hey, I'm happy to.)

  --Rex

Paul Phillips

Nov 10, 2012, 1:18:13 PM
to scala...@googlegroups.com


On Sat, Nov 10, 2012 at 11:09 AM, Rex Kerr <ich...@gmail.com> wrote:
> It's good to get releases out in a timely fashion.  However, if common use cases of major new features are hopelessly broken, and easy fixes are available, I'd argue that the delay is better for PR than the alternative.

Maybe it's my skewed perspective, but to me the fact that it doesn't work is of small importance compared to the fact that we would be shipping what seem to be pretty disastrous semantics, which we would then have to be compatible with afterward.

   """I have a long string with a bunch of stuff in it, maybe even a \windows\path\to\foo"""

Let's say that string goes on for pages. Oh, I'd like to use string interpolation now.

  s"""I have a long string with $stuff in it, maybe even a \windows\path\to\foo"""

Now my string has form feeds and tabs in it. I didn't touch that part! And I have to habitually use triple-quoted strings for everything now, because otherwise I am constantly having to convert to triple quotes anyway, because there's no way to include a single quote in a single quoted interpolated string. Unfortunately at least half my strings have single quotes in them.
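
Editorial illustration of the point above: a minimal sketch, assuming a Scala release in which the s interpolator processes escapes in every literal part regardless of quoting. The object and value names are hypothetical, and the path is shortened so that it contains only valid escapes (a tab and a form feed) and therefore does not throw.

```scala
object TripleQuoteDemo {
  val stuff = "a bunch of stuff"

  // Plain triple-quoted literal: backslashes are kept verbatim.
  val plain: String = """maybe even a \to\foo"""

  // Same text behind the s interpolator: the tab and form-feed escapes
  // are now processed even though the literal is triple-quoted.
  val interpolated: String = s"""with $stuff in it, maybe even a \to\foo"""
}
```

Comparing the two values shows that adding the s prefix alone changes the untouched part of the string.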

Rex Kerr

Nov 10, 2012, 1:42:55 PM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 1:18 PM, Paul Phillips <pa...@improving.org> wrote:


> On Sat, Nov 10, 2012 at 11:09 AM, Rex Kerr <ich...@gmail.com> wrote:
>> It's good to get releases out in a timely fashion.  However, if common use cases of major new features are hopelessly broken, and easy fixes are available, I'd argue that the delay is better for PR than the alternative.
>
> Maybe it's my skewed perspective, but to me the fact that it doesn't work is of small importance compared to the fact that we would be shipping what seem to be pretty disastrous semantics, which we would then have to be compatible with afterward.

Well, that's sort of an argument for leaving it broken, because then if we fix the semantics we don't have to worry as much that people actually used it!
 

>    """I have a long string with a bunch of stuff in it, maybe even a \windows\path\to\foo"""
>
> Let's say that string goes on for pages. Oh, I'd like to use string interpolation now.
>
>   s"""I have a long string with $stuff in it, maybe even a \windows\path\to\foo"""

«Use raw"""stuff""" not s"""stuff""" for this» is an okay answer IMO.  If raw works.  Which it doesn't.

I still don't like that single-vs-triple quote semantics are nearly meaningless with string interpolation, but at least your concern has a simple fix.

  --Rex

Paul Phillips

Nov 10, 2012, 2:10:26 PM
to scala...@googlegroups.com
It seems like we could achieve a lot more uniformity if we informed the interpolator whether the literal was single or triple quoted.

Rex Kerr

Nov 10, 2012, 2:17:13 PM
to scala...@googlegroups.com
That was exactly my thought also.  Then the question is whether it should be an argument to the interpolator (boolean), or whether it should call different methods depending on which was used.  From a safety/consistency standpoint I'd favor the latter (it's the strategy used in Dynamic).  But it might be too big of a headache; maybe a boolean (which could just be ignored) is the more pragmatic way to go.

  --Rex

Erik Osheim

Nov 10, 2012, 2:20:38 PM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 12:10:26PM -0700, Paul Phillips wrote:
> It seems like we could achieve a lot more uniformity if we informed the
> interpolator whether the literal was single or triple quoted.

I'm probably just confused, but it seems like there's a subtext that we
can't just have s"""...""" ignore escapes while doing interpolation. To
me it seems like the questions of whether \ and $ have special effects
are orthogonal. Maybe I'm wrong?

"$$foo\tar$duh" (\t is tab)
"""$$foo\tar$duh""" (no special behavior)
s"$$foo\tar$duh" ($$ is literal $, \t is tab, $duh interpolated)
s"""$$foo\tar$duh""" ($$ is literal $, $duh interpolated)

Is there something that makes this behavior inconsistent?

-- Erik

Rex Kerr

Nov 10, 2012, 2:36:56 PM
to scala...@googlegroups.com
Right now, s, not the lexer/parser, does the parsing of escape characters.  So x"\nHi" actually gets sent as the raw string """\nHi""" to the x method (and that includes the s method).  At that point, s has no idea whether it was s"""\nHi""" or s"\nHi".  This is what Paul is suggesting that we change (and I agree).

  --Rex
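
Editorial illustration of the mechanism described above: a minimal sketch in which a hypothetical interpolator named x simply returns the literal parts it receives. The names PartsDemo and PartsInterpolator are assumptions made for this example; only StringContext and its parts member come from the standard library.

```scala
object PartsDemo {
  // Hypothetical interpolator that exposes the literal parts it is handed,
  // without doing any escape processing of its own.
  implicit class PartsInterpolator(sc: StringContext) {
    def x(args: Any*): Seq[String] = sc.parts
  }

  // Both spellings deliver the same unprocessed text to x:
  // a backslash followed by the letter n, then "Hi".
  val single: Seq[String] = x"\nHi"
  val triple: Seq[String] = x"""\nHi"""
}
```

Because the two values are equal, nothing inside x can tell which quoting style the caller used.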

Daniel Sobral

Nov 10, 2012, 2:53:35 PM
to scala...@googlegroups.com
IMHO, as I just wrote in the ticket, SI-6631 is not a bug, except in that a proper exception should be thrown instead of StringIndexOutOfBoundsException; that string should still cause an exception.

Both f and s interpolators do process the strings for escape characters, no matter if the string is multiline or not. That is intentional, and I did make sure of it during the comment phase of the SIP. For that matter, I think it's the best behavior as well, even though it will be the cause of bugs.

Though it was not originally contemplated, the raw interpolator balances everything. When converting code, use s if it was a simple string, and raw if it was multiline. That should go in the docs somewhere, in capital letters, but...

I do wish SI-6559 would get into 2.10.0, but there's a simple alternative at this point: not using multiline string interpolators if you have backslashes. Only, I agree with Rex: it must be noted in the release notes (or, preferably, in the doc itself) that raw is broken.
--
Daniel C. Sobral

I travel to the future all the time.
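
Editorial illustration of the conversion rule above (s for what was a simple string, raw for what was multiline): a minimal sketch, assuming a release in which the fix for SI-6559 is present so that raw leaves backslashes alone. On the release candidates discussed in this thread the raw expression threw instead. The object and value names are hypothetical.

```scala
object RawPathDemo {
  val arg = "bar"

  // This was a triple-quoted literal before conversion, so raw is used:
  // backslashes stay exactly as written and only $arg is substituted.
  val viaRaw: String = raw"""c:\foo\$arg\index.html"""

  // The pre-interpolation spelling from earlier in the thread, for comparison.
  val viaFormat: String = """c:\foo\%s\index.html""".format(arg)
}
```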

Daniel Sobral

Nov 10, 2012, 2:58:00 PM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 5:36 PM, Rex Kerr <ich...@gmail.com> wrote:
> Right now, s, not the lexer/parser, does the parsing of escape characters.  So x"\nHi" actually gets sent as the raw string """\nHi""" to the x method (and that includes the s method).  At that point, s has no idea whether it was s"""\nHi""" or s"\nHi".  This is what Paul is suggesting that we change (and I agree).

Letting s and f know whether it was single line or multiline would work. Escaping before passing would *NOT* work, as it would break use for things that are not strings (such as regular expressions).

I think the present solution works best, but I did raise this point during the SIP -- meaning it was there for anyone to see, and with a comment flagging it down.
 



Rex Kerr

Nov 10, 2012, 3:03:56 PM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 2:58 PM, Daniel Sobral <dcso...@gmail.com> wrote:

> On Sat, Nov 10, 2012 at 5:36 PM, Rex Kerr <ich...@gmail.com> wrote:
>> Right now, s, not the lexer/parser, does the parsing of escape characters.  So x"\nHi" actually gets sent as the raw string """\nHi""" to the x method (and that includes the s method).  At that point, s has no idea whether it was s"""\nHi""" or s"\nHi".  This is what Paul is suggesting that we change (and I agree).
>
> Letting s and f know whether it was single line or multiline would work. Escaping before passing would *NOT* work, as it would break use for things that are not strings (such as regular expressions).
>
> I think the present solution works best, but I did raise this point during the SIP -- meaning it was there for anyone to see, and with a comment flagging it down.

I don't think you should escape before passing, but I do think you should be able to know whether it was triple-quoted or not so you can mimic the standard " vs """ behavior if you want to.  Whether you pass a boolean or a default parser or call different methods or something else is not really the main point; the indistinguishability is.

But that can be fixed later in a backwards-compatible way.  All I think should be done now is to fix enough bugs so that there is some hope of using string interpolation on Windows paths, as it looks sloppy to have a main new feature trip on such a common use-case (esp. since it admits a trivial fix).

  --Rex

Jason Zaugg

Nov 10, 2012, 3:08:51 PM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 9:03 PM, Rex Kerr <ich...@gmail.com> wrote:
> But that can be fixed later in a backwards-compatible way.  All I think should be done now is to fix enough bugs so that there is some hope of using string interpolation on Windows paths, as it looks sloppy to have a main new feature trip on such a common use-case (esp. since it admits a trivial fix)

Of course, standard unicode escaping *always* happens earlier, which presents another pitfall:

scala> """c:\u1"""
<console>:1: error: error in unicode escape
       """c:\u1"""
               ^

scala> """c:\v1234\u0078"""
res0: java.lang.String = c:\v1234x 

-jason

Rex Kerr

Nov 10, 2012, 3:14:59 PM
to scala...@googlegroups.com
Agreed that this is problematic, but those only come up in rare use cases.  The most common use case for a Windows path is
  raw"c:\path\to\$myfile"
which works, but pretty much anything else you're liable to want is broken.

  --Rex


Paul Phillips

Nov 10, 2012, 3:23:06 PM
to scala...@googlegroups.com


On Sat, Nov 10, 2012 at 1:08 PM, Jason Zaugg <jza...@gmail.com> wrote:
> Of course, standard unicode escaping *always* happens earlier, which presents another pitfall:

Almost always. (The inability to evade unicode escapes was my biggest reservation with the SIP, but I gave it up after being unable to dream up an acceptable remedy.)

% rcscala -Xno-uescape
Welcome to Scala version 2.10.0-RC2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_06).
Type in expressions to have them evaluated.
Type :help for more information.

scala> """c:\u1"""
res0: String = c:\u1

scala> 

Paul Phillips

Nov 10, 2012, 3:27:20 PM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 1:03 PM, Rex Kerr <ich...@gmail.com> wrote:
> I don't think you should escape before passing, but I do think you should be able to know whether it was triple-quoted or not so you can mimic the standard " vs """ behavior if you want to.  Whether you pass a boolean or a default parser or call different methods or something else is not really the main point; the indistinguishability is.
>
> But that can be fixed later in a backwards-compatible way.

Can it? If we ship with the current semantics, we will fix the meaning of a \ in a triple-quoted s-interpolated string to be an escape character, and people will write code which relies on it. I don't see how we will be able to undo that in a backward compatible way.

Rex Kerr

Nov 10, 2012, 3:31:54 PM
to scala...@googlegroups.com

We'll fix the meaning of the s character, yes.  We can implement proper semantics on some other letter.  I'd pick q to mean "follow the Scala convention for single vs. triple quotes".

Not ideal, but I can't see how we have time to agree on a fix before 2.10.0.

  --Rex

Daniel Sobral

Nov 10, 2012, 3:58:43 PM
to scala...@googlegroups.com
String interpolation on Windows paths will be eternally hampered by not being able to represent c:\users.

I'd take turning \u off in exchange for having no single/multiline string flag *and* a broken raw (which I can reimplement) any time.

But I'm not disagreeing with anything you said. I just wanted to get that off my chest. :-) 
 

Rex Kerr

Nov 10, 2012, 8:22:43 PM
to scala...@googlegroups.com
If only one could make that trade.

scala> raw"""c:\users"""

<console>:1: error: error in unicode escape
       raw"""c:\users"""

Maybe eventually we'll fix everything.

In the meantime, """c:\Users"""

  --Rex

P.S. This suggests the solution to the triple quoting is
  sealed trait StringEscape extends (String => String) {}
  case object StandardStringEscape extends StringEscape { def apply(s: String) = ??? }
  case object RawStringEscape extends StringEscape { def apply(s: String) = ??? }

  def q(esc: StringEscape, args: Any*) = parts.map(esc).zip(args.map(_.toString))...
  def literal(esc: StringEscape) = parts.mkString
where the compiler by default does not interpret unicode within """ in string interpolations, and if it finds a method taking no args, it just dumps the entire string there.
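
Editorial illustration of the P.S. above: a minimal runnable sketch of the StringEscape idea. Everything here is an assumption made for illustration. The compiler does not pass a StringEscape to interpolators, so the caller supplies it explicitly; the interpolate helper and the EscapeSketch object are hypothetical; and the "standard" escape handler is deliberately simplified rather than a copy of the library's escape processing.

```scala
object EscapeSketch {
  sealed trait StringEscape extends (String => String)

  // Simplified on purpose: handles only tab, newline, backslash and
  // double-quote escapes, and leaves every other character untouched.
  case object StandardStringEscape extends StringEscape {
    def apply(s: String): String = {
      val out = new StringBuilder
      var i = 0
      while (i < s.length) {
        val c = s.charAt(i)
        // The bounds check keeps a trailing backslash from reading past the end.
        if (c == '\\' && i + 1 < s.length) {
          s.charAt(i + 1) match {
            case 't'  => out.append('\t'); i += 2
            case 'n'  => out.append('\n'); i += 2
            case '\\' => out.append('\\'); i += 2
            case '"'  => out.append('"'); i += 2
            case _    => out.append(c); i += 1
          }
        } else {
          out.append(c)
          i += 1
        }
      }
      out.toString
    }
  }

  // Triple-quote behaviour: the text is passed through unchanged.
  case object RawStringEscape extends StringEscape {
    def apply(s: String): String = s
  }

  // Apply the chosen escape to each literal part and interleave the
  // stringified arguments between the parts.
  def interpolate(esc: StringEscape, parts: Seq[String], args: Seq[Any]): String = {
    require(parts.length == args.length + 1, "need exactly one more part than args")
    val sb = new StringBuilder(esc(parts.head))
    for ((arg, part) <- args.zip(parts.tail)) {
      sb.append("" + arg).append(esc(part))
    }
    sb.toString
  }
}
```

With the same parts and arguments, passing RawStringEscape keeps every backslash, while passing StandardStringEscape turns the recognised sequences into control characters, which is the single-quote versus triple-quote distinction under discussion.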

Paul Phillips

Nov 11, 2012, 1:12:48 AM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 6:22 PM, Rex Kerr <ich...@gmail.com> wrote:
> If only one could make that trade.
>
> scala> raw"""c:\users"""

Looks OK from here:

% rcscala -Xno-uescape
Welcome to Scala version 2.10.0-RC2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_06).
Type in expressions to have them evaluated.
Type :help for more information.

scala> raw"""c:\to\fro\user\bippy"""
res0: String = c:\to\fro\user\bippy
 

martin odersky

Nov 11, 2012, 10:43:25 AM
to scala...@googlegroups.com
On Sat, Nov 10, 2012 at 7:18 PM, Paul Phillips <pa...@improving.org> wrote:


> On Sat, Nov 10, 2012 at 11:09 AM, Rex Kerr <ich...@gmail.com> wrote:
>> It's good to get releases out in a timely fashion.  However, if common use cases of major new features are hopelessly broken, and easy fixes are available, I'd argue that the delay is better for PR than the alternative.
>
> Maybe it's my skewed perspective, but to me the fact that it doesn't work is of small importance compared to the fact that we would be shipping what seem to be pretty disastrous semantics, which we would then have to be compatible with afterward.
>
>    """I have a long string with a bunch of stuff in it, maybe even a \windows\path\to\foo"""
>
> Let's say that string goes on for pages. Oh, I'd like to use string interpolation now.
>
>   s"""I have a long string with $stuff in it, maybe even a \windows\path\to\foo"""

We can argue about this, but not now. The time to argue this point was when the SIP was discussed and before it was accepted.

I don't have an opinion on the fix it now vs ship it now debate. Can see the validity of both arguments.

Cheers

 - Martin

> Now my string has form feeds and tabs in it. I didn't touch that part! And I have to habitually use triple-quoted strings for everything now, because otherwise I am constantly having to convert to triple quotes anyway, because there's no way to include a single quote in a single quoted interpolated string. Unfortunately at least half my strings have single quotes in them.




--
Martin Odersky
Prof., EPFL and Chairman, Typesafe
PSED, 1015 Lausanne, Switzerland
Tel. EPFL: +41 21 693 6863
Tel. Typesafe: +41 21 691 4967

Paul Phillips

Nov 11, 2012, 11:00:57 AM
to scala...@googlegroups.com
People know things when they know them. Would you prefer silence?

martin odersky

Nov 11, 2012, 3:32:53 PM
to scala...@googlegroups.com
Of course not. But if the debate is "do we want to have SIP-11 in Scala 2.10?" (as the title of the thread implies), it's the wrong debate. If we engaged in that debate now, and did the same for all the other issues that are just as contentious, we would never ship 2.10.

Rex Kerr

Nov 11, 2012, 3:43:53 PM
to scala...@googlegroups.com
Since I chose the title, I should point out that I never meant to imply that SIP-11 shouldn't be in Scala 2.10.  I think it's one of the best low-entry-barrier additions.  Instead, I meant to imply that it was worth fixing because its appeal will likely lead people to rapidly collide with bugs and be grumpy instead of elated (in a platform-dependent manner).

Sorry if that wasn't clear.

  --Rex

martin odersky

Nov 11, 2012, 4:12:31 PM
to scala...@googlegroups.com
It was clear, and I think it's a fair discussion point whether we should fix the bugs you outlined before or after 2.10.0. 

Rich Oliver

Nov 12, 2012, 5:13:41 AM
to scala...@googlegroups.com
Surely the case is clear: the bugs should be fixed, even if it meant taking 2.10 out of RC and doing another milestone. People who are already using 2.10 can carry on using it, and those who prefer stability can continue using 2.9.2. No promises have been made about a date for 2.10.0 final.


On Sunday, November 11, 2012 9:12:54 PM UTC, Martin wrote:
> It was clear, and I think it's a fair discussion point whether we should fix the bugs you outlined before or after 2.10.0.
>
> On Sun, Nov 11, 2012 at 9:43 PM, Rex Kerr <ich...@gmail.com> wrote:
>> Since I chose the title, I should point out that I never meant to imply that SIP-11 shouldn't be in Scala 2.10.  I think it's one of the best low-entry-barrier additions.  Instead, I meant to imply that it was worth fixing because its appeal will likely lead people to rapidly collide with bugs and be grumpy instead of elated (in a platform-dependent manner).


