Underscore in numeric literals (re-visited)

Aish Fenton

May 11, 2016, 3:00:45 PM
to scala-internals
I see on an older thread (from 2012) that the idea of supporting underscores in numeric literals (e.g. val i = 1_000_000) was talked about, but the discussion seems to have sputtered out without a conclusion.

I'm considering submitting a PR (or SIP first I guess?) for it.

Here's why I think the change is worth it:

* For data science work in Scala, where we are frequently typing in numbers and working in a more REPL-like environment, it significantly improves readability and ease of use. It's one of those little things that makes you feel like this language is meant for this type of work. And I'd argue that in many ways Scala is the best language for data science already, but getting these small things right really helps.

* Most languages already support it, so users now expect it: Java (since Java 7) and Ruby do, and Python is working on it (https://www.python.org/dev/peps/pep-0515/).

* String interpolation isn't the answer (i.e. I see the Spire library supports this: i"1_000_000"). But that feels clunky and requires a separate dependency / import just to type in a number. 

* It's orthogonal to other uses of underscore. I've seen the argument that underscore is already quite overloaded with meanings in Scala. But I think this use of underscore is so separate and so clearly defined (and expected from other languages) that it isn't really an issue.

* It's a small PR to add this functionality. I see the few lines in Lexer (or Scanner) that would need to change, and it seems straightforward to do so.
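
For illustration only, the kind of rule involved can be sketched outside the compiler. This is not the actual scalac Scanner code: the object and method names below are hypothetical, and the rule that an underscore may only appear between digits is an assumption borrowed from Java's literal syntax.

```scala
import scala.util.Try

// Hypothetical helper, not part of scalac: shows one possible separator rule
// for decimal integer literals.
object DigitSeparators {
  private def isAsciiDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // Returns the literal with '_' removed, or None if the text is empty,
  // contains other characters, or starts or ends with an underscore.
  def stripSeparators(literal: String): Option[String] = {
    val onlyDigitsAndUnderscores = literal.forall(c => isAsciiDigit(c) || c == '_')
    val startsAndEndsWithDigit =
      literal.nonEmpty && isAsciiDigit(literal.head) && isAsciiDigit(literal.last)
    if (onlyDigitsAndUnderscores && startsAndEndsWithDigit) Some(literal.filter(_ != '_'))
    else None
  }

  // Converts the cleaned text to an Int; None on malformed input or overflow.
  def parseInt(literal: String): Option[Int] =
    stripSeparators(literal).flatMap(digits => Try(digits.toInt).toOption)
}
```

Under these assumptions, DigitSeparators.parseInt("1_000_000") yields the same value as the literal 1000000, while "_1" and "1_" are rejected.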

What do people think? Is it worth me starting down the path of doing a SIP? Or is it unlikely to be accepted?




Simon Ochsenreither

May 11, 2016, 8:27:08 PM
to scala-internals
I'm not really for or against numeric separators, I don't care much about it; but using "_" for it gets a "no way in hell" from me.

It's visually jarring, and introduces a completely different meaning from all other usages of "_".

If there was a separator, I think 1'000'000 is a lot less distracting, and a lot more obvious about what it does.

Rex Kerr

May 11, 2016, 8:33:09 PM
to scala-i...@googlegroups.com
It does seem to be more and more common to use _ as a separator, and I also find it really jarring in general but especially in Scala where you're trained to pick out _'s because they mean that something magic is happening.  As a separator, you want it to not attract attention at all.

I also would prefer '.  A space would work too, but might not be very kind to the lexer, depending on how exactly lexing/parsing happens.  (I haven't checked.)

  --Rex


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Simon Ochsenreither

May 11, 2016, 8:35:07 PM
to scala-internals
Just to add another thing:

I'm glad that we don't live in a world anymore where

123.toString

and

123 .toString

are different things.

These issues will be reintroduced with _:

scala> implicit class Foo(val int: Int) extends AnyVal { def _ef = "bla" }
defined class Foo

scala> 0xab_cd_ef
res1: Int = 11259375

scala> 0xab_cd _ef
res2: String = bla

Erik Bruchez

May 12, 2016, 12:08:13 PM
to scala-i...@googlegroups.com
Using the apostrophe might be obvious in Europe but it's not used at
all in the US (where the comma is used instead). So I don't think that
will be acceptable. The underscore has the benefit of being neutral
with respect to the thousands-separator symbols in practical use
(outside of programming languages).

-Erik

Rex Kerr

May 12, 2016, 1:27:23 PM
to scala-i...@googlegroups.com
I don't understand the logic.  "Apostrophe isn't used in the U.S., so it's unacceptable.  So let's use underscore, which isn't used in the U.S."

Apostrophe isn't a common digit separator either; it is used in Switzerland but practically nowhere else.  It's certainly not obvious except inasmuch as Europeans are more used to seeing various different notations.  And if the problem in the U.S. is it not being in general civilian use, it's a problem either way.  (And programmers are used to weird syntax, like 0x and << and || and so on.)

In any case, I think spaces should work syntactically, and that has the benefit of being some sort of international standard (but a disadvantage of not working with hexadecimal).

  --Rex

Matthew Pocock

May 12, 2016, 1:52:04 PM
to scala-i...@googlegroups.com

I honestly don't see the issue with using interpolation.

Naftoli Gugenheim

May 12, 2016, 1:53:17 PM
to scala-i...@googlegroups.com

I wonder if IDEs could just format numbers synthetically

Hanns Holger Rutz

May 12, 2016, 2:20:46 PM
to scala-i...@googlegroups.com
+1

and if you beg the IDE teams to highlight it properly, there is also no
visual difference.

Scala is definitely going that way with XML literals going away.

best, .hh.


On 12/05/16 19:52, Matthew Pocock wrote:
> I honestly don't see the issue with using interpolation.

Erik Bruchez

May 12, 2016, 2:34:21 PM
to scala-i...@googlegroups.com
The point was that "_" is used by nobody in real life, not that it
isn't used in the US. Now if the apostrophe is used by Switzerland
only then I guess you could make a case that it's neutral too ;)

-Erik

Rex Kerr

May 12, 2016, 2:37:27 PM
to scala-i...@googlegroups.com
If 1% of what you're doing is typing numbers, it doesn't matter.  If it's 10% or 50%, then it's a big deal.  n"1,000" takes almost twice as long for me to type as 1 000.  1_000 and 1'000 are comparably fast.  (In my hands, _ is a tad slower because of the need to press shift.)

Making the IDE do it would not be a bad idea except that a lot of numeric exploration is done interactively, and right now the intersection of IDE and REPL is not very good.

  --Rex

Naftoli Gugenheim

May 12, 2016, 7:32:34 PM
to scala-i...@googlegroups.com
On Thu, May 12, 2016 at 2:37 PM Rex Kerr <ich...@gmail.com> wrote:
> If 1% of what you're doing is typing numbers, it doesn't matter.  If it's 10% or 50%, then it's a big deal.  n"1,000" takes almost twice as long for me to type as 1 000.  1_000 and 1'000 are comparably fast.  (In my hands, _ is a tad slower because of the need to press shift.)
>
> Making the IDE do it would not be a bad idea except that a lot of numeric exploration is done interactively, and right now the intersection of IDE and REPL is not very good.

In five minutes, Haoyi Li will probably announce that Ammonite can now visualize the digit groups. :)

Aish Fenton

May 12, 2016, 8:37:45 PM
to scala-internals
Rex sums it up nicely, but here's a bit more color from my end.

A lot of the code we write is done outside the IDE and is one-off code (in tools such as https://zeppelin.incubator.apache.org/). Importing extra jar dependencies, having to do an import, and then having to surround every number in quotes (making it look like a string when it isn't) feels pretty awful, especially when most modern languages now have this affordance included.

Simon, I think a comma separator is fine too. Although, for better or worse, other languages have created the expectation that underscore is used for this purpose (not that that means we have to follow them).

Hanns Holger Rutz

May 13, 2016, 12:47:07 AM
to scala-i...@googlegroups.com
On 13/05/16 02:37, Aish Fenton wrote:
> Simon, I think a comma separator is fine too.

No it's not, because the comma is the decimal mark in German:
German 1.000,00 is equivalent to English 1,000.00.

The apostrophe makes a lot more sense, even if it's new to US citizens
(as to the rest of the world except Switzerland apparently).

best, .h..h.

Hanns Holger Rutz

May 13, 2016, 12:50:23 AM
to scala-i...@googlegroups.com
Besides, comma, space and underscore are all bad choices in Scala because
they already have other syntactic meanings.

Aish Fenton

May 13, 2016, 12:54:37 AM
to scala-internals
Oops, meant to write apostrophe.

Matthew Pocock

May 16, 2016, 5:43:13 PM
to scala-i...@googlegroups.com
Hanns, to me you are making a good case for this being done through interpolation. The conventions vary from locale to locale and from group to group.

EN"1,000.00" === DE"1.000,00" === JAVA"1_000.00"

or if you prefer different sugar, float the syntax in with an implicit for the same interpolator:

import numericLiterals.EN;
d"1,000.00"

import numericLiterals.DE
d"1.000,00"

import numericLiterals.JAVA
d"1_000.00"

Having these things naked is always going to cause syntactic ambiguity, as the characters that the various locales use for numeric punctuation are all ones that already have either user expectations or syntax implications within Scala.
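
A minimal sketch of how the import-selected variant might look, assuming the JDK's java.text.NumberFormat does the locale-aware parsing. The names NumericLiterals, Convention, EN and DE are hypothetical (they only mirror the pseudo-imports above), and the underscore-based JAVA convention is left out:

```scala
import java.text.{NumberFormat, ParsePosition}
import java.util.Locale

// Hypothetical library sketch: a d"..." interpolator whose digit punctuation
// is chosen by whichever Convention is in implicit scope.
object NumericLiterals {
  final case class Convention(locale: Locale)

  // Importing EN._ or DE._ selects the convention used by d"...".
  object EN { implicit val convention: Convention = Convention(Locale.US) }
  object DE { implicit val convention: Convention = Convention(Locale.GERMANY) }

  implicit class NumericInterpolator(private val sc: StringContext) extends AnyVal {
    def d(args: Any*)(implicit c: Convention): Double = {
      val text = sc.s(args: _*)
      val pos = new ParsePosition(0)
      val parsed = NumberFormat.getInstance(c.locale).parse(text, pos)
      // Reject input that the locale's format could not consume completely.
      require(parsed != null && pos.getIndex == text.length, s"not a valid number: $text")
      parsed.doubleValue
    }
  }
}
```

With import NumericLiterals._ and import NumericLiterals.EN._ in scope, d"1,000.00" would evaluate to 1000.0; swapping the second import for NumericLiterals.DE._ would make d"1.000,00" do the same. Note that the conversion happens at runtime, so a malformed number fails when the code runs rather than when it compiles.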

Matthew

--
Dr Matthew Pocock
Turing ate my hamster LTD

Integrative Bioinformatics Group, School of Computing Science, Newcastle University

skype: matthew.pocock
tel: (0191) 2566550