Use of symbol literals in Scala


Andrew Goodnough

Apr 3, 2011, 11:03:06 AM
to scala...@googlegroups.com
In O'Reilly's Programming Scala book, the authors mention that symbol literals are used more in Ruby than in Scala.

http://programming-scala.labs.oreilly.com/ch02.html#SymbolLiterals

Having spent some time programming in Ruby, and having used a Java framework for years that made heavy use of an object wrapping String for use as map keys, I like that Scala supports the symbol literal. I like it not so much for its memory implications (as in Ruby) but because it tells the programmer that this particular string is for internal use by the program and is not intended for the end user. It also conveys that this key is not some random string but will be used to look up this object later. But... I haven't yet had a use case for a symbol, which makes me wonder whether its use is discouraged, or whether it's just not used because some other feature of Scala naturally selects against it. Just curious if anyone out there is using symbols and/or has opinions on them.

Andy
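For readers who have not used them: a symbol literal such as 'id is shorthand for Symbol("id"), and the result can serve as a map key much like a string. The minimal sketch below writes Symbol("id") explicitly, since the 'id literal syntax was later deprecated in Scala 2.13 and dropped in Scala 3; the object name and map contents are made up for illustration.

```scala
object SymbolKeysDemo {
  // Keys are Symbols rather than Strings: "internal identifier, not user-facing text".
  val settings: Map[Symbol, Int] = Map(Symbol("id") -> 1, Symbol("retries") -> 3)

  def main(args: Array[String]): Unit = {
    // A Symbol built later from the same name finds the same entry.
    println(settings(Symbol("id")))  // 1
    // The underlying string is still available via .name.
    println(Symbol("retries").name)  // retries
  }
}
```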

Lex

Apr 3, 2011, 1:23:31 PM
to Andrew Goodnough, scala...@googlegroups.com
I am trying out symbols, though, as you said, I could just as
easily have used strings. I found that strings were significantly faster
when used as map keys. Perhaps this is due to the construction of symbols
and could be improved in the future.

Nils Kilden-Pedersen

Apr 3, 2011, 7:07:43 PM
to Lex, Andrew Goodnough, scala...@googlegroups.com
On Sun, Apr 3, 2011 at 12:23 PM, Lex <lex...@gmail.com> wrote:
> I am trying out symbols, though, as you said, I could just as
> easily have used strings. I found that strings were significantly faster
> when used as map keys. Perhaps this is due to the construction of symbols
> and could be improved in the future.

Symbol construction is horribly slow, involving global synchronization. But symbol literals should be statically constructed, so they should carry no such ongoing overhead.

Lex

Apr 3, 2011, 7:14:11 PM
to Nils Kilden-Pedersen, Andrew Goodnough, scala...@googlegroups.com
My code looks like this:
map("id") := map("id") + 1
vs
map('id) := map('id) + 1

The string version is about twice as fast as the symbol version. Keep
in mind, this includes the addition, the set operator, and the map read/write
overhead. So I highly doubt symbol literals are statically
constructed, assuming that symbol's equals() and hashCode() are faster
than string's.
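Lex's full benchmark is not shown, and his := operator belongs to his own map type. Here is a rough sketch of a comparable measurement, assuming a plain scala.collection.mutable.HashMap stands in for his map (KeyBenchmark and bump are hypothetical names). A naive timing loop like this is sensitive to JIT warm-up and GC, so any numbers it prints are indicative at best; a harness such as JMH gives more trustworthy results.

```scala
import scala.collection.mutable

object KeyBenchmark {
  // Increments the counter stored under `key` n times and returns the final count.
  def bump[K](map: mutable.HashMap[K, Int], key: K, n: Int): Int = {
    var i = 0
    while (i < n) {
      map(key) = map(key) + 1 // read, add, write back, like Lex's loop body
      i += 1
    }
    map(key)
  }

  // Runs `body` once and returns its result with the elapsed wall-clock time in ms.
  def time[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val n = 10000000
    val idSym = Symbol("id") // created once, as the compiler does for a symbol literal
    for (round <- 1 to 3) {  // later rounds run after some JIT warm-up
      val (s, strMs) = time(bump(mutable.HashMap("id" -> 0), "id", n))
      val (y, symMs) = time(bump(mutable.HashMap(idSym -> 0), idSym, n))
      println(s"round $round: String keys $strMs ms, Symbol keys $symMs ms (counts $s, $y)")
    }
  }
}
```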

Paul Phillips

Apr 3, 2011, 7:29:26 PM
to Lex, Nils Kilden-Pedersen, Andrew Goodnough, scala...@googlegroups.com
On 4/3/11 4:14 PM, Lex wrote:
> So I highly doubt symbol literals are statically
> constructed, assuming that symbol's equals() and hashCode() are faster
> than string's.

You are allowed to check, you know. No penalty for phoning the repl.

scala> class A { val x = 'foo ; val y = 'foo ; val z = 'foo }
defined class A

scala> :javap -verbose A
// blah blah constant pool

{
private final scala.Symbol x;

private final scala.Symbol y;

private final scala.Symbol z;

private static final scala.Symbol symbol$1;

public static {};
Code:
Stack=2, Locals=0, Args_size=0
0: getstatic #11; //Field scala/Symbol$.MODULE$:Lscala/Symbol$;
3: ldc #14; //String foo
5: invokevirtual #18; //Method scala/Symbol$.apply:(Ljava/lang/Object;)Ljava/lang/Object;
8: checkcast #20; //class scala/Symbol
11: putstatic #26; //Field symbol$1:Lscala/Symbol;
14: return

Looks pretttty static.

Lex

Apr 3, 2011, 8:00:24 PM
to Paul Phillips, Nils Kilden-Pedersen, Andrew Goodnough, scala...@googlegroups.com
You are right, it looks like symbols are static. However, the symbol map
is still slower. If symbols are compared using reference equality, then it
must be the hash code...
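The reference-equality assumption can be checked in the same spirit as Paul's REPL session. Symbol.apply hands back the same instance for the same name (for as long as that instance is still referenced), so two symbols built from distinct but equal strings are the very same object. A small check using only the standard library:

```scala
object SymbolIdentity {
  def main(args: Array[String]): Unit = {
    val a = Symbol("id")
    val b = Symbol(new String("id")) // a distinct String object with the same characters
    println(a eq b)                  // true: Symbol.apply returns the cached instance
    println(a eq Symbol("other"))    // false: a different name gives a different Symbol
  }
}
```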

Seth Tisue

Apr 4, 2011, 10:21:48 AM
to scala...@googlegroups.com
>>>>> "Lex" == Lex <lex...@gmail.com> writes:

Lex> The string version is about twice as fast as the symbol
Lex> version.

If you could track down the root cause, it sounds like it'd make a great
Trac ticket...

--
Seth Tisue | Northwestern University | http://tisue.net
lead developer, NetLogo: http://ccl.northwestern.edu/netlogo/

martin odersky

Apr 4, 2011, 11:07:48 AM
to Seth Tisue, scala...@googlegroups.com
On Mon, Apr 4, 2011 at 4:21 PM, Seth Tisue <se...@tisue.net> wrote:
> >>>>> "Lex" == Lex <lex...@gmail.com> writes:
>
>  Lex> The string version is about twice as fast as the symbol
>  Lex> version.
>
> If you could track down the root cause, it sounds like it'd make a great
> Trac ticket...


The "problem" here is that strings are interned by the VM, but symbols are not.
So if you write "id" twice, it's the same constant, and hash map lookup is very fast. If you write 'id twice, lookup is still as fast as for strings, but you get new Symbol("id") twice and that needs to be hash-consed twice, which seems to account for the overhead that you are seeing.

Cheers

 -- Martin

√iktor Ҡlang

Apr 4, 2011, 11:25:16 AM
to martin odersky, Seth Tisue, scala...@googlegroups.com

Could use a customized version of ConcurrentHashMap where putIfAbsent would have the following signature: def putIfAbsent(symbolId: String): Symbol

The hashCode of the Symbol would then be that of the String, so:

'Foo == Symbols.putIfAbsent("Foo")

This allocates only when the symbol is absent.
 




--
Viktor Klang,
Relentless is more
Work:   Scalable Solutions
Code:   github.com/viktorklang
Follow: twitter.com/viktorklang
Read:   klangism.tumblr.com

Paul Phillips

Apr 4, 2011, 3:56:12 PM
to √iktor Ҡlang, martin odersky, Seth Tisue, scala...@googlegroups.com
On 4/4/11 8:25 AM, √iktor Ҡlang wrote:
> Could use a customized version of ConcurrentHashMap where putIfAbsent
> would have the following signature: def putIfAbsent(symbolId: String):
> Symbol
>
> The hashCode of the Symbol would then be that of the String, so:
>
> 'Foo == Symbols.putIfAbsent("Foo")
>
> This allocates only when the symbol is absent.

I believe this is a great idea. Unfortunately Symbol is defined with an
awful lot of specificity in the specification, so much so that it's
pretty much impossible to change it without a spec change.

> A symbol literal 'x is a shorthand for the expression scala.Symbol("x"). Symbol
> is a case class (§5.3.2), which is defined as follows.
> package scala
> final case class Symbol private (name: String) {
>   override def toString: String = "'" + name
> }

If we could blur that a little bit so that

// this part is fine
'x is shorthand for the expression scala.Symbol("x")
// remove detail, along the lines of
... where Symbol is a class which is guaranteed to be reference equal
to any other Symbol created from an equal String

...then I could make the companion object be the concurrent hashmap and
have apply wrap putIfAbsent.
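A minimal sketch of the design Viktor and Paul describe, assuming a java.util.concurrent.ConcurrentHashMap owned by the companion object, an apply that wraps putIfAbsent, and a hashCode that delegates to the name's String hash. Sym is a hypothetical stand-in for scala.Symbol. Unlike the real class, this table holds its entries strongly, so interned names are never garbage-collected.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for scala.Symbol; only the companion can construct instances.
final class Sym private (val name: String) {
  // Viktor's suggestion: reuse the String's hash code.
  override def hashCode: Int = name.hashCode
  // One instance per name, so reference equality is sufficient.
  override def equals(other: Any): Boolean = this eq other.asInstanceOf[AnyRef]
  override def toString: String = "'" + name
}

object Sym {
  private val table = new ConcurrentHashMap[String, Sym]()

  def apply(name: String): Sym = {
    val existing = table.get(name) // lock-free read on the common path
    if (existing ne null) existing
    else {
      val fresh = new Sym(name)
      val winner = table.putIfAbsent(name, fresh) // null if this thread inserted first
      if (winner eq null) fresh else winner        // otherwise use the racing thread's instance
    }
  }
}
```

Allocation happens only when the name is absent, which is the property Viktor was after; the get before putIfAbsent keeps the common case free of allocation.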

Ismael Juma

Apr 4, 2011, 4:10:59 PM
to martin odersky, Seth Tisue, scala...@googlegroups.com
On Mon, Apr 4, 2011 at 4:07 PM, martin odersky <martin....@epfl.ch> wrote:
> So if you write "id" twice, it's the same constant, and hash map lookup is
> very fast. If you write 'id twice, lookup is still as fast as for strings,
> but you get new Symbol("id") twice and that needs to be hash-consed twice,
> which seems to account for the overhead that you are seeing.

As far as I can see, it's a bit more complicated than that.
Symbol.apply is called and that caches values in a WeakHashMap using a
ReentrantReadWriteLock to ensure thread-safety. So, you won't
necessarily create a new instance every time.

Best,
Ismael
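The caching strategy Ismael describes can be sketched roughly as follows: a java.util.WeakHashMap from each name to a WeakReference of its symbol, so an entry can go away once nothing else refers to that symbol. This is a simplified illustration, not the standard library's source. In particular it guards the map with a single lock rather than a ReentrantReadWriteLock, because WeakHashMap is not thread-safe and, in OpenJDK, even its get may remove stale entries. WeakSym is a hypothetical name.

```scala
import java.lang.ref.WeakReference
import java.util.WeakHashMap

// Hypothetical symbol type whose instances may be collected when unused.
final class WeakSym private (val name: String) {
  override def toString: String = "'" + name
}

object WeakSym {
  // Key: the name (weakly held by WeakHashMap). Value: a weak reference to the symbol,
  // because a strong value would keep its own key (sym.name) reachable forever.
  private val cache = new WeakHashMap[String, WeakReference[WeakSym]]()

  def apply(name: String): WeakSym = cache.synchronized {
    val ref = cache.get(name)
    val cached = if (ref eq null) null else ref.get()
    if (cached ne null) cached // reuse the live instance
    else {
      val fresh = new WeakSym(name) // first request, or the old instance was collected
      cache.remove(name)            // drop any stale entry so the stored key is fresh.name
      cache.put(fresh.name, new WeakReference[WeakSym](fresh))
      fresh
    }
  }
}
```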

Paul Phillips

Apr 4, 2011, 4:46:07 PM
to Ismael Juma, martin odersky, Seth Tisue, scala...@googlegroups.com
On 4/4/11 1:10 PM, Ismael Juma wrote:
> As far as I can see, it's a bit more complicated than that.
> Symbol.apply is called and that caches values in a WeakHashMap using a
> ReentrantReadWriteLock to ensure thread-safety. So, you won't
> necessarily create a new instance every time.

Oh you're right, Symbol isn't even a case class anymore. I guess I
didn't let an unhealthy obsession with the contents of the spec stop me
last time around.

I'm sure the code in there can be made faster. I was going for
"correct". (And it wasn't fast before either.) People whose thing is
"fast" are encouraged to improve it.

Ismael Juma

Apr 4, 2011, 5:02:35 PM
to Paul Phillips, martin odersky, Seth Tisue, scala...@googlegroups.com
On Mon, Apr 4, 2011 at 9:46 PM, Paul Phillips <pa...@improving.org> wrote:
> I'm sure the code in there can be made faster.  I was going for "correct".
>  (And it wasn't fast before either.) People whose thing is "fast" are
> encouraged to improve it.

The way to make it faster is to use a ConcurrentWeakHashMap. The issue
is that one doesn't exist in the standard library (Java or Scala). Google
Guava has what we need:

http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html

But it includes a bunch of stuff that we would not want. If I remember
correctly, you went for the correct but not particularly fast solution
because of this; it's a non-trivial amount of effort to do the right
thing here.

Best,
Ismael
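To give a sense of the effort Ismael mentions, here is one hand-rolled approximation of a concurrent weak intern table using only the JDK: a ConcurrentHashMap whose values are WeakReferences, plus a ReferenceQueue to purge entries whose symbols have been collected. It is a sketch under those assumptions, not Guava's MapMaker and not what the Scala library does; CSym is a hypothetical name.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical symbol type interned through a concurrent, weak-valued table.
final class CSym private (val name: String) {
  override def toString: String = "'" + name
}

object CSym {
  // Weak reference that remembers the key it was stored under, for cleanup.
  private final class Ref(val key: String, sym: CSym, q: ReferenceQueue[CSym])
      extends WeakReference[CSym](sym, q)

  private val queue = new ReferenceQueue[CSym]()
  private val table = new ConcurrentHashMap[String, Ref]()

  // Removes entries whose symbols have been garbage-collected.
  private def expungeStale(): Unit = {
    var stale = queue.poll()
    while (stale ne null) {
      val ref = stale.asInstanceOf[Ref]
      table.remove(ref.key, ref) // only removes if the key still maps to this exact Ref
      stale = queue.poll()
    }
  }

  def apply(name: String): CSym = {
    expungeStale()
    var result: CSym = null
    while (result eq null) {
      val current = table.get(name)
      val live = if (current eq null) null else current.get()
      if (live ne null) result = live // common case: a live instance already exists
      else {
        val fresh = new CSym(name)
        val installed =
          if (current eq null) table.putIfAbsent(name, new Ref(name, fresh, queue)) eq null
          else table.replace(name, current, new Ref(name, fresh, queue)) // swap out a dead Ref
        if (installed) result = fresh // on a lost race, loop and read the winner's entry
      }
    }
    result
  }
}
```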

Mike

Apr 4, 2011, 7:04:18 PM
to martin odersky, scala...@googlegroups.com
martin odersky wrote:
> The "problem" here is that strings are interned by the VM, but
> symbols are not.

This contradicts _Programming in Scala_, 2nd ed., p. 80: "symbols are
interned". Do you mean that they aren't interned in current
implementations (but will be in the future)?

√iktor Ҡlang

Apr 4, 2011, 7:09:14 PM
to Mike, martin odersky, scala...@googlegroups.com
No Mike, I think what Martin is saying is that the VM interns Strings, but the VM does not intern Symbols, so Symbols need to be interned by the Symbol implementation.

--
Viktor Klang,
Director of Research and Development

Ismael Juma

Apr 4, 2011, 7:09:40 PM
to Mike, martin odersky, scala...@googlegroups.com
On Tue, Apr 5, 2011 at 12:04 AM, Mike <fort...@good-with-numbers.com> wrote:
> This contradicts _Programming in Scala_, 2nd ed., p. 80: "symbols are
> interned".  Do you mean that they aren't interned in current
> implementations (but will be in the future)?

Read the other messages in the thread or check the code in
Symbol.apply. They are interned.

Best,
Ismael

Mike

Apr 4, 2011, 8:55:17 PM
to scala...@googlegroups.com
√iktor Ҡlang wrote:
> No Mike, I think what Martin is saying is that the VM interns Strings, but
> the VM does not intern Symbols, so Symbols need to be interned by the
> Symbol implementation.

Oh, the *VM*, right. Details. Thanks.

Ismael Juma

Apr 5, 2011, 4:14:27 AM
to √iktor Ҡlang, Mike, martin odersky, scala...@googlegroups.com
2011/4/5 √iktor Ҡlang <viktor...@gmail.com>:

> No Mike, I think what Martin is saying is that the VM interns Strings, but
> the VM does not intern Symbols, so Symbols need to be interned by the
> Symbol implementation.

To be honest, I don't understand Martin's explanation. There should be
little difference for constant Symbols apart from classloading and GC.
Consider this:

object Strings {
  val foo1 = "foo"
  val foo2 = "foo"
  val bar1 = "bar"
  val bar2 = "bar"
}

object Symbols {
  val fooSymbol1 = 'foo
  val fooSymbol2 = 'foo
  val barSymbol1 = 'bar
  val barSymbol2 = 'bar
}

In the former case we have two strings interned by the JVM that have
their hashCode precomputed. In the second case, we have two symbol
instances created during classloading, wrapping two strings interned by
the JVM with their hashCode precomputed. The main differences are that
when creating the static Symbol (during classloading) we will acquire
a read-write lock and we will potentially create a WeakReference and
the Symbol wrapper over the String, but this will only be done once
for each Symbol while we hold a reference.
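Ismael's comparison can be checked directly. The sketch below re-declares his two objects (trimmed to the foo fields, and with Symbol("foo") in place of the 'foo literal, which later Scala versions deprecate) and compares fields by reference: equal string literals resolve to the same interned String, and equal symbols to the same interned Symbol.

```scala
object Strings {
  val foo1 = "foo"
  val foo2 = "foo"
}

object Symbols {
  // Both fields are set once, when the Symbols object is initialized.
  val fooSymbol1 = Symbol("foo")
  val fooSymbol2 = Symbol("foo")
}

object InterningCheck {
  def main(args: Array[String]): Unit = {
    println(Strings.foo1 eq Strings.foo2)             // true: the JVM interns string literals
    println(Symbols.fooSymbol1 eq Symbols.fooSymbol2) // true: Symbol.apply interns by name
  }
}
```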

Also, Symbol does not define equals or hashCode. That means that
Object.hashCode is used for the latter. The implementation for that is
supposed to be fast, and it is cached at the JVM level after the
initial computation. However, I have not tested if it's slower than
reading a field (which is what happens for String.hashCode for
constants). If it is slower, then the solution would simply be to
delegate to String.hashCode if we care about the performance of
constant Symbols.

Best,
Ismael
