I am trying out symbols, though, as you said, I could just as easily
have used strings. I found that strings were significantly faster
when used as map keys. Perhaps this is due to the construction of
symbols and could be improved in the future.
The string version is about twice as fast as the symbol version. Keep
in mind that this includes addition, the set operator, and the map
read/write overhead. So I highly doubt symbol literals are statically
constructed, assuming that Symbol's equals() and hashCode() are faster
than String's.
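For anyone who wants to reproduce this kind of comparison, here is a rough sketch of a timing harness. The original benchmark was not posted, so everything here (the `MapKeyTiming` object, the key names, the round counts) is an assumption, and a naive `System.nanoTime` loop like this is only indicative; a proper harness such as JMH would be more trustworthy.

```scala
import scala.collection.mutable

// Hypothetical harness; not the benchmark from the original post.
object MapKeyTiming {
  // Increment a counter under each key `rounds` times and return the map.
  def bump[K](keys: Seq[K], rounds: Int): mutable.HashMap[K, Int] = {
    val m = mutable.HashMap.empty[K, Int]
    var i = 0
    while (i < rounds) {
      keys.foreach(k => m(k) = m.getOrElse(k, 0) + 1)
      i += 1
    }
    m
  }

  // Wall-clock nanoseconds taken by one evaluation of `body`.
  def nanos[A](body: => A): Long = {
    val t0 = System.nanoTime()
    body
    System.nanoTime() - t0
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("foo", "bar", "baz")
    val syms  = names.map(Symbol(_))
    // Warm up both paths so the JIT has seen them before timing.
    bump(names, 100000)
    bump(syms, 100000)
    println(s"strings: ${nanos(bump(names, 1000000))} ns")
    println(s"symbols: ${nanos(bump(syms, 1000000))} ns")
  }
}
```

The symbols are built once, outside the timed region, so this measures only lookup and update cost and not construction.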
You are allowed to check, you know. No penalty for phoning the REPL.
scala> class A { val x = 'foo ; val y = 'foo ; val z = 'foo }
defined class A
scala> :javap -verbose A
// blah blah constant pool
{
  private final scala.Symbol x;
  private final scala.Symbol y;
  private final scala.Symbol z;
  private static final scala.Symbol symbol$1;
  public static {};
    Code:
     Stack=2, Locals=0, Args_size=0
     0: getstatic #11; //Field scala/Symbol$.MODULE$:Lscala/Symbol$;
     3: ldc #14; //String foo
     5: invokevirtual #18; //Method scala/Symbol$.apply:(Ljava/lang/Object;)Ljava/lang/Object;
     8: checkcast #20; //class scala/Symbol
     11: putstatic #26; //Field symbol$1:Lscala/Symbol;
     14: return
Looks pretttty static.
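The same thing can be checked from the language side without reading bytecode. A minimal sketch, assuming a Scala version where `Symbol("foo")` is available (the `'foo` literal syntax used above was later deprecated); `InternCheck` is just an illustrative name:

```scala
object InternCheck {
  val a = Symbol("foo")
  val b = Symbol("foo")
  val c = Symbol("bar")
  // A name built at runtime, not a compile-time constant.
  val d = Symbol(new String("foo"))

  def main(args: Array[String]): Unit = {
    println(a eq b) // true: same instance for the same name
    println(a eq d) // true: lookup is by string content
    println(a eq c) // false: different name
  }
}
```

`eq` is reference equality, so `true` here means that both vals point at one shared Symbol object.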
Lex> The string version is about twice faster than the symbol
Lex> version.
If you could track down the root cause, it sounds like it'd make a great
Trac ticket...
--
Seth Tisue | Northwestern University | http://tisue.net
lead developer, NetLogo: http://ccl.northwestern.edu/netlogo/
Cheers
-- Martin
I believe this is a great idea. Unfortunately Symbol is defined with an
awful lot of specificity in the specification, so much so that it's
pretty much impossible to change it without a spec change.
> A symbol literal ’x is a shorthand for the expression scala.Symbol("x"). Symbol
> is a case class (§5.3.2), which is defined as follows.
> package scala
> final case class Symbol private (name: String) {
> override def toString: String = "’" + name
> }
If we could blur that a little bit so that
// this part is fine
'x is shorthand for the expression scala.Symbol("x")
// remove detail, along the lines of
... where Symbol is a class which is guaranteed to be reference equal
to any other Symbol created from an equal String
...then I could make the companion object be the concurrent hash map,
with apply as a thin wrapper around putIfAbsent.
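A minimal sketch of what that could look like, assuming a stand-in class `Sym` (hypothetical, not scala.Symbol) and a map held in a field where the proposal would have the companion extend the map itself:

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for scala.Symbol; not the library class.
final class Sym private (val name: String) {
  override def toString: String = "'" + name
}

object Sym {
  // Assumption: the map lives in a private field of the companion.
  private val cache = new ConcurrentHashMap[String, Sym]()

  def apply(name: String): Sym = {
    val existing = cache.get(name)
    if (existing ne null) existing
    else {
      val fresh = new Sym(name)
      // putIfAbsent returns null if we won the race, otherwise the
      // instance some other thread installed first.
      val raced = cache.putIfAbsent(name, fresh)
      if (raced eq null) fresh else raced
    }
  }
}
```

Note the trade-off: this map holds its entries strongly, so a Sym is never garbage collected once created. That differs from a weak-reference cache.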
As far as I can see, it's a bit more complicated than that.
Symbol.apply is called and that caches values in a WeakHashMap using a
ReentrantReadWriteLock to ensure thread-safety. So, you won't
necessarily create a new instance every time.
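Roughly, that caching scheme has the following shape. This is a sketch under stated assumptions, not the library source: `Name` and `NameCache` are hypothetical names, and only the general idea (WeakHashMap of weak references, read lock for lookups, write lock for inserts) is taken from the description above.

```scala
import java.lang.ref.WeakReference
import java.util.WeakHashMap
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical value type standing in for Symbol.
final class Name(val text: String)

object NameCache {
  private val lock = new ReentrantReadWriteLock()
  private val map  = new WeakHashMap[String, WeakReference[Name]]()

  // Returns the cached instance, or null if absent or already collected.
  private def lookup(key: String): Name = {
    val ref = map.get(key)
    if (ref eq null) null else ref.get()
  }

  def apply(key: String): Name = {
    lock.readLock().lock()
    val hit = try lookup(key) finally lock.readLock().unlock()
    if (hit ne null) hit
    else {
      lock.writeLock().lock()
      try {
        // Re-check: another thread may have inserted in the meantime.
        val again = lookup(key)
        if (again ne null) again
        else {
          val fresh = new Name(key)
          map.put(key, new WeakReference(fresh))
          fresh
        }
      } finally lock.writeLock().unlock()
    }
  }
}
```

Every call takes at least the read lock, which is the per-call overhead under discussion; the instance is only shared for as long as something else holds a strong reference to it.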
Best,
Ismael
Oh you're right, Symbol isn't even a case class anymore. I guess I
didn't let an unhealthy obsession with the contents of the spec stop me
last time around.
I'm sure the code in there can be made faster. I was going for
"correct". (And it wasn't fast before either.) People whose thing is
"fast" are encouraged to improve it.
The way to make it faster is to use a ConcurrentWeakHashMap. The issue
is that one doesn't exist in the standard library (Java or Scala).
Google Guava has what we need:
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html
But it includes a bunch of stuff that we would not want. If I remember
correctly, you went for the correct and not particularly fast solution
because of this; it's a non-trivial amount of effort to do the right
thing here.
Best,
Ismael
This contradicts _Programming in Scala_, 2nd ed., p. 80: "symbols are
interned". Do you mean that they aren't interned in current
implementations (but will be in the future)?
Read the other messages in the thread or check the code in
Symbol.apply. They are interned.
Best,
Ismael
Oh, the *VM*, right. Details. Thanks.
To be honest, I don't understand Martin's explanation. There should be
little difference for constant Symbols apart from classloading and GC.
Consider this:
object Strings {
val foo1 = "foo"
val foo2 = "foo"
val bar1 = "bar"
val bar2 = "bar"
}
object Symbols {
val fooSymbol1 = 'foo
val fooSymbol2 = 'foo
val barSymbol1 = 'bar
val barSymbol2 = 'bar
}
In the former case we have two strings interned by the JVM that have
their hashCode precomputed. In the second case, we have two symbol
instances created during class-loading with two strings interned by
the JVM with their hashCode pre-computed. The main difference is that
when creating the static Symbol (during classloading) we will acquire
a read-write lock and we will potentially create a WeakReference and
the Symbol wrapper over the String, but this will only be done once
for each Symbol while we hold a reference.
Also, Symbol does not define equals or hashCode. That means that
Object.hashCode is used for the latter. The implementation for that is
supposed to be fast and it is cached at the JVM-level after the
initial computation. However, I have not tested if it's slower than
reading a field (which is what happens for String.hashCode for
constants). If it is slower, then the solution would simply be to
delegate to String.hashCode if we care about the performance of
constant Symbols.
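To make the delegation idea concrete, here is a small sketch. `HashedSym` is a hypothetical stand-in, and the synchronized map is only there to make instances unique per name for the example; it is not how the library caches Symbols.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Symbol with hashCode delegated to the name.
final class HashedSym private (val name: String) {
  // equals is left as reference equality, which is safe because
  // instances are unique per name; String caches its own hash.
  override def hashCode: Int = name.hashCode
}

object HashedSym {
  // Simplified interning for illustration only.
  private val cache = mutable.HashMap.empty[String, HashedSym]

  def apply(name: String): HashedSym = synchronized {
    cache.getOrElseUpdate(name, new HashedSym(name))
  }
}
```

Since equal instances are the same object, overriding hashCode alone keeps the equals/hashCode contract intact. Whether this actually beats the identity hash would need measuring, as noted above.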
Best,
Ismael