Symbol vs String

Sebestyén Gábor

Mar 16, 2005, 3:37:12 PM

Hi,

Just a dumb question: what is the real difference between { :aKey =>
"aValue" } and { "aKey" => "aValue" }? I know the first key is a
symbol and the latter is a string. I like string keys, so why should I
use symbols? Why are symbols worth using as keys?
Thanks,

Gábor

Malte Milatz

Mar 16, 2005, 3:53:06 PM

Sebestyén Gábor:

> I like string keys why should I use symbols?

Because symbols
- are faster and
- save you one byte in your rb file.

Malte

Nikolai Weibull

Mar 16, 2005, 4:00:46 PM

* Sebestyén Gábor (Mar 16, 2005 21:40):

Always use symbols for situations like these. The reason is that a
symbol is immutable and also that no new string needs to be created for
it if used more than once. Also, using strings as symbols and then
having the string altered will force a rehash of the table. It's all
about memory savings and execution speed,
nikolai

--
::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: www.pcppopper.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}

Eric Hodel

Mar 16, 2005, 4:03:22 PM

Symbols take up less memory space (only allocated once for the same
Symbol) and have a faster #hash function (#object_id, not computed).

'x' == 'x' # => true
'x'.object_id == 'x'.object_id # => false

:x == :x # => true
:x.object_id == :x.object_id # => true
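A minimal sketch of what Eric's comparison means for the original question, assuming a current Ruby (`String.new` is used so that each String is a distinct object even if frozen string literals are enabled): equal Strings are separate objects, equal Symbols are one object, and a Symbol key never matches a String key in a Hash.

```ruby
# Two equal Strings are still two distinct objects.
a = String.new('x')
b = String.new('x')
p a == b          # => true  (same content)
p a.equal?(b)     # => false (different objects)

# Every occurrence of :x refers to the same Symbol object.
p :x.equal?(:x)   # => true

# Symbol keys and String keys are different keys and never find each other.
sym_hash = { :aKey => 'aValue' }
str_hash = { 'aKey' => 'aValue' }
p sym_hash['aKey']   # => nil
p str_hash[:aKey]    # => nil
p sym_hash[:aKey]    # => "aValue"
```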

--
Eric Hodel - drb...@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Eric Hodel

Mar 16, 2005, 4:12:08 PM

On 16 Mar 2005, at 13:00, Nikolai Weibull wrote:

> Also, using strings as symbols and then having the string altered will
> force a rehash of the table.

You mean this?

key = 'foo'

hash = {}

hash[key] = 5

key.gsub! /foo/, 'bar'

In this case, hash.rehash does not need to be called because Ruby
copies String hash keys:

hash.keys.first.object_id == key.object_id # => false

Also, String keys are frozen, so you can't modify them:

hash.keys.first.gsub! /foo/, 'bar' # => raises TypeError
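The same check can be run as a short sketch in a current Ruby. One assumption to flag: since Ruby 2.5, modifying a frozen String raises `FrozenError` (a subclass of `RuntimeError`) instead of the `TypeError` that Ruby 1.8 raised, so the sketch rescues `RuntimeError`.

```ruby
key  = String.new('foo')  # an ordinary, unfrozen String
hash = {}
hash[key] = 5             # Hash#[]= stores a frozen copy of an unfrozen String key

key.gsub!(/foo/, 'bar')   # mutating our String does not touch the stored key

stored = hash.keys.first
p stored                  # => "foo"
p stored.frozen?          # => true
p stored.equal?(key)      # => false
p hash['foo']             # => 5, no rehash needed

# The stored key itself cannot be modified.
begin
  stored.gsub!(/foo/, 'bar')
rescue RuntimeError => e  # FrozenError in Ruby 2.5 and later
  p e.class
end
```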

Nikolai Weibull

Mar 16, 2005, 4:26:58 PM

* Eric Hodel (Mar 16, 2005 22:20):

> > Also, using strings as symbols and then having the string altered
> > will force a rehash of the table.

[basically saying that this isn't so]

OK, so this strengthens the argument for using symbols even further, as
keys will be copied. Thanks for pointing this out,

Robert Klemme

Mar 16, 2005, 4:31:00 PM

"Nikolai Weibull" <mailing-lis...@rawuncut.elitemail.org> wrote
in message news:2005031621...@puritan.pcp.ath.cx...

>* Sebestyén Gábor (Mar 16, 2005 21:40):
>> Just a dumb question: what is the real difference between { :aKey =>
>> "aValue" } and { "aKey" => "aValue" } ? I know the first key is a
>> symbol the latter is a string. I like string keys why should I use
>> symbols? Why symbols worth to use as keys? Thanks,
>
> Always use symbols for situations like these. The reason is that a
> symbol is immutable and also that no new string needs to be created for
> it if used more than once. Also, using strings as symbols and then
> having the string altered will force a rehash of the table. It's all
> about memory savings and execution speed,

I'd rather make the distinction on the semantic level: for example, if you
write an initializer for a class that accepts a hash to initialize any number
of instance fields, I'd prefer to use symbols there. The same goes when only
a certain fixed set of values is allowed. I use strings if they are read from
some source and I don't know beforehand what they might be.

Incidentally, it's typical for key-like things to occur rather often,
which fits nicely with the memory and speed savings offered by symbols.

Kind regards

robert
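Robert's rule of thumb might look like this in practice. This is only a sketch: the `Connection` class, its option names, and the `key=value` input format are made up for illustration. Options drawn from a fixed, known set use Symbol keys; keys parsed from outside text, which are not known in advance, stay Strings.

```ruby
# Hypothetical class whose initializer takes a hash of options.
# The set of allowed option names is fixed, so the keys are Symbols.
class Connection
  DEFAULTS = { :host => 'localhost', :port => 80 }

  attr_reader :host, :port

  def initialize(options = {})
    unknown = options.keys - DEFAULTS.keys
    raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?

    opts  = DEFAULTS.merge(options)
    @host = opts[:host]
    @port = opts[:port]
  end
end

# Keys parsed from external "key=value" lines are not known beforehand,
# so they are kept as Strings.
def parse_settings(text)
  text.each_line.each_with_object({}) do |line, settings|
    name, value = line.chomp.split('=', 2)
    settings[name] = value if name && value
  end
end
```

For example, `Connection.new(:port => 8080)` keeps the default host, while `parse_settings("x-mailer=foo\n")` returns `{"x-mailer"=>"foo"}`, a key that would be awkward as a bare symbol literal.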

Peter C. Verhage

Mar 16, 2005, 5:31:42 PM

But why do Strings not behave like Symbols? I mean, why aren't all
Strings immutable? Is this because Symbols will never get garbage
collected (to make sure they can be used over and over again) and normal
Strings will? Which might mean that in some cases (lots of text
processing) immutable Strings would fill up memory?

Regards,

Peter

Nikolai Weibull

Mar 16, 2005, 6:04:02 PM

* Peter C. Verhage (Mar 16, 2005 23:40):

Oh, no...not immutable vs. mutable strings again...

Well, if strings were immutable, then that would mean that strings could
share contents, and thus immutable strings wouldn't fill up memory. I
have suggested on the ruby-core list that Ruby should provide a second
data structure that acts like a string, namely the _rope_, and that it
be implemented in a way that allows for it to be used for tasks where
immutable "strings" are desired.

A rope is basically a string represented by a tree. Leaves of the tree
point to the subsequences of the whole string. These subsequences can
be shared with other ropes and can be generated lazily, i.e., from IO or
other generators. All that is needed is the length of the subsequence.
Every internal node keeps track of its own size and the size of its left
child. Thus, a node's offset in the string is the sum of the left-child
sizes of every ancestor whose right subtree contains it. Ropes can be used to represent long strings
efficiently and many operations on ropes are O(1) where they are O(n) on
a string. This is offset by the fact that lookup in a rope is O(lg n)
versus O(1) for a string, but in many cases this isn't a problem.
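The tree structure Nikolai describes can be sketched in a few lines of Ruby. This is a minimal illustration, not Boehm's implementation: `RopeLeaf`, `RopeNode` and `rope_concat` are hypothetical names, the tree is never rebalanced (so indexing costs O(depth), which is O(lg n) only for a balanced tree), and lazily generated IO-backed leaves are not shown.

```ruby
# Minimal, unbalanced rope sketch. RopeLeaf, RopeNode and rope_concat are
# hypothetical names, not an existing library API.
class RopeLeaf
  attr_reader :length

  def initialize(str)
    @str    = str.dup.freeze  # immutable, so a leaf can be shared by many ropes
    @length = @str.length
  end

  def [](index)
    @str[index]
  end

  def to_s
    @str
  end
end

class RopeNode
  attr_reader :length

  def initialize(left, right)
    @left   = left
    @right  = right
    @weight = left.length                 # size of the left child
    @length = left.length + right.length  # size of this whole subtree
  end

  # Descend: stay left if the index falls inside the left child, otherwise
  # go right and subtract the left child's size from the index.
  def [](index)
    index < @weight ? @left[index] : @right[index - @weight]
  end

  def to_s
    @left.to_s + @right.to_s
  end
end

# Concatenation allocates one node and copies no characters: O(1).
def rope_concat(a, b)
  RopeNode.new(a, b)
end

rope = rope_concat(rope_concat(RopeLeaf.new('Hello, '), RopeLeaf.new('world')),
                   RopeLeaf.new('!'))
p rope[7]     # => "w"
p rope.to_s   # => "Hello, world!"
```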

Anyway, the rope data structure is further described in [1]. Boehm has
actually implemented this in C for his garbage collector, so see that
package for an example implementation (note, though, that it uses a lot of
C hacks, which make it undesirable to use as-is). There's also a rope
data structure in STL, but it's limited to only using ropes and strings,
not IO,
nikolai (the rope and piece table lover)

[1] Hans-J Boehm, "Ropes: an Alternative to Strings", Software--Practice
and Experience, vol. 25(12), 1315--1330, Dec. 1995. Available at
http://rubyurl.com/2FRbO.

Hal Fulton

Mar 16, 2005, 7:39:35 PM

Some people (such as Guido) dislike mutable strings.
Others (such as Matz, and incidentally me) like them.

Personally, my limited Java experience juggling String and
StringBuffer was enough to convince me that strings should
be mutable.

Hal

Douglas Livingstone

Mar 16, 2005, 8:09:55 PM

On Thu, 17 Mar 2005 05:37:12 +0900, Sebestyén Gábor <seg...@chello.hu> wrote:
> I like string keys

Why?

Personally I think :symbols are great; they make it much clearer when you
are reading code that you are representing something else, rather than
storing a piece of data. And you can use them without having to define
them as constants beforehand. Great :)

Faster to type too.
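The difference Douglas points to can be seen side by side in a small sketch; the `TrafficLight` module and both helper methods are hypothetical. The constant version needs a declaration before use, while the symbol version does not.

```ruby
# Constant-based approach: every value must be declared before use.
# (TrafficLight and its values are hypothetical.)
module TrafficLight
  RED   = 0
  GREEN = 1
end

def may_go_const?(state)
  state == TrafficLight::GREEN
end

# Symbol-based approach: :green needs no prior declaration and reads as
# a name rather than as a piece of data.
def may_go?(state)
  state == :green
end
```

One trade-off worth knowing: a misspelled constant such as `TrafficLight::GREN` raises `NameError`, whereas a misspelled symbol such as `:gren` is silently just another symbol, so `may_go?(:gren)` simply returns `false`.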

Douglas

Jim Weirich

Mar 16, 2005, 9:08:59 PM

Use Strings for their content. Use Symbols for their arbitrary uniqueness.

--
-- Jim Weirich j...@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

Sam Roberts

Mar 16, 2005, 9:29:14 PM

Quoting j...@weirichhouse.org, on Thu, Mar 17, 2005 at 11:08:59AM +0900:
> On Wednesday 16 March 2005 03:37 pm, Sebestyén Gábor wrote:
> > Just a dumb question: what is the real difference between { :aKey =>
> > "aValue" } and { "aKey" => "aValue" } ? I know the first key is a
> > symbol the latter is a string. I like string keys why should I use
> > symbols? Why symbols worth to use as keys?
>
> Use Strings for their content. Use Symbols for their arbitrary uniqueness.

I used to do this, but ran into problems.

Symbols are great for things related to ruby because the :bar form for symbol
literals accepts the same kind of chars as ruby identifiers. I use them by
preference in interacting with ruby's meta-programming APIs.

They start to fall down outside of this. For example, I tried to use them
with mime types:

:text
==>:text
:video
==>:video
:octet-stream
NameError: undefined local variable or method `stream' for main:Object
from (irb):3
'octet-stream'.intern
==>:"octet-stream"

You CAN use them for things outside of the domain of ruby names, but it gets
painful if the names of those things are arbitrarily unique but have "-"
characters in them: you first have to create a String!

You can get around this by creating constants:

OCTETSTREAM = 'octet-stream'.intern
TEXT = :text

etc., but that might not fit your API goals very well.

Anyhow, I moved back to using strings instead of symbols. The need to create a
string and intern it for things that are logically symbols but have a "-" in
them was too painful.

That was my experience, anyhow.

Cheers,
Sam

Hal Fulton

Mar 16, 2005, 10:02:19 PM

I believe you can do things like :"octet-stream" -- but I grant
that is not much better.

Hal

Jim Weirich

Mar 16, 2005, 11:36:39 PM

On Wednesday 16 March 2005 09:29 pm, Sam Roberts wrote:
> Quoting j...@weirichhouse.org, on Thu, Mar 17, 2005 at 11:08:59AM +0900:
> > Use Strings for their content. Use Symbols for their arbitrary
> > uniqueness.
>
> I used to do this, but ran into problems.

[...]

> :octet-stream
>
> NameError: undefined local variable or method `stream' for main:Object
> from (irb):3
> 'octet-stream'.intern
> ==>:"octet-stream"

Why couldn't you do :octet_stream ? If your answer is because the dash comes
from outside ruby, then I would suggest that the content ("_" vs "-") is
important ... indicating that you should use strings.

Florian Gross

Mar 17, 2005, 5:16:16 PM

Hal Fulton wrote:

> Sam Roberts wrote:
>> [...]

>> Anyhow, I moved back to using strings instead of symbols. The need to
>> create a string and intern it for things that are logically symbols
>> but have a "-" in them was too painful.
>>
>> That was my experience, anyhow.
>
> I believe you can do things like :"octet-stream" -- but I grant
> that is not much better.

And there's also the %s(octet-stream) family.
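The three spellings mentioned in this subthread all produce the very same Symbol, as a quick sketch shows:

```ruby
# Three ways to write a Symbol whose name contains a "-".
a = :"octet-stream"        # quoted symbol literal
b = %s(octet-stream)       # %s literal
c = 'octet-stream'.intern  # converting a String at run time (same as #to_sym)

p a                          # => :"octet-stream"
p a.equal?(b) && b.equal?(c) # => true: all three are one Symbol object
p a.to_s                     # => "octet-stream"
```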

Sam Roberts

Mar 18, 2005, 9:34:06 AM

Quoting hal...@hypermetrics.com, on Thu, Mar 17, 2005 at 12:02:19PM +0900:

> Sam Roberts wrote:
> > OCTETSTREAM = 'octet-stream'.intern
> > TEXT = :text
>

> I believe you can do things like :"octet-stream" -- but I grant
> that is not much better.

But a little better; I didn't know that, thanks.

Sam

Sam Roberts

Mar 18, 2005, 9:36:19 AM

Quoting j...@weirichhouse.org, on Thu, Mar 17, 2005 at 01:36:39PM +0900:
> On Wednesday 16 March 2005 09:29 pm, Sam Roberts wrote:
> > Quoting j...@weirichhouse.org, on Thu, Mar 17, 2005 at 11:08:59AM +0900:
> > > Use Strings for their content. Use Symbols for their arbitrary
> > > uniqueness.
> >
> > I used to do this, but ran into problems.
> [...]
> > :octet-stream
> >
> > NameError: undefined local variable or method `stream' for main:Object
> > from (irb):3
> > 'octet-stream'.intern
> > ==>:"octet-stream"
>
> Why couldn't you do :octet_stream ? If your answer is because the dash comes
> from outside ruby, then I would suggest that the content ("_" vs "-") is
> important ... indicating that you should use strings.

Maybe I don't know what you mean by "arbitrarily unique".

"_" vs "-" is no more (or less) important than "a" vs. "z".

Cheers,
Sam

Jim Weirich

Mar 18, 2005, 10:50:38 AM

If the choice of symbol names is arbitrary, then I can change the name of
the symbol everywhere that references it without changing the semantics of
the program.

For example, if any of the following choices are equally valid:
:octetstream, :OctetStream, :octet_stream, :stream_of_octets, :octets,
:fido, then the choice of name is arbitrary. Of course, some choices are
more transparent and convey meaning better, but the program will still
work even if we call the symbol :xyzzy. That's what it means to be
arbitrary.

If the choice of letters is constrained by some outside force, then it is
not arbitrary. For example, it might come to you as an attribute in an
XML message. Or perhaps you need to write it to a file, and other
programs expect that exact sequence of characters. In all these cases, the
content (sequence of letters) is important and cannot be changed without
breaking the program. When the content of the item is important, use a
string.
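Jim's test can be illustrated with two hypothetical helpers. The MIME type's exact spelling is dictated by other programs, so it stays a String; the parser-state names are internal, so renaming `:reading_body` to `:xyzzy` everywhere would leave the behavior unchanged, which is what makes them arbitrary and a good fit for Symbols.

```ruby
# Content matters: the MIME type must be written out exactly as other
# programs expect it, so it stays a String. (Hypothetical helper.)
def content_type_header(mime_type)
  "Content-Type: #{mime_type}"
end

# Arbitrary uniqueness: these state names are internal to the program.
# Renaming :reading_body to :xyzzy everywhere would not change behavior.
def next_state(state)
  case state
  when :reading_headers then :reading_body
  when :reading_body    then :done
  else                       :done
  end
end
```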

Sam Roberts

Mar 18, 2005, 12:31:58 PM

Ah. Then, no, it's not really arbitrary. More specifically, I can make it
arbitrary, but then I might be forced to make it more and more
arbitrary! If I map:

x-mailer => :xmailer

Then somebody decides to make a header

xmailer

I have to map:

xmailer => :zz_xmailer

etc. I guess I could make a mapping table, hashing strings to
symbols, but at this point symbols aren't making my code clearer or
easier to use.
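The mapping table Sam mentions might look like the sketch below; `HEADER_SYMBOLS` and `header_symbol` are hypothetical names, and the entries are the ones from his example. It shows the cost he describes: every header needs an entry, collisions have to be resolved by hand, and unmapped names must be handled somehow, whereas String keys would need none of this.

```ruby
# Hypothetical lookup table from external header names to internal Symbols.
# Every new header needs an entry, and colliding names are resolved by hand.
HEADER_SYMBOLS = {
  'x-mailer' => :xmailer,
  'xmailer'  => :zz_xmailer,
}

def header_symbol(name)
  HEADER_SYMBOLS.fetch(name)  # raises KeyError for headers with no mapping
end
```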

In the example of mime types, I probably could use arbitrary symbols.
Anybody who decides to make a new mime type called application/octet_stream,
given that application/octet-stream is a standard name, deserves to be
publicly humiliated. So I could use :octetstream, arbitrarily.

I just wanted to use symbols for the efficiency, and to emphasize their
uniqueness in terms of case-sensitivity. It seemed to fit, but for
several reasons I discovered it didn't.

Cheers,
Sam