Author: Tom Wardrop | Status: Open | Priority: Normal
Assuming there's no technical limitation or ambiguity, I suggest that the shorthand syntax for symbols in the context of a hash be applied to strings as well.
E.g. {'key': 'value'}
I don't believe this would give rise to any syntactic ambiguities. The only consideration that may need to be made is whether there are plans to support shorthand syntax for quoted symbols, e.g. {'key': 'value'} meaning {:'key' => 'value'}. If quoted symbols are off the table, then there's no harm in implementing a shorthand hash syntax for strings. This may stem the growing problem of what I like to call 'symbolitis', where symbols are selected as the key type purely for their aesthetics and ease of use, even when strings are a more appropriate choice.
In message "Re: [ruby-core:36559] [Ruby 1.9 - Feature #4801][Open] Shorthand Hash Syntax for Strings" on Mon, 30 May 2011 09:44:35 +0900, Tom Wardrop <t...@tomwardrop.com> writes:
|Assuming there's no technical limitation or ambiguity, I suggest that the shorthand syntax for symbols in the context of a hash be applied to strings as well. | |E.g. {'key': 'value'}
Iff {'key': 'value'} means {:key => 'value'} I have no objection.
Symbols are commonly used as hash keys because of their efficiency more than anything else. And if you're writing hash literals, chances are there's no reason not to take advantage of the efficiency gains by using Symbols.
In message "Re: [ruby-core:36571] Re: [Ruby 1.9 - Feature #4801][Open] Shorthand Hash Syntax for Strings" on Mon, 30 May 2011 16:12:35 +0900, Anurag Priyam <anurag08pri...@gmail.com> writes:
|> Iff {'key': 'value'} means {:key => 'value'} I have no objection. | |Won't that be misleading? I think the OP wants {'key': 'value'} to |mean {'key' => 'value'}.
I don't disagree here. But considering the fact that {key: "value"} is a shorthand for {:key => "value"}, {"key": "value"} should be a shorthand for {:"key" => "value"}. Besides that, since it reminds me of JSON so much, making a: and "a": different could cause more confusion than the misleading aspect above.
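For comparison, this is the reading Ruby eventually adopted: from Ruby 2.2 onward, a quoted key followed by a colon yields a Symbol. The snippet assumes Ruby 2.2 or later and only checks key types and equality; it is a sketch, not part of the original thread.

```ruby
# Assumes Ruby 2.2 or later, where {"key": value} is valid syntax.
bare   = { key: "value" }      # existing shorthand
quoted = { "key": "value" }    # quoted shorthand: still a Symbol key
arrow  = { "key" => "value" }  # explicit String key

p bare == quoted               # => true
p quoted.keys.first.class      # => Symbol
p quoted == arrow              # => false
```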
> |> Iff {'key': 'value'} means {:key => 'value'} I have no objection. > | > |Won't that be misleading? I think the OP wants {'key': 'value'} to > |mean {'key' => 'value'}.
> I don't disagree here. But considering the fact that {key: "value"} > is a shorthand for {:key => "value"}, {"key": "value"} should be a > shorthand for {:"key" => "value"}. Besides that, since it reminds me > of JSON so much, making a: and "a": different could cause more confusion > than the misleading aspect above.
Right, that makes sense. Also we would then be able to say {'foo-bar': 'something'} to mean {:'foo-bar' => 'something'}, which I guess is not possible with the current shorthand syntax.
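Assuming Ruby 2.2 or later again, a key such as foo-bar, which cannot be written as a bare label, works in the quoted form:

```ruby
# Assumes Ruby 2.2+. A bare label like foo-bar: is not valid syntax,
# but the quoted form accepts anything a quoted Symbol accepts.
opts = { 'foo-bar': 'something' }

p opts == { :'foo-bar' => 'something' }  # => true
p opts[:'foo-bar']                        # => "something"
```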
On Mon, May 30, 2011 at 04:21:32PM +0900, Yukihiro Matsumoto wrote: > Hi,
> In message "Re: [ruby-core:36571] Re: [Ruby 1.9 - Feature #4801][Open] Shorthand Hash Syntax for Strings" > on Mon, 30 May 2011 16:12:35 +0900, Anurag Priyam <anurag08pri...@gmail.com> writes: > |Won't that be misleading? I think the OP wants {'key': 'value'} to > |mean {'key' => 'value'}.
> I don't disagree here. But considering the fact that {key: "value"} > is a shorthand for {:key => "value"}, {"key": "value"} should be a > shorthand for {:"key" => "value"}. Besides that, since it reminds me > of JSON so much, making a: and "a": different could cause more confusion > than the misleading aspect above.
> matz.
Misleading, yes - but I think this is only a symptom of a bigger problem.
In Rails projects I can never be sure whether the Hash object I am working with accepts symbols (usually) or strings (sometimes), or is a HashWithIndifferentAccess, where it doesn't matter until a plugin builds its own Hash from the contents.
The current Hash behavior is the most generic and flexible, but unfortunately both the most surprising for novices and least practical, IMHO.
My thought: warn when mixing string and symbol access in a Hash, because it is most likely an error. It becomes too easy to accidentally mix external data (String based hashes, though not always) with code/language constructs (usually Symbol based hashes).
With a warning, you would find out immediately if you guessed the syntax wrong.
I am wondering: is having strings instead of symbols as keys in a Hash actually useful, aside from obscure string/symbol optimization cases?
Another idea would be Ruby SymbolHash and StringHash classes, accessible from the C API, that raise when used incorrectly, along with methods for converting between the two, which would probably clean up a lot of code.
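A minimal sketch of the SymbolHash half of this idea, written in Ruby rather than C. SymbolHash and from_string_keys are hypothetical names; only []= and store are guarded, so merge!, update and Hash[] would still need overriding in a complete version.

```ruby
# Hypothetical SymbolHash: a Hash subclass whose []= and store reject
# non-Symbol keys. Other writers (merge!, update, Hash[]) are not covered.
class SymbolHash < Hash
  def []=(key, value)
    unless key.is_a?(Symbol)
      raise TypeError, "SymbolHash keys must be Symbols, got #{key.class}"
    end
    super
  end

  def store(key, value)
    self[key] = value
  end

  # Explicit conversion from a String-keyed Hash (hypothetical helper).
  def self.from_string_keys(hash)
    result = new
    hash.each { |k, v| result[k.to_sym] = v }
    result
  end
end

h = SymbolHash.from_string_keys("host" => "localhost")
h[:port] = 80
begin
  h["port"] = 8080
rescue TypeError => e
  puts e.message # => SymbolHash keys must be Symbols, got String
end
```

from_string_keys calls to_sym on every key, so it only suits trusted input; converting untrusted strings to symbols raises the memory concerns discussed later in the thread.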
HashWithIndifferentAccess is cool, but it is only a workaround to reduce surprise, not to prevent the root cause of it.
With the new syntax we create an extra point of confusion for novices based on implementation details:
1. Will I get a Symbol or a String from this syntax?
2. Should the resulting Hash I pass to an API contain symbols or strings or any? What if it changes in the future? How can I get a warning or error if I get this wrong?
1) is a problem only because of 2).
Concentrating on getting both right will cost more productivity than the flexibility is worth. The current behavior also keeps Hash from being more consistent with similar structures in other languages (such as JSON).
These are just thoughts about revising the current Hash behavior to reflect the changes in the syntax - for the same reasons as the syntax additions. I don't feel too competent in this area and I am probably missing a lot of things.
> On Mon, May 30, 2011 at 04:21:32PM +0900, Yukihiro Matsumoto wrote: >> Hi,
>> In message "Re: [ruby-core:36571] Re: [Ruby 1.9 - Feature #4801][Open] Shorthand Hash Syntax for Strings" >> on Mon, 30 May 2011 16:12:35 +0900, Anurag Priyam <anurag08pri...@gmail.com> writes: >> |Won't that be misleading? I think the OP wants {'key': 'value'} to >> |mean {'key' => 'value'}.
>> I don't disagree here. But considering the fact that {key: "value"} >> is a shorthand for {:key => "value"}, {"key": "value"} should be a >> shorthand for {:"key" => "value"}. Besides that, since it reminds me >> of JSON so much, making a: and "a": different could cause more confusion >> than the misleading aspect above.
>> matz. > Misleading, yes - but I think this is only a symptom of a bigger > problem.
> In Rails projects I can never be sure if the Hash object I am working > accepts symbols (usually), strings (sometimes) or is a > HashWithIndifferentAccess and doesn't matter until a plugin builds its > own Hash from the contents.
> The current Hash behavior is the most generic and flexible, but > unfortunately both the most surprising for novices and least > practical, IMHO.
> My thought: warn when mixing string and symbol access in a Hash, > because it is most likely an error. It becomes too easy to > accidentally mix external data (String based hashes, though not > always) with code/language constructs (usually Symbol based hashes).
> With a warning, you won't guess the syntax wrong and not know > immediately.
> I am wondering: is having strings as keys instead of symbols in a Hash > actually useful? Aside from obscure string/symbol optimization cases?
> Another idea would be a Ruby SymbolHash, StringHash accessible > from C API that raises when used incorrectly. And methods for > conversion between the two would probably clean up a lot of code.
> HashWithIndifferentAccess is cool, but it is only a workaround to > reduce surprise, not to prevent the root cause of it. > ...
While on the subject, I really wish the hash constructor ({}) had created a HashWithIndifferentAccess from the beginning in Ruby. Actually, it should still be a Hash, but it should work like HashWithIndifferentAccess.
It is very misleading that my_hash[:something] != my_hash['something']. Also, I have never found it useful to allow that. Most of the time, a HashWithIndifferentAccess is what you really want.
Changing back to the subject (kind of), in Groovy, here is what happens:
a = [abc: 'some text']; a['abc'] == 'some text'
a = [1: 'some text']; a.keySet()*.class == [java.lang.Integer]
In Ruby, {1: 'some text'} is a syntax error. It would be great if the same syntax allowed numeric keys.
As for {'abc': 'some text'}, I'm fine with it returning {:'abc' => 'some text'}. But I think this should also be possible:
Since :"#{abc}" is allowed in Ruby, I imagine that any such substitute syntax would preserve that property.
I disagree strongly that Hash, the base class, should special-case the behaviors of Strings and Symbols to be equal. It's a hash table, like those encountered in any other language, and shouldn't behave unlike typical hash tables. Namely h[a] and h[b] look up the same value iff a == b (or a.eql?(b), or whichever equality test you use). Strings and symbols are never equal.
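A quick check of this point using ordinary Hash behavior: a String and a Symbol with the same characters are neither == nor eql?, so they are separate keys.

```ruby
# Strings and Symbols with the same characters are never == or eql?,
# so a lookup with one never finds an entry stored under the other.
h = { foo: 1 }

p :foo == "foo"     # => false
p "foo".eql?(:foo)  # => false
p h["foo"]          # => nil
p h[:foo]           # => 1
```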
> Since :"#{abc}" is allowed in Ruby, I imagine that any such substitute syntax would preserve that property.
> I disagree strongly that Hash, the base class, should special-case the behaviors > of Strings and Symbols to be equal. It's a hash table, like those encountered in any other language, > and shouldn't behave unlike typical hash tables. Namely h[a] and h[b] look up the same > value iff a == b (or a.eql?(b), or whichever equality test you use). Strings and symbols > are never equal.
Maybe, if we introduced another equality operator for comparing strings and symbols, let's say ### (I know this is a terrible representation, but I don't care about it now - just the idea):
:"some-symbol" ### 'some-symbol' => true
When it is not comparing a Symbol with a String, it would work as always:
123 ### '123' => false
123 ### 123 => true
Or maybe this operator should have a similar behavior for comparing numbers too. For instance, in Perl and other languages, if I remember correctly, hash[1] == hash['1'], and I think this is a good thing.
After defining how this operator should behave, the hash implementation would be typical anyway.
I'm not proposing to change this in Ruby because of existing code that would probably be broken by such a change, but if this had been the decision from the beginning it would have been fantastic!
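Since Ruby cannot define an operator spelled ### (# starts a comment), here is a hypothetical sketch using a plain method, key_eq?, and a small NormalizedHash wrapper; both names are illustrative. It follows the rule described above: a Symbol and a String compare by name, and everything else uses ordinary eql?. Because normalization happens before the key reaches the inner Hash, lookup stays an ordinary hash-table lookup.

```ruby
# Hypothetical stand-in for the proposed ### operator. Symbols are
# converted to their names; every other value is left as-is and compared
# with ordinary eql?.
NORMALIZE_KEY = ->(x) { x.is_a?(Symbol) ? x.to_s : x }

def key_eq?(a, b)
  NORMALIZE_KEY.(a).eql?(NORMALIZE_KEY.(b))
end

# A plain Hash keyed by the normalized form: h[:a] and h["a"] hit the
# same entry, while the lookup itself is still ordinary hash/eql?.
class NormalizedHash
  def initialize
    @store = {}
  end

  def [](key)
    @store[NORMALIZE_KEY.(key)]
  end

  def []=(key, value)
    @store[NORMALIZE_KEY.(key)] = value
  end
end

p key_eq?(:"some-symbol", "some-symbol") # => true
p key_eq?(123, "123")                     # => false
p key_eq?(123, 123)                       # => true

h = NormalizedHash.new
h[:name] = "Ruby"
p h["name"]                               # => "Ruby"
```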
I also disagree that Ruby hashes should have indifferent access, even if they had had it from the beginning.
Also, the general impression that Rails accepts symbols and strings everywhere is false. If you try to give strings as keys to methods like has_many, it won't work. The only places where we accept both types are in params and cookies and we do that as a security measure. If we automatically converted such keys to symbols, someone could use it to cause a DoS attack.
Overall, I agree with the proposed syntax and with the behavior that it should return a symbol. For consistency, I also think interpolation should be allowed when double quotes are used.
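Assuming Ruby 2.2 or later, double-quoted shorthand keys do interpolate, while single-quoted ones keep the text literally; both produce Symbols:

```ruby
# Assumes Ruby 2.2+: double-quoted shorthand keys interpolate and still
# produce Symbols; single-quoted ones do not interpolate.
abc = "dynamic"
h      = { "#{abc}_key": "some text" }
single = { '#{abc}_key': "some text" }

p h.keys.first                         # => :dynamic_key
p single.keys.first == :'#{abc}_key'   # => true
```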
> I also disagree that ruby hashes should be with indifferent access.
+1, they shouldn’t (IMHO). Hash keys can comprise any objects that have sane #hash and #eql? methods (stdlib’s Set is fundamentally based on this premise) and I don’t think any special casing should be made for Strings and Symbols (except for String freezing when they become keys).
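To illustrate the #hash/#eql? contract, here is a sketch with an illustrative Point class used both as a Hash key and as a Set element; it also checks that an unfrozen String key is stored as a frozen copy.

```ruby
require "set"

# Any object with consistent #hash and #eql? can be a Hash key.
# Point is an illustrative class, not from the original discussion.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end

  def hash
    [x, y].hash
  end
end

labels = { Point.new(0, 0) => "origin" }
p labels[Point.new(0, 0)]                     # => "origin"
p Set[Point.new(1, 2), Point.new(1, 2)].size  # => 1

# Hash#[]= stores a frozen copy of an unfrozen String key.
key = String.new("name")
h = {}
h[key] = 1
p h.keys.first.frozen?  # => true
p key.frozen?           # => false
```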
> The only places where [Rails] accept[s] both types > are in params and cookies and we do that as a security > measure. If we automatically converted such keys to > symbols, someone could use it to cause a DoS attack.
Also, the above will at some point in time stop being a security concern (IIRC the DoS is MRI-only, right?).
> Overall, I agree with the proposed syntax and with the behavior > that it should return a symbol. For consistency, I also think > interpolation should be allowed when double quotes are used.
+1 – and thanks a lot for your great EuRuKo talk! :)
— Piotr Szotkowski -- My McDonald’s order was 28 and the next one was 29 and I thought, ‘Aha! Insecure predictable sequence numbers.’ [Mark Pilgrim]
On Mon, May 30, 2011 at 09:05:04PM +0900, Michael Edgar wrote: > Since :"#{abc}" is allowed in Ruby, I imagine that any such > substitute syntax would preserve that property.
> I disagree strongly that Hash, the base class, should special-case > the behaviors of Strings and Symbols to be equal. It's a hash table, > like those encountered in any other language, and shouldn't behave > unlike typical hash tables. Namely h[a] and h[b] look up the same > value iff a == b (or a.eql?(b), or whichever equality test you use). > Strings and symbols are never equal.
I thought exactly the same thing, until I realized that having keys of different types in a Hash isn't really part of the general Hash concept. It is a side effect of Ruby being dynamically typed.
I agree, and I wouldn't allow symbols to be equal to strings as keys. I would go a step further: they shouldn't both be used as keys in the same Hash, *because* they are two different types, especially since each can easily represent the other.
Consider the following:
{ nil =>0, :foo => 1, 'foo' => 2 }
Conceptually, people expect Hash keys to be of the same type, except maybe for "hacks" like that nil above that can simplify code.
If someone out there in the world actually demands that such a Hash is valid and that :foo and 'foo' are different keys, you could always wrap Hash to support that for that single, specialized case.
Otherwise, everyone ends up using HWIA in all the wrong places as a silver bullet, or writing complex code to handle "strange" hashes gracefully, or using HWIA just to symbolize the keys, "just in case".
In Ruby "foo" + 123 raises a TypeError. Adding a string key to a symbol-keyed Hash doesn't even show a warning.
I consider hashes with different key types different types of hashes, that shouldn't even be allowed to merge together without conversion. This could be useful both in Rails to make the meaning of each HWIA instance more explicit and for API designers to worry less about how to process hashes in a robust way.
I think symbols and strings are too similar in meaning for such different types to be allowed as keys in the same Hash instance.
Furthermore, if the standard Hash didn't allow strings for keys (another class for the current behavior?), the new shorthand syntax would be even less surprising.
Symbols are recommended in favor of Strings for hashes anyway.
> Symbols are recommended in favor of Strings for hashes anyway.
Only for fixed key sets. Symbols aren't GC'd, so if the set of keys for a Hash grows with respect to input, then forcing them all to symbols will grow your Ruby process's memory usage irreversibly.
On Mon, May 30, 2011 at 11:24:29PM +0900, Michael Edgar wrote: > On May 30, 2011, at 10:19 AM, Cezary wrote:
> > Symbols are recommended in favor of Strings for hashes anyway.
> Only for fixed key sets. Symbols aren't GC'd, so if the set of keys for a Hash > grows with respect to input, then forcing them all to symbols will grow your > Ruby process's memory usage irreversibly.
Good point.
Because of this, I think it makes even more sense to have a specialized Hash for string keys; each Hash "type" would have entirely different applications anyway.
Also, any documentation for conversion between such Hash "types" could warn about costs and point to alternatives.
> I thought exactly the same thing, until I realized > that having keys of different types in a Hash > isn't really part of the general Hash concept.
Why? [citation needed]
> Consider the following: > { nil => 0, :foo => 1, 'foo' => 2 } > Conceptually, people expect Hash keys to be of the same type, > except maybe for "hacks" like that nil above that can simplify code.
Well, they either do or don’t, then. :)
> If someone out there in the world actually demands that such a Hash > is valid and that :foo and 'foo' are different keys, you could always > wrap Hash to support that for that single, specialized case.
Hm, IMHO ‘any object can be a key, just as any object can be a value’ is the general case, and ‘I want my Strings and Symbols to be treated the same when they’re similar, oh, and maybe with the nil handled separately for convenience’ is the specialised case.
> In Ruby "foo" + 123 raises a TypeError. Adding a string > key to a symbol-keyed Hash doesn't even show a warning.
I don’t see why it should – as long as it still responds to #hash and #eql?, it’s a valid Hash key.
Hashes in Ruby serve a lot of purposes (they even maintain insertion order); if you want to limit their functionality, feel free to subclass.
> I consider hashes with different key types different types of hashes, > that shouldn't even be allowed to merge together without conversion.
There’s nothing preventing you from subclassing Hash to create StringKeyHash, SymbolKeyHash or even MonoKeyHash that would limit the keys’ class to the first one defined.
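A hypothetical MonoKeyHash sketch along these lines. It checks keys with is_a?, so subclasses of the first key's class are accepted; that makes the result order-dependent (inserting an Employee first would later reject a plain Person), and instance_of? would be the stricter alternative. Only []= and store are guarded.

```ruby
# Hypothetical MonoKeyHash: the class of the first key stored fixes the
# key class for the rest of the Hash. is_a? admits subclasses of that
# class; instance_of? would reject them. Only []= and store are checked.
class MonoKeyHash < Hash
  def []=(key, value)
    @key_class ||= key.class
    unless key.is_a?(@key_class)
      raise TypeError, "expected #{@key_class} key, got #{key.class}"
    end
    super
  end

  def store(key, value)
    self[key] = value
  end
end

h = MonoKeyHash.new
h[:a] = 1
h[:b] = 2
begin
  h["c"] = 3
rescue TypeError => e
  puts e.message # => expected Symbol key, got String
end
```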
How would you treat subclasses? Let’s say I have a Hash with keys being instances of People, Employees and Volunteers (with Employees and Volunteers being subclasses of People). Should they all be allowed as keys in a single MonoKeyHash or not?
What about String-only keys, but with different keys having their own different singleton methods?
(For discussion’s sake: what about if a couple of the Strings had redefined #hash and #eql? methods, on an instance level?)
> I think symbols and strings are too similar in meaning for such > different types to be allowed as keys in the same Hash instance.
But that would introduce a huge exception in the current very simple model. Ruby is complicated enough; IMHO we should strive to make it less complicated, not more.
— Piotr Szotkowski -- // sometimes I believe compiler ignores all my comments
On Tue, May 31, 2011 at 05:55:39AM +0900, Piotr Szotkowski wrote: > Cezary:
First of all, thanks Piotr for taking the time to discuss this. My original ideas for solving the problem or their descriptions sucked, but I left your comments because they still apply or provide good examples.
I'm trying to get an idea of how the implementation decisions behind hashes affect the general use of hashes in Ruby, and whether something could be slightly changed in favor of improving the user's experience with the language without too much sacrifice in other areas.
I believe Hash was designed with efficiency and speed in mind, and the recent Hash syntax changes suggest that the ways people currently use Hash in Ruby go well beyond the scope of the original concept.
Refinements may minimize the need for changes here, but even so, I think this is a good time to consider what Hash is used for and how syntax changes can help users better express their ideas, instead of only being able to choose between an array, a very, very general associative array, or third-party gems that have no syntax support.
I hope I am not going overboard with this topic. I have serious doubts that the slight changes in Hash behavior I presented would be free of problems, but I cannot think of any serious downsides, especially if only a warning is emitted. Given such a usability upside, I must be missing either a big flaw in the idea or a big gain from the current behavior.
If this topic does not contribute to Ruby from the user's perspective I am ready to drop the subject entirely.
> > I thought exactly the same thing, until I realized > > that having keys of different types in a Hash > > isn't really part of the general Hash concept.
> Why? [citation needed]
My wording isn't correct.
First, a Hash in Ruby is an associative array, which I read about here:
"From the perspective of a computer programmer, an associative array can be viewed as a generalization of an array. While a regular array maps an integer key (index) to a value of arbitrary data type, an associative array's keys can also be arbitrarily typed. In some programming languages, such as Python, the keys of an associative array do not even need to be of the same type."
The type of the key can be anything. Keys can even be of different types within a single instance. The latter is not a requirement of every possible associative array implementation, and this is what I meant.
It can be implementation-specific: for example, an rbtree requires ordering of keys. In this specific case, you cannot have a symbol and a string in such an associative array, because you cannot compare them.
But since Hash uses a hash table, it is possible to have a wider range of key types, including both symbol and string together. The implementation allows it, but my question is: is it *that* useful in the real world? Or does it cause more harm than good?
> > { nil => 0, :foo => 1, 'foo' => 2 }
> > Conceptually, people expect Hash keys to be of the same type, > > except maybe for "hacks" like that nil above that can simplify code.
> Well, they either do or don’t, then. :)
Right. What I wrote isn't correct. I think people expect hash keys to match a given domain to consider them valid. Just like every variable should have a value within bounds or raise at the first possible opportunity. Unless the cause of a problem is otherwise trivial to find and fix.
I don't recommend the example with nil above. Better alternatives IMHO:
{ :'' => 0, :foo => 1 }[ some_key || :'' ]
or
{ :foo => 1 }[some_key] || 0
or set the default in Hash
Hash.new(0).merge( :foo => 1 )[some_key]
That is why I called it a hack - using a Hash key to get default values.
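Wrapping the three alternatives above in methods makes their behavior easy to compare. They agree for nil and :foo; for an unknown key such as :bar, the first returns nil while the other two return 0. The third relies on Hash#merge keeping the receiver's default value.

```ruby
# The three alternatives above, wrapped as methods so their results can be
# compared for a nil key, a known key and an unknown key.
def via_empty_symbol(some_key)
  { :'' => 0, :foo => 1 }[some_key || :'']
end

def via_or_default(some_key)
  { :foo => 1 }[some_key] || 0
end

def via_hash_default(some_key)
  Hash.new(0).merge(:foo => 1)[some_key]
end

[nil, :foo, :bar].each do |k|
  p [k, via_empty_symbol(k), via_or_default(k), via_hash_default(k)]
end
# [nil, 0, 0, 0]
# [:foo, 1, 1, 1]
# [:bar, nil, 0, 0]
```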
> Hm, IMHO ‘any object can be a key, just as any object can be > a value’ is the general case, and ‘I want my Strings and Symbols > to be treated the same when they’re similar, oh, and maybe with > the nil handled separately for convenience’ is the specialised case.
Exactly. The specialized case is obviously bad. But the general case turned out not to be too great either. I am thinking about a third solution: generic, but within a specified domain, ideally where the differences between String and Symbol stop them from unintentionally ending up in the same Hash, without being too specialized. And without subclassing.
Even just emitting a warning when a Hash becomes unsortable would not break the associative array concept, while *still* supporting 99% or more of actual real-world use cases, and without making any of the type-specific assumptions you presented.
As a side effect, if a user writes {'foo': 123}.merge('foo' => 456), they will get a warning instead of just a hash with two pairs.
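A hypothetical sortable_keys? check shows how the proposed condition could be detected: it simply tries to sort the keys, and Array#sort raises ArgumentError when Symbol#<=> returns nil for a String. The merge line assumes Ruby 2.2+ for the quoted-key syntax.

```ruby
# Hypothetical check for the proposed warning: a Hash counts as "sortable"
# when all of its keys can be compared with each other. Symbol#<=> returns
# nil for a String, so Array#sort raises ArgumentError on such keys.
def sortable_keys?(hash)
  hash.keys.sort
  true
rescue ArgumentError
  false
end

# Assumes Ruby 2.2+ for the quoted shorthand key.
merged = { 'foo': 123 }.merge('foo' => 456)

p merged.size                    # => 2
p sortable_keys?(merged)         # => false
p sortable_keys?({ a: 1, b: 2 }) # => true
```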
Such a warning would most likely help find design flaws and make hard-to-debug errors less frequent when refactoring, and hopefully encourage a better design, or at least a little more thought about the current one.
> > In Ruby "foo" + 123 raises a TypeError. Adding a string > > key to a symbol-keyed Hash doesn't even show a warning.
> I don’t see why it should – as long as it still > responds to #hash and #eql?, it’s a valid Hash key.
Both methods are specific to the internals of Ruby's associative array, which uses a hash table. Users generally care only about their string/symbol problems, until they realize that using strings for keys is generally not a good thing because of those problems and the debugging time they cost.
Implementation wise I think Hash is great. However, the flexibility along with symbol/string similarities and more ingenious uses of Hash will probably cause only more problems over time.
Example:
Python doesn't have symbols and has named arguments. In Ruby we use a symbol keyed Hash to simulate the latter which is great, but if the hash is not symbol key based, there is no quick, standard way to handle that. Sure, you can ignore or raise or convert, but why handle something you should be able to prevent?
Ignoring keys you don't know seems like a good idea, but the result is obscure error messages that are not very helpful in debugging. And let's face it: most of the Ruby code people work on is not their own.
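Later Ruby versions (2.0+) added keyword arguments, which reject unknown names on their own. Within the options-Hash style discussed here, a hand-rolled check might look like this; check_options!, connect and the option names are all hypothetical. It compares String keys against the allowed names as Strings instead of calling to_sym, so untrusted input does not create new Symbols.

```ruby
# Hypothetical helper for a method that takes a symbol-keyed options Hash.
# String keys get a pointed error instead of being silently ignored.
ALLOWED_OPTIONS = [:host, :port].freeze
ALLOWED_NAMES   = ALLOWED_OPTIONS.map(&:to_s).freeze

def check_options!(opts)
  opts.each_key do |key|
    next if ALLOWED_OPTIONS.include?(key)
    if ALLOWED_NAMES.include?(key)
      raise ArgumentError, "option #{key.inspect} must be a Symbol, not a String"
    end
    raise ArgumentError, "unknown option #{key.inspect}"
  end
  opts
end

def connect(opts = {})
  check_options!(opts)
  "#{opts.fetch(:host, 'localhost')}:#{opts.fetch(:port, 80)}"
end

p connect(host: "example.org") # => "example.org:80"
begin
  connect("host" => "example.org")
rescue ArgumentError => e
  puts e.message # => option "host" must be a Symbol, not a String
end
```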
The only people who don't need to care are the experts who already have the right habits and understanding that allows them to avoid problems without too much thought. The rest have to learn the hard way.
> Hashes in Ruby serve a lot of purposes (they even maintain insertion > order); if you want to limit their functionality, feel free to subclass.
Why do I have to subclass Hash to get a useful named arguments equivalent in Ruby? Why would I want object instances for argument names? Why can't I choose *not* to have them in a simple way?
The overhead and effort required to maintain and use a subclass becomes a good enough reason to give up on writing robust code.
Which is probably what most Rubyists do.
We have RBTree and HashWithIndifferentAccess. Neither really helps in creating good APIs for many of the wrong reasons:
- HWIA is for Rails specific cases but is usually abused to avoid costly string/symbol mistakes
- RBTree is a gem most people don't know about, so they stick with Hash anyway. It adds an ordering requirement, but that seems like a side effect. It was proposed for inclusion in Ruby 1.9, but I don't remember why it ultimately wasn't included.
- the {} notation is too convenient to lose in the case of subclassing, especially when Hash is used for method parameters
- in practice, you can only use the subclass in your own code
> There’s nothing preventing you from subclassing Hash to > create StringKeyHash, SymbolKeyHash or even MonoKeyHash > that would limit the keys’ class to the first one defined.
I thought of exactly that as a way to avoid subclassing: having an alternative to the current Hash already available as a standard Ruby collection.
But now I think the idea is too limiting to be practical. From the user's perspective, having Hash restrict its behavior the way RBTree does would save people a lot of grief.
If Hash changed its behavior in the way described, most of the existing code would work as usual. Manually replacing {} with a subclass in a large project is a waste of time. Hashes are used too often to even consider subclassing.
Consider regular expressions: you can specify options to a regexp, defining its behavior. Having the same for hashes could be cool:
{'a' => 3, :a => 3}/so # s = strict, o = ordered
As examples, we could also have:
r = uses RBTree for the Hash (and so implies 's')
i = indifferent access, but not recommended (actually, I personally wouldn't want this as an option)
> How would you treat subclasses? Let’s say I have a Hash with > keys being instances of People, Employees and Volunteers (with > Employees and Volunteers being subclasses of People). Should > they all be allowed as keys in a single MonoKeyHash or not?
Good example of using a Hash to associate values with (even random) objects!
Since having keys orderable already answers the part about allowing into the Hash, I'll concentrate on the case where items are of different types.
How about an array of objects and a hash of object ids instead?
Or just use the results of #hash as the keys if it is about object contents. This makes your intention more explicit.
{ person1.hash => some_value, ... }
If you really need different types as a way of associating values with random objects, you could create a Hash of types and each type would have object instances:
Then you can use hash[some_key.class][some_key] for access if you *really* need the current behavior.
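A sketch of the Hash-of-types layout, with illustrative Person and Employee classes: the outer Hash is keyed by class, and each inner Hash holds only keys of that class.

```ruby
# Sketch of the "Hash of types" idea: the outer Hash is keyed by class,
# and each inner Hash only ever holds keys of that one class.
# Person and Employee are illustrative classes.
Person   = Struct.new(:name)
Employee = Class.new(Person)

table = Hash.new { |outer, klass| outer[klass] = {} }

alice = Person.new("Alice")
bob   = Employee.new("Bob")

[alice, bob, :some_symbol, "some string"].each_with_index do |key, i|
  table[key.class][key] = i
end

p table.keys                          # => [Person, Employee, Symbol, String]
p table[bob.class][bob]               # => 1
p table[Symbol].key?("some string")   # => false
```

Reading through the default block creates an empty inner Hash for any class not yet seen; table.fetch(klass, {})[key] avoids that side effect.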
Not much harder to handle, but you have much more control over the hash contents. You probably need to know which types are used in the structure anyway to handle its contents.
...
>> Symbols are recommended in favor of Strings for hashes anyway.
> Only for fixed key sets. Symbols aren't GC'd, so if the set of keys for a Hash > grows with respect to input, then forcing them all to symbols will grow your > Ruby process's memory usage irreversibly.
... except that inline Strings perform worse for #== and #hash, and cause GC churn. A String used as an inline or static mnemonic in real code is pinned down anyway and will not be GC'd. An inline Symbol doesn't create garbage every time it's referenced, and is guaranteed to be the same object across methods.
Symbols are for mnemonics -- Strings are for sequences of characters.
> A String used as an inline or static mnemonic in real code is pinned > down anyway and will not be GC'd.
Yes it will. What's pinned down is a call to a constructor (with a pinned value, but there is a fixed and finite number of such values), so that every time that code executes, a new String is created, and that will get GC'd. Symbols avoid the overhead of constructing and GC'ing new Strings, but the problem Cezary is talking about is where code that creates Symbols dynamically from Strings, that can create a potentially unbounded number of Symbols, none of which can be GC'd.
Clifford Heath wrote: > ... the problem > Cezary is talking about is where code that creates Symbols dynamically > from Strings, that can create a potentially unbounded number of Symbols, > none of which can be GC'd.
Agreed.
If a concrete example would help, my objects are receiving RPC messages from untrusted clients, and I check for valid messages by Hash lookup.
Ideally the Hash keys would be Symbols, but if I then convert the untrusted messages to Symbols to perform the Hash lookup, I've opened my server to a memory leak DoS exploit.
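One way to do this kind of lookup without calling String#to_sym on untrusted input is to keep a String-keyed table that maps each allowed message name to its Symbol. HANDLERS and dispatch are hypothetical names for this sketch.

```ruby
# Look up untrusted message names by String; only the fixed, trusted set
# of Symbols in HANDLERS ever exists. Handler names are hypothetical.
HANDLERS = {
  "ping"   => :handle_ping,
  "status" => :handle_status,
}.freeze

def dispatch(message_name)
  handler = HANDLERS[message_name] # String lookup; no to_sym on input
  raise ArgumentError, "unknown message #{message_name.inspect}" unless handler
  handler
end

p dispatch("ping") # => :handle_ping
begin
  dispatch("rm -rf")
rescue ArgumentError => e
  puts e.message   # => unknown message "rm -rf"
end
```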
(On a related note, the RPC protocol supports all Ruby data types including Symbol, and the untrusted message names actually *arrive* at the protocol level as Symbols; but by default, any Symbols are deserialized as Strings when they reach the remote, because of the same DoS potential.)
> Clifford Heath wrote: >> ... the problem >> Cezary is talking about is where code that creates Symbols dynamically >> from Strings, that can create a potentially unbounded number of Symbols, >> none of which can be GC'd.
> Agreed. > Ideally the Hash keys would be Symbols, but if I then > convert the untrusted messages to Symbols to perform the > Hash lookup, I've opened my server to a memory leak DoS > exploit.
> (On a related note, the RPC protocol supports all Ruby > data types including Symbol, and the untrusted message names > actually *arrive* at the protocol level as Symbols; but by > default, any Symbols are deserialized as Strings when they > reach the remote, because of the same DoS potential.)
Good point. However, if the internal symbol table used weak references to Symbol objects, all dynamic Symbols that are not pinned down by code could be garbage collected.
> Good point. However, if the internal symbol table used weak > references to Symbol objects, > all dynamic Symbols that are not pinned down by code could be > garbage collected.
You'd lose some performance though, because it makes the GC graph traversal bigger. Possibly you could activate Symbol sweeping infrequently, and only if Symbols represent a significant percentage of all objects. Not sure how such an option would play in the GC code however.
On Thu, Jun 02, 2011 at 01:47:30PM +0900, Clifford Heath wrote: > On 02/06/2011, at 1:29 PM, Kurt Stephens wrote: > >Good point. However, if the internal symbol table used weak references to > >Symbol objects, > >all dynamic Symbols that are not pinned down by code could be garbage > >collected.
> You'd lose some performance though, because it makes the GC graph > traversal bigger. Possibly you could activate Symbol sweeping > infrequently, and only if Symbols represent a significant percentage > of all objects. Not sure how such an option would play in the GC > code however.
I wonder whether it would be possible to internally delay the conversion from String to Symbol until it can actually save memory.