I ran across some code that was trying to validate that an integer was in a given range, however the integer and the range were Strings. The problem boils down to this:
...it seems like ('1'..'10').member?('2') should return true. The problem lies in range.c, in the range_each_func() method. This method starts with the first value, then calls succ() to get the next value, breaking out of the loop when the value is no longer less than or equal to the ending value (or strictly less than the ending value on an exclusive range). Unfortunately, for the given string range this happens immediately, since '2' > '10'.
I suppose that it could be argued that this is not a bug, but that would be a difficult argument to win. Also, I need to make sure that this is still a bug in the latest version of Ruby. Unfortunately, I'm too sleepy to investigate further or create a patch for this tonight, but I'll try to work on it some more tomorrow night (assuming nobody else fixes it first).
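A minimal sketch of the two notions of membership that collide here. The helper names `in_generated_set?` and `within_endpoints?` are made up for illustration and are not part of Range; the second one ignores exclusive ranges for brevity. The first enumerates the range and looks for the value, the second only compares against the endpoints:

```ruby
# Hypothetical helpers, not part of Ruby's Range API.

# True when enumerating the range (via each/succ) actually yields the value.
def in_generated_set?(range, value)
  range.to_a.include?(value)
end

# True when the value lies between the endpoints according to <=>.
def within_endpoints?(range, value)
  (range.begin <=> value) <= 0 && (value <=> range.end) <= 0
end

STRING_RANGE = ('1'..'10')

p STRING_RANGE.to_a                     # the ten strings "1" through "10"
p in_generated_set?(STRING_RANGE, '2')  # true: each yields "2"
p within_endpoints?(STRING_RANGE, '2')  # false: '2' > '10' in dictionary order
```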
> I ran across some code that was trying to validate that an integer was in a given range, however the integer and the range were Strings. The problem boils down to this:
> ...it seems like ('1'..'10').member?('2') should return true. The problem lies in range.c, in the range_each_func() method. This method starts with the first value, then calls succ() to get the next value, breaking out of the loop when the value is no longer less than or equal to the ending value (or strictly less than the ending value on an exclusive range). Unfortunately, for the given string range this happens immediately, since '2' > '10'.
> I suppose that it could be argued that this is not a bug, but that would be a difficult argument to win. Also, I need to make sure that this is still a bug in the latest version of Ruby. Unfortunately, I'm too sleepy to investigate further or create a patch for this tonight, but I'll try to work on it some more tomorrow night (assuming nobody else fixes it first).
> - Warren Brown
I'd argue that it is not a bug, as there is no unique isomorphism from strings to integers. Some well-known mappings are hex, octal and decimal encoding. I.e. '2' ... '10' could be understood as 2 ... 8, or 2 ... 16, or 2 ... 10, or error ... 2, depending on the base.
Only you can know what the string means, so convert it to an integer and do the range test on integers.
So I think you should save the time you would spend on the patch and fix the application code instead.
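A small sketch of that advice, assuming the strings are meant as decimal numbers. `decimal_in_range?` is a hypothetical helper name; `String#to_i` with a base and `Integer(str, 10)` are standard Ruby:

```ruby
# The same digits name different integers depending on the assumed base.
p '10'.to_i(2)    # 2
p '10'.to_i(8)    # 8
p '10'.to_i(10)   # 10
p '10'.to_i(16)   # 16

# Hypothetical helper: parse strictly as decimal, then test an Integer range.
# Integer(str, 10) raises ArgumentError for input that is not a decimal number.
def decimal_in_range?(str, range)
  range.include?(Integer(str, 10))
rescue ArgumentError
  false
end

p decimal_in_range?('2', 1..10)    # true
p decimal_in_range?('11', 1..10)   # false
p decimal_in_range?('two', 1..10)  # false
```

Making the base explicit at the call site is the point: the range test itself is then an ordinary integer comparison.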
Brian Schröder wrote:
> I'd argue that it is not a bug, as there is no unique isomorphism from strings to integers. Some well-known mappings are hex, octal and decimal encoding. I.e. '2' ... '10' could be understood as 2 ... 8, or 2 ... 16, or 2 ... 10, or error ... 2, depending on the base.
> Only you can know what the string means, so convert it to an integer and do the range test on integers.
I was going to make that argument, but I realized that #each (by way of #succ) *does* have some extra knowledge (or assumptions) about strings, and it's pretty smart:
In message "Re: [BUG] string range membership" on Wed, 23 Nov 2005 15:41:32 +0900, "Warren Brown" <warrenbr...@aquire.com> writes:
| I ran across some code that was trying to validate that an integer was in a given range, however the integer and the range were Strings.
include? and member? compare using beg <= val <= end, which is dictionary order for strings. Unfortunately, the strings generated by succ are not in dictionary order. I'm not sure how to solve this.
This was discussed some time ago. The solution (mostly arrived at by Peter Vanbroekhoven) is to use a different comparison method. In Facets you'll find the #cmp method, which is part of the base methods, and which is used by the Interval class --a true Interval as opposed to what Range is.
  def cmp(other)
    return -1 if length < other.length
    return 1 if length > other.length
    self <=> other
  end
Of course this won't be of use to tuple forms like "1.18.12", but in such cases a Tuple object is in order anyway.
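A stand-alone sketch of that length-first comparison, written as a plain method so String is not reopened. The name `length_first_cmp` is hypothetical and this is not the Facets source:

```ruby
# Hypothetical stand-alone version of the length-first comparison.
def length_first_cmp(a, b)
  by_length = a.length <=> b.length
  by_length.zero? ? (a <=> b) : by_length
end

generated = ('a'..'aa').to_a

# Sorting with <=> moves "aa" ahead of "b"; length-first keeps succ order.
p generated.sort.first(3)                                        # ["a", "aa", "b"]
p generated.sort { |a, b| length_first_cmp(a, b) } == generated  # true
p length_first_cmp('2', '10')                                    # -1

# Tuple-like strings still go wrong: as tuples 1.9.12 precedes 1.18.3.
p length_first_cmp('1.9.12', '1.18.3')                           # 1
```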
> This shows a clear and unique mapping of the range '1'..'10' into a > set of strings.
but where do '01', '001', and '0001' go? they too, are in the set of strings.
regards.
-a -- =========================================================================== ==== | ara [dot] t [dot] howard [at] noaa [dot] gov | all happiness comes from the desire for others to be happy. all misery | comes from the desire for oneself to be happy. | -- bodhicaryavatara =========================================================================== ====
In message "Re: [BUG] string range membership" on Wed, 23 Nov 2005 23:57:41 +0900, "Warren Brown" <warrenbr...@aquire.com> writes:
|> I'd argue that it is not a bug, as there is no unique isomorphism from strings to integers.
|
| I have to disagree.
For your information, member? used to iterate over the items to check membership. But because of confusion between include? and member?, they were merged. The point is that Ranges are used both for ranges and intervals. Sometimes users want one to behave like a range bounded by its begin/end values. Sometimes they want it to behave like the set of values that #each produces.
I'd like to take care of this issue, but I don't know the right way to solve it yet. Perhaps we should provide both membership methods, with the right name for each. Any ideas?
>> This shows a clear and unique mapping of the range '1'..'10' into a set of strings.
> but where do '01', '001', and '0001' go? they too, are in the set of strings.
You completely lost me there. '01' doesn't *go* anywhere. That string is not in the range '1'..'10', in the same way that 'x' is not in the range 'a'..'n'.
Don't let the fact that my example used strings that look like numbers confuse the issue. The issue is that a range of strings that can be converted into a finite set, has a method to test for membership in that range, that doesn't match values that are in the set. Wow, that sentence is even hard for *me* to follow.
OK, let's take a different example to avoid all discussion of integers and various string representations of them.
> For your information, member? used to iterate over the items to check membership. But because of confusion between include? and member?, they were merged. The point is that Ranges are used both for ranges and intervals. Sometimes users want one to behave like a range bounded by its begin/end values. Sometimes they want it to behave like the set of values that #each produces.
> I'd like to take care of this issue, but I don't know the right way to solve it yet. Perhaps we should provide both membership methods, with the right name for each. Any ideas?
Ah, I see. So really, the root problem here is the assumption by Range that (value < value.succ). And in String, this assumption does not always hold true:
  irb(main):001:0> s = 'z'
  => "z"
  irb(main):002:0> s < s.succ
  => false
Because of that, there is a huge distinction between str_range.to_a.member?(x) (is x a member of the set of the range's values) and (str_range.first <= x <= str_range.last) (is x in the range's interval). So, given that (at least in the case of ranges of strings) there is a clear distinction between a value being included in the interval and a value being included in the set, it appears that we have a real need for two different methods. The methods Range#include? (in interval) and Range#member? (of set) seem to be perfect candidates for these two different functionalities. Before these two methods were merged, did they take on these two functionalities, or were they different in some other way?
Are there other cases where "membership" changes depending on whether the range is viewed as a set or an interval? If not, perhaps it would be better to address the fact that str.succ violates the (str < str.succ) assumption. Perhaps the functionality currently in String#succ could be moved to another method (String#increment perhaps?), and String#succ could take on a new functionality that does not violate (str < str.succ).
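One way to see where the (value < value.succ) assumption breaks is to walk the values that each produces and flag adjacent pairs that are out of order under <=>. This is only a sketch; `succ_order_violations` is a hypothetical helper name:

```ruby
# Hypothetical helper: list adjacent pairs produced by each where the
# "value < value.succ" assumption fails under <=>.
def succ_order_violations(range)
  range.each_cons(2).reject { |a, b| (a <=> b) < 0 }
end

p succ_order_violations('a'..'z')    # []
p succ_order_violations('a'..'aa')   # [["z", "aa"]]
p succ_order_violations(1..10)       # []
```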
Anyway, please let me know if there is anything I can do to help settle this issue.
On Thu, 24 Nov 2005, Warren Brown wrote:
>>> ruby -v -e "p(('1'..'10').to_a)"
>>> ruby 1.8.2 (2004-12-25) [i386-mswin32]
>>> ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
>>> This shows a clear and unique mapping of the range '1'..'10' into a set of strings.
>> but where do '01', '001', and '0001' go? they too, are in the set of strings.
> You completely lost me there. '01' doesn't *go* anywhere. That string is not in the range '1'..'10', in the same way the 'x' is not in the range 'a'..'n'.
says who? ;-) i may choose to define String#succ to do whatever i like - including producing the values '01', '001', and '0001'.
my point is simply that you seem to be merging the notion of ranges and sets. what a range's to_a produces is determined by only a few things
- the start and end points
- the succ method of the start value and of each successive succ value. remember one could do this
  irb(main):003:0> class String; def succ; self == "1" ? 42 : super; end; end
  => nil
  irb(main):004:0> "1".succ
  => 42
- the spaceship operator for each succ value called against the endpoint
because of this we cannot even safely call to_a on an arbitrary range. for instance
  irb(main):002:0> (42.0 .. 1.0).to_a
  TypeError: can't iterate from Float
          from (irb):2:in `each'
          from (irb):2:in `to_a'
          from (irb):2
in summary a range is nothing but a set of endpoints with some abstract/duck-type-like methods that may or may not produce a set as a __process__. note that the set produced is not part of the range itself and can be dynamically altered or even be made to produce a different set each time:
  harp:~ > cat a.rb
  class Float
    def succ
      self + rand
    end
  end
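The same point can be sketched without reopening a core class. `Tick` below is a hypothetical class that supplies only succ and <=>; the Range stores nothing but the two endpoints, and the elements come out of the iteration process:

```ruby
# Hypothetical class: anything with succ and <=> can serve as a Range endpoint.
class Tick
  include Comparable
  attr_reader :n

  def initialize(n)
    @n = n
  end

  # The step size lives here, not in the Range.
  def succ
    Tick.new(n + 2)
  end

  def <=>(other)
    n <=> other.n
  end
end

TICKS = (Tick.new(1)..Tick.new(7))

p TICKS.map(&:n)                                # [1, 3, 5, 7]
inside = Tick.new(2)
p TICKS.begin <= inside && inside <= TICKS.end  # true: between the endpoints
p TICKS.to_a.include?(inside)                   # false: each never yields it
```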
> Don't let the fact that my example used strings that look like numbers confuse the issue. The issue is that a range of strings that can be converted into a finite set, has a method to test for membership in that range, that doesn't match values that are in the set. Wow, that sentence is even hard for *me* to follow.
> OK, let's take a different example to avoid all discussion of integers and various string representations of them.
> Here we have a string range that has 27 "members". Now:
not quite - we have a string range that __produces__ 27 elements. it does not 'have' or 'contain' them. it merely suggests this set as its current thought on what that set might be. this set definition may change - unlike the endpoints of the range - and it is therefore not a property of the range.
> Can this really be called correct behavior of the member?() method? I > can't see any tenable argument to say that it is.
the definition of membership may rely on endpoints only. that explains it perfectly.
  harp:~ > irb
  irb(main):001:0> 'z' < 'aa'
  => false
ergo - not in the set. the confusion here is caused by exactly the reasons i'm explaining - String#succ has been defined not to create a monotonically increasing (<=>) sequence - but to produce the "next" string in an english sense. this is very useful for auto-generating names
irb(main):004:0> "z99".succ => "aa00"
if this were a monotonically increasing set the output would be
=> "z9:"
but that sure isn't that useful - unless you want to try to use ranges as sets.
the secret here is simply to re-define String#succ - not Range#member?. if String#succ did a simple addition using base 255 arithmetic you'd be set.
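A rough sketch of such a monotonic successor, assuming it means "add one to the string read as a fixed-width number with one digit per byte". `bytewise_succ` is a hypothetical helper, not a proposal for how String#succ should be implemented, and for non-ASCII input the result may not be valid text in the string's encoding:

```ruby
# Hypothetical helper: treat the string as a fixed-width number with one digit
# per byte and add one, carrying leftwards. Not how String#succ works.
def bytewise_succ(str)
  bytes = str.bytes
  i = bytes.length - 1
  while i >= 0 && bytes[i] == 255
    bytes[i] = 0 # carry out of this byte
    i -= 1
  end
  raise ArgumentError, 'no successor at this width' if i < 0

  bytes[i] += 1
  bytes.pack('C*').force_encoding(str.encoding)
end

p 'z99'.succ                    # "aa00", the name-generating successor
p bytewise_succ('z99')          # "z9:"
p 'z99' < bytewise_succ('z99')  # true
p 'z99' < 'z99'.succ            # false
```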
kind regards.
> I can't find this discussion in the archive. Can you give me a link > or a message number?
Largely from Ruby-talk 115120, although the solution really came about on the old suby-muse mailing list.
I'm not sure everyone understood me though. The problem is that String's #<=> and #succ methods are not compatible. Therefore Range#member? and Range#include?, which use #<=>, cannot provide proper results for String-based Ranges. The solution is to have Range use a different comparison method, namely #cmp. In most classes #cmp will of course just be an alias for #<=>, but in String it would differ so as to be compatible with #succ, and then #include? and #member? would be correct. Q.E.D.
It won't be as fast as Range#include?, but you can't put the equivalent into Range without giving it knowledge about how String#succ works.
In message "Re: [BUG] string range membership" on Thu, 24 Nov 2005 01:03:19 +0900, "Warren Brown" <warrenbr...@aquire.com> writes:
|So, given that (at least in the case of ranges of strings) there is a clear distinction between a value being included in the interval and a value being included in the set, it appears that we have a real need for two different methods. The methods Range#include? (in interval) and Range#member? (of set) seem to be perfect candidates for these two different functionalities. Before these two methods were merged, did they take on these two functionalities, or were they different in some other way?
#include? used for range check, #member? was for set membership. But since they have same functionality in Enumerable, some claimed having different behaviors in Range was confusing. I agreed.
| Anyway, please let me know if there is anything I can do to help |settle this issue.
All we need is making up good names for each functionality.
On Thu, 24 Nov 2005, Yukihiro Matsumoto wrote:
> Hi,
> In message "Re: [BUG] string range membership" on Thu, 24 Nov 2005 01:03:19 +0900, "Warren Brown" <warrenbr...@aquire.com> writes:
> |So, given that (at least in the case of ranges of strings) there is a clear distinction between a value being included in the interval and a value being included in the set, it appears that we have a real need for two different methods. The methods Range#include? (in interval) and Range#member? (of set) seem to be perfect candidates for these two different functionalities. Before these two methods were merged, did they take on these two functionalities, or were they different in some other way?
> #include? used for range check, #member? was for set membership. But since they have same functionality in Enumerable, some claimed having different behaviors in Range was confusing. I agreed.
> | Anyway, please let me know if there is anything I can do to help settle this issue.
> All we need is making up good names for each functionality.
Range#contains?
??
> All we need is making up good names for each functionality.
That is NOT all you need! This does not solve the complete problem, but only provides a little-bitty patch for query on a Range member, and a very inefficient one at that --which I thought was part of the reason you changed #include and #member to be the same in the first place.
The overarching issue is that sortable and comparable are using the same method #<=>, but they do not necessarily want the same meaning. You should provide a separate method for comparable --like I said, in most cases they will be equivalent, but not so in String. And dictionary order comparison is needed anyway. I studied this issue exhaustively over a year ago when I wrote a true Interval class.
On Thu, 24 Nov 2005, Yukihiro Matsumoto wrote:
> Hi,
> In message "Re: [BUG] string range membership" on Thu, 24 Nov 2005 09:38:11 +0900, "Ara.T.Howard" <ara.t.how...@noaa.gov> writes:
> |> All we need is making up good names for each functionality.
> |
> | Range#contains?
> |
> |??
> For which functionality?
well, i would think of #member? as most natural for set membership - so #contains? would/should be most like #include? - in my mind.
  harp:~ > cat a.rb
  module Enumerable
    def contains? value
      map.include? value
    end
  end

  r = "a" .. "aa"
  p r.contains?("z")

  harp:~ > ruby a.rb
  true
so, if each would 'hit' it - it's contained.
kind regards.
In message "Re: string range membership" on Thu, 24 Nov 2005 11:07:26 +0900, "Trans" <transf...@gmail.com> writes:
|> All we need is making up good names for each functionality.
|
|That is NOT all you need! This does not solve the complete problem, but only provides a little-bitty patch for query on a Range member, and a very inefficient one at that --which I thought was part of the reason you changed #include and #member to be the same in the first place.
Depends on how you define problem.
|The overarching issue is that sortable and comparable are using the same method #<=>, but they do not necessarily want the same meaning. You should provide a separate method for comparable --like I said, in most cases they will be equivalent, but not so in String. And dictionary order comparison is needed anyway. I studied this issue exhaustively over a year ago when I wrote a true Interval class.
I'm not sure what you meant here. Range has no relation with sorting. Can you elaborate?
> I'm not sure what you meant here. Range has no relation with sorting. Can you elaborate?
#succ defines a sort order of sorts (pun intended ;-). But #<=> defines a sort order too along with comparability. In most classes there's no problem, but in String the two come into conflict --the orders are not the same.
Then consider that Range is not a true interval because it uses #succ. This is why I created a true Interval class that uses #+ instead. Likewise Range shouldn't use #<=> either, but another method, let's call it #cmp. This would fix the problem.
In general:
  module Comparable
    def cmp(o)
      self <=> o
    end
  end
That is to say, for anything comparable #cmp is the same as #<=>, unless otherwise defined. (Alternatively you could define #cmp as an alias of #<=> directly in the classes where it is needed --that would probably be better.) Then in String define #cmp specially to conform to the successive order as defined by #succ.
Thus, by having Range use #cmp instead of #<=>, the issue is solved.
In summary, an object would then be "Rangeable" if it supports #succ, but only fully so if it also supports #cmp (instead of #<=>).
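To make the proposal concrete, here is a sketch of an endpoint test that takes the comparison as a block, so the same check can be run with #<=> or with a succ-compatible ordering. Both `succ_order_cmp` and `between_by?` are hypothetical names, and the length-first rule is assumed to stand in for the String-specific #cmp:

```ruby
# Hypothetical comparison that follows succ order for simple strings:
# shorter strings first, dictionary order within one length.
def succ_order_cmp(a, b)
  by_length = a.length <=> b.length
  by_length.zero? ? (a <=> b) : by_length
end

# Hypothetical endpoint test; the block supplies the comparison to use.
def between_by?(first, last, value)
  yield(first, value) <= 0 && yield(value, last) <= 0
end

p between_by?('a', 'aa', 'z') { |x, y| x <=> y }               # false
p between_by?('a', 'aa', 'z') { |x, y| succ_order_cmp(x, y) }  # true
p between_by?('1', '10', '2') { |x, y| succ_order_cmp(x, y) }  # true

# Caveat: an endpoint test is still not the generated set.
p between_by?('a', 'aa', 'a5') { |x, y| succ_order_cmp(x, y) } # true
p ('a'..'aa').to_a.include?('a5')                              # false
```

The last two lines show that even a succ-compatible comparison answers the interval question, not the question of what each would produce.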
Does it make sense now? (Sorry if I'm not explaining well, it's a tad subtle and it's been awhile since I worked on it too, so I have been trying to recall it all myself too).
Ara.T.Howard wrote:
> On Thu, 24 Nov 2005, Yukihiro Matsumoto wrote:
> > All we need is making up good names for each functionality.
> Range#contains?
> ??
I'm sure there was an earlier post with an excellent synopsis on ranges which stated that a Range /doesn't/ "contain"? Yeah, here it is ... from you ;))
"not quite - we have a string range that __produces__ 27 elements. it does not 'have' or 'contain' them. it merely suggests this set as its current thought"
(SCNR ;)
Would something like Range#covers? be more apt? (meaning within the bounds). Oh, pooh, that's got an 's' on the end as well :(
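For readers on a later Ruby: releases after this thread do provide a bounds-only test under a very similar name, Range#cover?. A short sketch contrasting it with membership in what each generates; `generated_by?` is a hypothetical helper name:

```ruby
# Hypothetical helper, for contrast: membership in what each generates.
def generated_by?(range, value)
  range.to_a.include?(value)
end

LETTERS = ('a'..'aa')

p LETTERS.cover?('z')               # false: 'z' > 'aa' in dictionary order
p generated_by?(LETTERS, 'z')       # true
p(('a'..'n').cover?('cat'))         # true: within the bounds
p generated_by?('a'..'n', 'cat')    # false: each never yields it
```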