surprise in sub

matt neuburg

Apr 11, 2008, 12:15:45 PM
irb(main):001:0> s = "\\\\"
=> "\\\\"
irb(main):002:0> s.length
=> 2
irb(main):003:0> s = "howdy".sub("howdy", s)
=> "\\"
irb(main):004:0> s.length
=> 1

So merely using a string as the second param of sub (the replacement
value) can cause that string to be altered.

Now, the documentation does "warn" that sequences \1, \2 etc. are valid
in the replacement string. This suggests that the replacement string is
processed before use; to be sure, it says nothing about "\\" explicitly,
but I do see of course that one must deal with "\\" in order to escape
the escaping. Furthermore, there's a "workaround", namely to write the
third line as follows:

s = "howdy".sub("howdy") {|x| s}

Still, I got seriously caught by this behavior and it was tricky to
track down. m.
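
The irb session and the block-form workaround above can be put side by side in a short script. This is only a sketch of the behavior described in this post; the variable names `via_string` and `via_block` are made up for illustration.

```ruby
# Replacement string holding two literal backslashes.
s = "\\\\"
puts s.length                              # 2

# String form: sub scans the replacement for escapes, so \\ collapses to \.
via_string = "howdy".sub("howdy", s)
puts via_string.length                     # 1

# Block form: the block's return value is inserted as-is.
via_block = "howdy".sub("howdy") { s }
puts via_block.length                      # 2

# The original string object is not modified by either call.
puts s.length                              # 2
```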

--
matt neuburg, phd = ma...@tidbits.com, http://www.tidbits.com/matt/
Leopard - http://www.takecontrolbooks.com/leopard-customizing.html
AppleScript - http://www.amazon.com/gp/product/0596102119
Read TidBITS! It's free and smart. http://www.tidbits.com

x17y19

Apr 11, 2008, 2:12:02 PM
On Apr 11, 9:15 am, m...@tidbits.com (matt neuburg) wrote:
> irb(main):001:0> s = "\\\\"
> => "\\\\"
> irb(main):002:0> s.length
> => 2
> irb(main):003:0> s = "howdy".sub("howdy", s)
> => "\\"
> irb(main):004:0> s.length
> => 1
>
> So merely using a string as the second param of sub (the replacement
> value) can cause that string to be altered.
>
> Now, the documentation does "warn" that sequences \1, \2 etc. are valid
> in the replacement string. This suggests that the replacement string is
> processed before use; to be sure, it says nothing about "\\" explicitly,
> but I do see of course that one must deal with "\\" in order to escape
> the escaping. Furthermore, there's a "workaround", namely to write the
> third line as follows:
>
> s = "howdy".sub("howdy") {|x| s}
>
> Still, I got seriously caught by this behavior and it was tricky to
> track down. m.
>
> --
> matt neuburg, phd = m...@tidbits.com, http://www.tidbits.com/matt/
> Leopard - http://www.takecontrolbooks.com/leopard-customizing.html
> AppleScript - http://www.amazon.com/gp/product/0596102119
> Read TidBITS! It's free and smart. http://www.tidbits.com

s is changing because you assigned to it, not because of using it
as the second parameter of sub(). Try assigning the result to a
different variable like so:

ss = "howdy".sub("howdy", s)

matt neuburg

Apr 11, 2008, 6:46:02 PM

Arlen Cuss

Apr 11, 2008, 9:44:28 PM
Hi,

On Sat, Apr 12, 2008 at 2:20 AM, matt neuburg <ma...@tidbits.com> wrote:

> irb(main):001:0> s = "\\\\"
> => "\\\\"
> irb(main):002:0> s.length
> => 2
> irb(main):003:0> s = "howdy".sub("howdy", s)
> => "\\"
> irb(main):004:0> s.length
> => 1
>

Yeah, escaping and escaping-of-escaping with substitution and Strings being
used as a poor-man's-regexp always catches me out. Thanks for the heads up
on this one.

Arlen

Peña, Botp

Apr 12, 2008, 12:14:08 AM
From: matt neuburg [mailto:ma...@tidbits.com]
# You're missing the point... m.

i think i missed the point too :)

you did mention: "So merely using a string as the second param of sub (the replacement value) can cause that string to be altered."...

kind regards -botp

Christopher Dicely

Apr 12, 2008, 10:25:43 AM
On Fri, Apr 11, 2008 at 9:20 AM, matt neuburg <ma...@tidbits.com> wrote:
> irb(main):001:0> s = "\\\\"
> => "\\\\"
> irb(main):002:0> s.length
> => 2
> irb(main):003:0> s = "howdy".sub("howdy", s)
> => "\\"
> irb(main):004:0> s.length
> => 1
>
> So merely using a string as the second param of sub (the replacement
> value) can cause that string to be altered.

Nope, using the string (s) as the second parameter of sub did nothing to alter
it. This is clear if you use a different variable as the assignment target:


irb(main):001:0> s='\\\\'
=> "\\\\"
irb(main):002:0> s.length
=> 2
irb(main):003:0> foo = "howdy".sub("howdy",s)
=> "\\"
irb(main):004:0> s
=> "\\\\"
irb(main):005:0> s.length
=> 2
irb(main):006:0> foo
=> "\\"
irb(main):007:0> foo.length
=> 1

s isn't changed by being used as the second argument to sub. Instead, the
string sent as the second argument to sub is processed for escape sequences,
so that the substring '\\' occurring in that string is treated as a single
literal '\' when used in the replacement.

But it's not changed, as the above irb session shows. s is unmodified.
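
If the string form has to be used with an arbitrary replacement, one possible approach is to double every backslash beforehand, so that sub's escape processing turns each pair back into a single one. The sketch below assumes that approach; `literal_replacement` is a hypothetical helper name, not something Ruby provides.

```ruby
# Hypothetical helper (not part of Ruby): double every backslash so that
# sub/gsub's escape processing reduces each pair back to one backslash.
def literal_replacement(str)
  # The block form is used so this gsub inserts its result without
  # doing any escape processing of its own.
  str.gsub("\\") { "\\\\" }
end

s = "\\\\"                                 # two backslashes
escaped = literal_replacement(s)           # four backslashes
result = "howdy".sub("howdy", escaped)

puts escaped.length                        # 4
puts result == s                           # true
puts s.length                              # 2, s itself is untouched
```

Because group references such as \1 also start with a backslash, this doubling makes them come out as literal text too. The block form from the original post avoids the extra step entirely.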

Todd Benson

Apr 12, 2008, 12:03:36 PM

When I first read the post, I immediately wanted to strike out with a
"well, you're assigning" response.

I'm not sure, but I think the OP was referring to what you said;
namely, how the escaping happens before subbing.

Todd

matt neuburg

Apr 13, 2008, 9:31:01 AM
Christopher Dicely <cmdi...@gmail.com> wrote:

> On Fri, Apr 11, 2008 at 9:20 AM, matt neuburg <ma...@tidbits.com> wrote:
> > irb(main):001:0> s = "\\\\"
> > => "\\\\"
> > irb(main):002:0> s.length
> > => 2
> > irb(main):003:0> s = "howdy".sub("howdy", s)
> > => "\\"
> > irb(main):004:0> s.length
> > => 1
> >
> > So merely using a string as the second param of sub (the replacement
> > value) can cause that string to be altered.
>
> Nope, using the string (s) as the second parameter of sub did nothing to alter
> it.

I didn't say that s was altered. I said that the string you provide as
the second param of sub might not be the string that gets substituted in
- as the example demonstrates. If you don't find this counterintuitive,
you don't; great. But some people might. Those are the people I'm trying
to help here. m.

Peña, Botp

Apr 14, 2008, 11:09:52 PM
From: matt neuburg [mailto:ma...@tidbits.com]
# I didn't say that s was altered. I said that the string you
# provide as the second param of sub might not be the string
# that gets substituted in - as the example demonstrates. If
# you don't find this counterintuitive, you don't; great. But
# some people might. Those are the people I'm trying
# to help here. m.

i think the confusion stems from the fact that the replacement string gets unescaped twice

1 once by the ruby parser when the string literal is read, for the usual escapes like \\ and \"

and

2 once more by sub/gsub, for the group references like \1 (and for \\ itself)

note that this behaviour is present in other languages too.
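
a rough sketch of the two passes, in case it helps. the names `vowels`, `ref` and `lit` are just for illustration:

```ruby
vowels = /([aeiou])/

# Pass 1 (Ruby parser): the literal "\\1" is stored as the two characters \1.
ref = "\\1"
puts ref.length                            # 2
# Pass 2 (gsub): \1 is expanded to the first capture group.
puts "hello".gsub(vowels, "<#{ref}>")      # h<e>ll<o>

# To get a literal \1 in the output, both passes have to be escaped:
# the literal "\\\\1" is stored as \\1, which gsub then reduces to \1.
lit = "\\\\1"
puts lit.length                            # 3
puts "hello".gsub(vowels, "<#{lit}>")      # h<\1>ll<\1>
```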

it's been a long time since i last used the string (as 2nd param) form. i have gotten used to the block form, since not only does it avoid the double escaping issue/confusion, but it also gives access to the match vars $1, $`, $& among others.

so, this one eg

irb(main):019:0> "hello".gsub(/([aeiou])/, "<\\1>")
=> "h<e>ll<o>"

now becomes this

irb(main):020:0> "hello".gsub(/([aeiou])/) {|s| "<#{s}>"}
=> "h<e>ll<o>"

or this

irb(main):026:0> "hello".gsub(/([aeiou])/) {"<#$1>"}
=> "h<e>ll<o>"

your choice though.

kind regards -botp
