
Splitting a string with escapable separator?


Michael Schuerig

Sep 27, 2005, 7:45:11 PM

I'm trying to come up with an *elegant* way to split a string into an
array at a separator, with the additional feature that the separators
can be escaped. It should work like this:

"Hello\, World,Hi".split_escapable(',' '\')
# => ["Hello, World", "Hi"]

Through a number of permutations with regexps, scan and the rest of the
family, I was unable to find a solution. I could parse the given string
myself, going through it character by character, but I'd prefer a less
pedestrian approach.

Michael

--
Michael Schuerig All good people read good books
mailto:mic...@schuerig.de Now your conscience is clear
http://www.schuerig.de/michael/ --Tanita Tikaram, Twist In My Sobriety
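For comparison, here is a minimal sketch of the character-by-character parse Michael mentions, written as a standalone method rather than a String extension. The name split_escapable_chars is hypothetical, and the sketch assumes single-character separator and escape strings and Ruby 2.3 or later (for unary + on string literals). Unlike String#split it keeps trailing empty fields, and an escaped escape character becomes a literal backslash.

```ruby
# Hypothetical helper: split str on sep, where esc makes the next
# character literal. Assumes sep and esc are single characters.
def split_escapable_chars(str, sep, esc)
  fields = [+""]
  escaping = false
  str.each_char do |ch|
    if escaping
      fields.last << ch   # character after an escape is kept literally
      escaping = false
    elsif ch == esc
      escaping = true     # drop the escape character itself
    elsif ch == sep
      fields << +""       # unescaped separator starts a new field
    else
      fields.last << ch
    end
  end
  fields.last << esc if escaping  # keep a dangling trailing escape as-is
  fields
end

p split_escapable_chars('Hello\, World,Hi', ',', '\\')  # => ["Hello, World", "Hi"]
p split_escapable_chars('a\\\\,b', ',', '\\')           # => ["a\\", "b"]
```

The explicit state flag is what lets an escaped backslash in front of a separator end the field, which the simpler one-line approaches in this thread struggle with.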

Daniel Sheppard

Sep 27, 2005, 8:11:04 PM

class String
  def split_escapable(split_char, escape_char)
    arr = []
    split(split_char).each do |x|
      if(arr[-1] && (/#{escape_char}$/ === arr[-1]))
        arr[-1] << x
      else
        arr << x
      end
    end
    arr
  end
end



Daniel Sheppard

Sep 27, 2005, 8:25:25 PM

Just because you've done something similar before, doesn't mean you
should trust yourself to code it from memory and get it right without
testing....

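class String
  def split_escapable(split_char, escape_char)
    arr = []
    split(split_char).each do |x|
      # If the previous piece ends with the escape character, the
      # separator between them was escaped: drop the escape and rejoin
      # the two pieces, putting the separator back.
      if arr[-1] && arr[-1].end_with?(escape_char)
        arr[-1].chop!
        arr[-1] << split_char << x
      else
        arr << x
      end
    end
    arr
  end
end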

It's not very good, because you can't escape the backslash itself, but a
bit more work (counting consecutive backslashes) should sort that out,
and depending on what you're doing, you might not care. Also, it's not
all that elegant, so maybe you already had this kind of solution.

Also, note that in your example:

"Hello\, World,Hi".split_escapable(',' '\')
# => ["Hello, World", "Hi"]

you're using double quotes, so the backslash is consumed as an escape
character before your method ever sees it. The line also won't parse:
the backslash inside the single quotes has to be escaped, because as
written it escapes the closing quote. And you missed the comma between
the arguments. This version:

'Hello\, World,Hi'.split_escapable(',','\\')
# => ["Hello, World", "Hi"]

works.
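A quick way to see the quoting problem described above is to inspect the strings Ruby actually builds. The variable names here are only for illustration.

```ruby
# "\," is not a recognized escape in a double-quoted literal, so Ruby
# drops the backslash and the method never sees it.
double = "Hello\, World,Hi"
# Single quotes keep the backslash and the comma as two characters.
single = 'Hello\, World,Hi'
# Inside single quotes a lone backslash must still be doubled, because
# '\' would escape the closing quote.
escape = '\\'

p double        # => "Hello, World,Hi"
p single        # => "Hello\\, World,Hi"
p escape.length # => 1
```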

Gavin Kistner

Sep 28, 2005, 1:00:37 AM

On Sep 27, 2005, at 5:46 PM, Michael Schuerig wrote:
> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]
>
> Through a number of permutations with regexps, scan and the rest of
> the family, I was unable to find a solution.

Your example above is missing a couple of backslashes, but I assume I
know what you meant.

Is the following elegant or not?

class String
  def split_escapable( separator, escape_char=nil )
    results = []
    re = /(.+?)(?:#{escape_char ? "([^\\#{escape_char}])" : ''}#{separator}|$)/
    self.scan( re ){ |str,last_char|
      results << str + last_char.to_s
    }
    results
  end
end

p "Hello\\, World,Hi".split_escapable( ',', '\\' )
#=> ["Hello\\, World", "Hi"]

Note that the above does not account for the case of:
Hello \\,World
(where an escaped backslash is intended to end the first entry)
but if that was important, that's just a matter of a bit of odd/even
backslash counting.

Something like (untested):
re = /(.+?)(?:#{escape_char ? "([^\\#{escape_char}](\\#{escape_char}\\#{escape_char})*)" : ''}#{separator}|$)/
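The odd/even counting mentioned above can also be done without a regexp. This is a rough sketch (the method names escaped_at? and split_counting are hypothetical): a separator counts as escaped only when it is preceded by an odd number of consecutive escape characters. Like Gavin's version it leaves the escape characters in the fields, and it assumes single-character separator and escape strings.

```ruby
# Hypothetical helper: true when the character at idx is preceded by an
# odd number of consecutive esc characters, i.e. it is escaped.
def escaped_at?(str, idx, esc)
  count = 0
  i = idx - 1
  while i >= 0 && str[i] == esc
    count += 1
    i -= 1
  end
  count.odd?
end

# Split on every unescaped sep; escape characters stay in the fields.
def split_counting(str, sep, esc)
  fields = []
  start = 0
  str.each_char.with_index do |ch, idx|
    next unless ch == sep && !escaped_at?(str, idx, esc)
    fields << str[start...idx]
    start = idx + 1
  end
  fields << str[start..-1]
end

p split_counting('Hello\, World,Hi', ',', '\\')  # => ["Hello\\, World", "Hi"]
p split_counting('Hello \\\\,World', ',', '\\')  # => ["Hello \\\\", "World"]
```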

William James

Sep 28, 2005, 6:24:45 AM

Michael Schuerig wrote:
> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]
>
> Through a number of permutations with regexps, scan and the rest of the
> family, I was unable to find a solution. I could parse the given string
> myself, going through it character by character, but I'd prefer a less
> pedestrian approach.


class String
  def split_escapable( splitter, escaper )
    escaper = escaper*2 if escaper=='\\'
    re = %r{ \G
      # Make sure at least 1 character remains.
      (?= . )
      (
        (?:
          [^#{ splitter }#{ escaper }]
          |
          (?: #{ escaper } . )
        ) *
      )
      (?:
        #{ splitter }
        |
        \Z
      )
    }xm
    scan( re ).map{|x| x.first.gsub( /#{escaper}(.)/, '\1' ) }
  end
end

s = <<HERE
Hello@, World!,Hi.
Alarm rings@, lights flash.,One escaper @@
HERE
s.split("\n").each {|x|a=x.split_escapable(',','@');p a; puts a}
puts "----"

s = <<'HERE'
Hello\, World!,Hi.
Alarm rings\, lights flash.,One escaper \\
HERE
s.split("\n").each {|x|a=x.split_escapable(',','\\');p a; puts a}
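Since the regexp above is dense, here is a compact sketch of the same \G-anchored scan as a standalone method (scan_split is a hypothetical name). It swaps the hand-doubling of the backslash for Regexp.escape, so other metacharacters also work as separator or escape, and it uses \z where William uses \Z. It assumes single-character separator and escape strings. Like William's version, it drops trailing empty fields, and a field ending in a lone escape character is silently dropped.

```ruby
# Hypothetical standalone version of the \G-anchored scan.
# Assumes sep and esc are single characters.
def scan_split(str, sep, esc)
  s = Regexp.escape(sep)
  e = Regexp.escape(esc)
  # \G      start exactly where the previous match ended
  # (?=.)   require at least one more character
  # ( ... ) a field: any run of plain characters or escape+character pairs
  # then the separator, or the end of the string
  field = /\G(?=.)((?:[^#{s}#{e}]|#{e}.)*)(?:#{s}|\z)/m
  # Each match yields [field]; drop the escape in front of escaped characters.
  str.scan(field).map { |(f)| f.gsub(/#{e}(.)/m, '\1') }
end

p scan_split('Hello\, World,Hi', ',', '\\')  # => ["Hello, World", "Hi"]
p scan_split('a,,b', ',', '\\')              # => ["a", "", "b"]
```

The \G anchor is what makes this a split rather than a search: each match must begin exactly where the previous one ended, so an escaped separator can never be skipped over to start a new field.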

Jason Sweat

Sep 28, 2005, 7:02:43 AM

On 9/27/05, Michael Schuerig <mic...@schuerig.de> wrote:
>
> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]
>
> Through a number of permutations with regexps, scan and the rest of the
> family, I was unable to find a solution. I could parse the given string
> myself, going through it character by character, but I'd prefer a less
> pedestrian approach.
>
> Michael

With the new regex engine in CVS Ruby you can use a negative lookbehind
assertion in your split:
>> s = "Hello\\, World, Hi"
=> "Hello\\, World, Hi"
>> s.split /(?<!\\),/
=> ["Hello\\, World", " Hi"]

$ ruby --v
ruby 1.9.0 (2005-09-08) [i686-linux]


Regards,
Jason
http://blog.casey-sweat.us/
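Lookbehind did make it into Ruby 1.9, so on a current Ruby the split above can be finished with a second pass that removes the escapes. A minimal sketch, assuming a comma separator and a backslash escape. Note that a one-character lookbehind cannot tell an escaped backslash from an escaped comma, so the last example still does not split.

```ruby
# Split on commas that are not directly preceded by a backslash.
fields = 'Hello\, World,Hi'.split(/(?<!\\),/)
p fields                                 # => ["Hello\\, World", "Hi"]

# The escape is still in the field; remove it in a second pass.
p fields.map { |f| f.gsub('\\,', ',') }  # => ["Hello, World", "Hi"]

# Limitation: an escaped backslash before the comma also blocks the split.
p 'a\\\\,b'.split(/(?<!\\),/)            # => ["a\\\\,b"]
```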


Ben

Sep 28, 2005, 8:25:03 AM

On 9/27/05, Daniel Sheppard <dan...@pronto.com.au> wrote:
> Just because you've done something similar before, doesn't mean you
> should trust yourself to code it from memory and get it right without
> testing....
>

I printed that and put it on my wall. I do that all the time.

-Ben


Michael Schuerig

Sep 28, 2005, 10:51:56 AM

Jason Sweat wrote:

> On 9/27/05, Michael Schuerig <mic...@schuerig.de> wrote:
>>
>> I'm trying to come up with an *elegant* way to split a string into an
>> array at a separator with the additional feature that the separators
>> can be escaped. It should work like this
>>
>> "Hello\, World,Hi".split_escapable(',' '\')
>> # => ["Hello, World", "Hi"]
>>
>> Through a number of permutations with regexps, scan and the rest of
>> the family, I was unable to find a solution. I could parse the given
>> string myself, going through it character by character, but I'd prefer
>> a less pedestrian approach.
>>
>> Michael
>
> With the new regex engine in CVS Ruby you can use a negative lookbehind
> assertion in your split:
>>> s = "Hello\\, World, Hi"
> => "Hello\\, World, Hi"
>>> s.split /(?<!\\),/
> => ["Hello\\, World", " Hi"]

That must be the most elegant solution. Unfortunately I can't use CVS
Ruby and can't wait for it either.

Michael

--
Michael Schuerig Airtight arguments have
mailto:mic...@schuerig.de vacuous conclusions.
http://www.schuerig.de/michael/ --A.O. Rorty, Explaining Emotions

Warren Brown

Sep 28, 2005, 10:58:56 AM

Jason,

>> I'm trying to come up with an *elegant* way to split
>> a string into an array at a separator with the
>> additional feature that the separators can be
>> escaped.

> With the new regex engine in CVS Ruby you can use a
> negative lookbehind assertion in your split:
> >> s = "Hello\\, World, Hi"
> => "Hello\\, World, Hi"
> >> s.split /(?<!\\),/
> => ["Hello\\, World", " Hi"]

With the current Ruby RE engine, you can use zero-width negative
lookahead if you don't mind reversing the string before and after the
split (and reversing the resulting array to restore the field order):

irb(main):001:0> s = "Hello\\, World, Hi"
=> "Hello\\, World, Hi"
irb(main):002:0> s.reverse.split(/,(?!\\)/).map {|ss| ss.reverse}.reverse
=> ["Hello\\, World", " Hi"]

You might also consider handling escaped escape characters by
ignoring pairs of escape characters:

irb(main):003:0> s = "Test, Test\\, Test\\\\, Test\\\\\\, Test"
=> "Test, Test\\, Test\\\\, Test\\\\\\, Test"
irb(main):004:0> s.reverse.split(/,(?!(\\\\)*\\([^\\]|$))/).map {|ss| ss.reverse}.reverse
=> ["Test", " Test\\, Test\\\\", " Test\\\\\\, Test"]

I hope this helps.

- Warren Brown
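The reversal trick can be wrapped as a reusable method. This is a sketch (reverse_split and UNESCAPED_COMMA_REVERSED are hypothetical names) with the separator and escape hard-wired to a comma and a backslash. It reverses the array of pieces at the end so the fields come back in their original order, uses non-capturing groups, and passes -1 to split so empty fields at either end are kept.

```ruby
# A comma is escaped when it is preceded by an odd number of backslashes.
# In the reversed string those backslashes follow the comma, so a
# lookahead can test for them: skip commas followed by pairs of
# backslashes, one more backslash, and then a non-backslash or the end.
UNESCAPED_COMMA_REVERSED = /,(?!(?:\\\\)*\\(?:[^\\]|\z))/

def reverse_split(str)
  # -1 keeps empty fields, which would otherwise vanish from the end of
  # the reversed string (the start of the original).
  pieces = str.reverse.split(UNESCAPED_COMMA_REVERSED, -1)
  # Un-reverse each field, then restore the original field order.
  pieces.map(&:reverse).reverse
end

s = "Test, Test\\, Test\\\\, Test\\\\\\, Test"
p reverse_split(s)  # => ["Test", " Test\\, Test\\\\", " Test\\\\\\, Test"]
```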

Michael Schuerig

Sep 28, 2005, 11:11:41 AM

William James wrote:

> Michael Schuerig wrote:
>> I'm trying to come up with an *elegant* way to split a string into an
>> array at a separator with the additional feature that the separators
>> can be escaped.

[snip]

> class String
> def split_escapable( splitter, escaper )

[snip]
> end
> end

Thanks, that appears to work indeed, although I can't claim to
understand how or why.

Michael

--
Michael Schuerig The more it stays the same,
mailto:mic...@schuerig.de The less it changes!
http://www.schuerig.de/michael/ --Spinal Tap, The Majesty of Rock

Michael Schuerig

Sep 28, 2005, 11:12:06 AM

Gavin Kistner wrote:
[snip]

> Note that the above does not account for the case of:
> Hello \\,World
> (where an escaped backslash is intended to end the first entry)
> but if that was important, that's just a matter of a bit of odd/even
> backslash counting.

That's a thing I'd need. Opportunistically, I'll go with William's
suggestion from a sibling post.

> Something like (untested):
> re = /(.+?)(?:#{escape_char ? "([^\\#{escape_char}](\\#{escape_char}\\#{escape_char})*)" : ''}#{separator}|$)/

Thanks.

Michael

Gavin Kistner

Sep 28, 2005, 11:12:17 AM

On Sep 28, 2005, at 8:58 AM, Warren Brown wrote:
> With the current Ruby RE engine, you can use zero-width negative
> lookahead if you don't mind reversing the string before and after the
> split:

LOL, what a creative solution! :) I'll have to remember that trick.
"To use lookbehinds in a regexp engine that doesn't have them,
reverse the string (and your thinking), then use a lookahead."


Han Holl

Sep 29, 2005, 6:18:35 AM

On 9/28/05, Michael Schuerig <mic...@schuerig.de> wrote:
>
>
> I'm trying to come up with an *elegant* way to split a string into an
> array at a separator with the additional feature that the separators
> can be escaped. It should work like this
>
> "Hello\, World,Hi".split_escapable(',' '\')
> # => ["Hello, World", "Hi"]
>
>
None of the offerings so far support split's optional second (limit)
argument. The following is probably quite efficient if escaped separators
are rare (I haven't benchmarked anything, though). Also, it's not suitable
for binary strings, since it uses "\x01" as a placeholder for escaped
separators.

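class String
  def split_escapable(separator, escape_char, *args)
    istr = dup
    # Stand-in for escaped separators; assumed never to occur in the input.
    impossible = "\x01"
    replace = "#{escape_char}#{separator}"
    # Hide every escaped separator so split cannot see it.
    changed = istr.gsub!(replace, impossible)
    # Any extra arguments (e.g. the limit) are passed straight to split.
    fields = istr.split(separator, *args)
    if changed
      # Put the (now unescaped) separators back.
      fields.each do |f|
        f.gsub!(impossible, separator)
      end
    end
    fields
  end
end
a = "Hello\\, World,Hi"
puts a.split_escapable( ',', '\\' )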

Cheers,

Han Holl
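Passing split's limit through, as in the placeholder approach above, is easy to check with a standalone restatement of the same idea. In this sketch the names placeholder_split and PLACEHOLDER are hypothetical, and the method raises if the input already contains the placeholder character instead of silently producing wrong fields. With a limit, the unsplit remainder mixes restored separators with real ones, so the escaping information is lost there.

```ruby
# Stand-in for escaped separators; must not occur in the input.
PLACEHOLDER = "\x01"

# Hypothetical standalone version of the placeholder approach. Extra
# arguments (such as the limit) are passed straight through to String#split.
def placeholder_split(str, sep, esc, *split_args)
  if str.include?(PLACEHOLDER)
    raise ArgumentError, "input already contains the placeholder"
  end

  hidden = str.gsub(esc + sep, PLACEHOLDER)  # hide escaped separators
  hidden.split(sep, *split_args).map { |f| f.gsub(PLACEHOLDER, sep) }
end

p placeholder_split('Hello\, World,Hi', ',', '\\')  # => ["Hello, World", "Hi"]
p placeholder_split('a,b\,c,d', ',', '\\', 2)       # => ["a", "b,c,d"]
```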

email...@gmail.com

Oct 4, 2005, 10:56:01 AM

Without the 1.9 lookbehind, you could try:
'Hello\,World,Hi'.scan(/(?:\\,|[^,])+/).map {|e| e.tr('\\','')}
=> ["Hello,World", "Hi"]
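The scan above deletes every backslash via tr, and because \\, is tried before [^,], an escaped backslash in front of a comma (a\\,b) is glued to the comma and the field is not split. A sketch of a slightly sturdier variant (scan_fields and FIELD are hypothetical names, comma separator and backslash escape hard-wired): match an escape plus any character as one unit, then remove only the escape in front of each escaped character. Like the original, it skips empty fields.

```ruby
# A field is a run of "backslash + any character" pairs or characters
# that are neither a comma nor a backslash.
FIELD = /(?:\\.|[^,\\])+/m

def scan_fields(str)
  # Remove only the backslash in front of each escaped character.
  str.scan(FIELD).map { |f| f.gsub(/\\(.)/m, '\1') }
end

p scan_fields('Hello\,World,Hi')  # => ["Hello,World", "Hi"]
p scan_fields('a\\\\,b')          # => ["a\\", "b"]
```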