A negative lookbehind assertion won't work because "variable length
lookbehind [is] not implemented" and there can be an arbitrary
number of backslashes preceding any occurrence of an escape
metacharacter (e.g., "n").
The Perl Cookbook gives C<s/\\n/\n/g;> for "[t]urning \ followed
by n into a real newline", but this could match in unintended
places, right?
Any suggestions?
--
Jim Monty
mo...@primenet.com
Tempe, Arizona USA
>I need to reliably match escape sequences in arbitrary text. A
>simple pattern such as, for example, C<m/\\n/g> won't work because
>it matches where the substring occurs but is not an escape sequence,
>as in the string "c:\\norton". In other words, a pair of backslashes
>is the escape sequence that represents a literal backslash and these
>escaped backslashes must be accounted for.
Right, you can't use a negative lookbehind, so you'll have to use
something like:
s/((?:[^\\]|^)(?:\\\\)*)\\n/$1\n/;
Yes, it's disgusting. But it's what I've found works.
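Here is a minimal sketch of how that pattern behaves on a few inputs. The /g flag and the wrapper name unescape_newlines are assumptions added for illustration; they are not part of the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution, with /g added (an assumption).
sub unescape_newlines {
    my ($s) = @_;
    # A non-backslash (or start of string), then any number of backslash
    # pairs, then backslash-n; keep the prefix and emit a real newline.
    $s =~ s/((?:[^\\]|^)(?:\\\\)*)\\n/$1\n/g;
    return $s;
}

# Single-quoted literals: '\\' is one backslash, '\n' is backslash + n.
my $path  = unescape_newlines('c:\\\\norton');  # data: c:, 2 backslashes, norton
my $text  = unescape_newlines('one\ntwo');      # data: one, backslash, n, two
my $twice = unescape_newlines('x\n\n');         # two escapes in a row

print "path unchanged\n"       if $path eq 'c:\\\\norton';
print "text has real newline\n" if $text eq "one\ntwo";
print "second escape missed\n"  if $twice eq "x\n" . '\n';
```

Note the last case: with /g, two escapes in a row are not both converted, because the character the second match needs in front of it was already consumed by the first match. The doubled backslashes are also left doubled.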
--
MIDN 4/C PINYAN, NROTCURPI, US Naval Reserve ja...@pobox.com
http://www.pobox.com/~japhy/ http://pinyaj.stu.rpi.edu/
PerlMonth - An Online Perl Magazine http://www.perlmonth.com/
The Perl Archive - Articles, Forums, etc. http://www.perlarchive.com/
Since you're also collapsing double backslashes, I'd recommend doing
all the substitutions in one pass:
s/\\(.)/control_char($1)/eg;
..or, if you also want to catch hex or octal character codes:
s/\\(c.|x[0-9A-Fa-f]{1,2}|0[0-7]{0,3}|.)/control_char($1)/eg;
Writing the subroutine is left as an exercise - I'd recommend using a
hash for the single-character cases. If you wanted, you could extend
this to handle special escapes like \U and \L, and maybe even do your
own variable interpolation.
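One possible take on the exercise, as a rough sketch only. The hash %single, its choice of escapes, and the wrapper unescape are assumptions for illustration; control_char is just the name used in the substitution above.

```perl
use strict;
use warnings;

# Single-character escapes (an assumed, partial list).
my %single = (
    n => "\n", t => "\t", r => "\r", f => "\f",
    b => "\b", a => "\a", e => "\e",
);

sub control_char {
    my ($seq) = @_;
    return $single{$seq}        if exists $single{$seq};
    # \xH or \xHH: hex character code
    return chr(hex $1)          if $seq =~ /\Ax([0-9A-Fa-f]{1,2})\z/;
    # \0, \0N, \0NN, \0NNN: octal character code
    return chr(oct $seq)        if $seq =~ /\A0[0-7]{0,3}\z/;
    # \cX: control character, e.g. \cA is chr(1)
    return chr(ord(uc $1) ^ 64) if $seq =~ /\Ac(.)\z/;
    # Anything else stands for itself, so \\ collapses to one backslash.
    return $seq;
}

# Hypothetical wrapper around the second substitution above.
sub unescape {
    my ($s) = @_;
    $s =~ s/\\(c.|x[0-9A-Fa-f]{1,2}|0[0-7]{0,3}|.)/control_char($1)/eg;
    return $s;
}

print "ok\n" if unescape('c:\\\\norton') eq 'c:\norton';
```

Because every backslash consumes the character after it in a single left-to-right pass, an escaped backslash can never be mistaken for the start of the next escape.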
--
Ilmari Karonen - http://www.sci.fi/~iltzu/
Note: Please ignore the pseudonymous troll in this newsgroup.
One thing to be aware of: Perl interpolates double-quoted strings, but
not single-quoted ones. So "\n" is a newline, while '\n' is just a
backslash followed by the letter 'n'. You wouldn't need the extra \ in
your example if you wrote it as 'c:\norton'.
What this means is that your solution will depend on your
implementation. For single quoted strings, a negative character class
may help:
/[^\\]\\n/
But this could still have problems with stuff like '\\\\\n'.
For double-quoted strings, /\n/ is sufficient, because Perl will handle
figuring out what is and isn't an escape. So ("c:\\norton" =~ /\n/) will
evaluate to false.
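A small sketch of the quoting difference, using only string literals; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $double  = "c:\\norton";  # double-quoted: \\ becomes one backslash
my $single  = 'c:\norton';   # single-quoted: \n stays backslash + n
my $newline = "c:\norton";   # double-quoted: \n becomes a real newline

# Both spellings produce the same 9-character string.
print "same string\n" if $double eq $single;
print length($double), "\n";

# Only the third string contains an actual newline.
print "no newline in \$double\n" if $double  !~ /\n/;
print "newline in \$newline\n"   if $newline =~ /\n/;
```

This only covers literals written in Perl source. Text that arrives from a file or from user input is not run through these quoting rules, so any backslashes in it are still there to be dealt with.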
-mjc
>The Perl Cookbook gives C<s/\\n/\n/g;> for "[t]urning \ followed
>by n into a real newline", but this could match in unintended
>places, right?
It's unsolvable as you write it. That is why you normally have to escape
backslashes as well, in some way.
Here's a version as I like it:
%replace = ( n => "\n", t => "\t");
s/\\(.)/$replace{$1} || $1/ge;
This will remove all escaping backslashes, and replace the sequence by a
newline or a tab if the sequence is '\n' or '\t' respectively; or by the
escaped character. It is pretty much like backslash escaping in
double-quotish context in Perl, but for all characters.
Alternatively:
%replace = ( n => "\n", t => "\t", "\\" => "\\");
s/\\([nt\\])/$replace{$1}/g;
is more single-quotish: only those sequences for which the second
character is in the specified character class, and which also must be a
key for the substitution hash, are replaced.
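To make the difference concrete, here is a sketch that runs both versions side by side. The sub names and the split into two hashes are assumptions for illustration; the substitutions are the ones given above.

```perl
use strict;
use warnings;

my %replace_any    = (n => "\n", t => "\t");
my %replace_listed = (n => "\n", t => "\t", "\\" => "\\");

# First version: every backslash is removed; unknown escapes
# turn into the escaped character itself.
sub unescape_any {
    my ($s) = @_;
    $s =~ s/\\(.)/$replace_any{$1} || $1/ge;
    return $s;
}

# Second version: only \n, \t and \\ are touched; any other
# backslash sequence is left exactly as it was.
sub unescape_listed {
    my ($s) = @_;
    $s =~ s/\\([nt\\])/$replace_listed{$1}/g;
    return $s;
}

# Single-quoted input, so the data really contains backslashes.
my $in = 'a\qb\n';   # a, backslash, q, b, backslash, n

print "any drops the backslash before q\n"
    if unescape_any($in) eq "aqb\n";
print "listed keeps the backslash before q\n"
    if unescape_listed($in) eq 'a\qb' . "\n";
```

Both versions turn the two-backslash data in c:\\norton into a single backslash followed by "norton", with no newline, since the first backslash consumes the second before the "n" is ever looked at.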
--
Bart.