escape meta for Pattern?

Markus Dehmann

Feb 14, 2006, 4:56:26 PM
On http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
it says:
> The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.

Now, if I read my input strings from a file I have to convert my
strings in order to match them against a pattern. How do I do that?
Is there a predefined method to do it, like quotemeta in Perl?

Thanks!
Markus

Oliver Wong

Feb 14, 2006, 5:08:42 PM

"Markus Dehmann" <markus....@gmail.com> wrote in message
news:1139954186.2...@z14g2000cwz.googlegroups.com...

There are several concepts you need to get straight here. One is "What is
the character content of a String in memory?", which I will call "String A"
for short. Another is "What string do I have to type in my Java source code
to get String A into memory?", which I will call "String B".

So if you type a String B like "\\(hello\\)" then String A will be
"\(hello\)".

If you type a String B like "\t\\t\t", then String A will be four
characters: a tab, a backslash, the letter t, and another tab.
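
A minimal sketch of the String B versus String A distinction, in case it helps to see it run. The class name LiteralVsMemory and the helper describe() are made up for this illustration; the helper only exists to make tab characters visible.

```java
public class LiteralVsMemory {
    // Hypothetical helper: lists the in-memory characters of a string,
    // showing each tab as <TAB> so invisible characters can be seen.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(c == '\t' ? "<TAB>" : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // String B, as typed in source code, has 11 characters between the quotes.
        String a1 = "\\(hello\\)";
        System.out.println(a1);           // String A in memory: \(hello\)
        System.out.println(a1.length());  // 9

        // Source escapes \t and \\ are each turned into one character by the compiler.
        String a2 = "\t\\t\t";
        System.out.println(describe(a2)); // <TAB> \ t <TAB>
    }
}
```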

Now, let me define a new string called String C, as follows: "What does
my file have to contain so that when I read in that string, String A gets
loaded into memory?"

It turns out that String C and String A are exactly the same. If you
want a tab, then "\t", then another tab to appear in memory, then your file
should contain a literal tab, the two characters \t, and another literal
tab. If you want "\(hello\)" to appear in memory, then your file should
contain "\(hello\)".
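
Here is a rough sketch of the file case, assuming Java 8 or later for the java.nio.file calls. The class and method names (PatternFromFile, fileRegexMatches) are invented for the example, and a temporary file stands in for your real input file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class PatternFromFile {
    // Hypothetical helper: writes fileContent to a temporary file, reads it
    // back unchanged, and uses what was read as a regex against input.
    static boolean fileRegexMatches(String fileContent, String input) {
        try {
            Path tmp = Files.createTempFile("pattern", ".txt");
            try {
                Files.write(tmp, fileContent.getBytes(StandardCharsets.UTF_8));
                // No unescaping happens here: the characters in the file
                // are exactly the characters that end up in memory.
                String regex = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
                return Pattern.compile(regex).matcher(input).matches();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The file will hold the 9 characters \(hello\). The backslashes are
        // doubled below only because this line is Java source code.
        System.out.println(fileRegexMatches("\\(hello\\)", "(hello)")); // true
    }
}
```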

- Oliver

jamesa...@gmail.com

Feb 14, 2006, 9:06:13 PM
Yes, the previous poster is completely correct.

I think it should be made absolutely clear that it is the Java compiler
that turns '\\' into '\'. Thus only string constants in code that will
be compiled, i.e. in source code, need to have the overabundance of
'\\'. Everywhere else (external files, memory images, etc), what you
see is what you get.

Roedy Green

Feb 14, 2006, 11:14:55 PM
On 14 Feb 2006 18:06:13 -0800, jamesa...@gmail.com wrote, quoted or
indirectly quoted someone who said :

>I think it should be made absolutely clear that it is the Java compiler
>that turns '\\' into '\'. Thus only string constants in code that will
>be compiled, i.e. in source code, need to have the overabundance of
>'\\'. Everywhere else (external files, memory images, etc), what you
>see is what you get.

A SCID could hide this \ quoting goofiness by displaying and editing
strings in two colours, one for literal chars, and one for
representations of unprintable characters. Unicode has special glyphs
for the control chars you could use. Ditto for regex. We have the
hardware. We act as if we had only TTYs to code on.
It would make proofreading 100 times easier. 40% of the difficulty of
regexes comes from the double layer of quoting.


See
http://mindprod.com/projects/regexcomposer.html
http://mindprod.com/projects/regexproofreader.html
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

John C. Bollinger

Feb 15, 2006, 9:14:51 PM

As others have pointed out, pattern strings obtained by means other than
string literals are not bound by the constraints of string literals
(though they may have their own constraints). Another thing to
consider, though, is how to use a string -- from whatever source -- as a
literal pattern, handling erstwhile metacharacters as normal characters.
It isn't clear to me whether that's what you want, but if it is then
you should look into Pattern.quote().
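
A short sketch of that approach, in case it is useful. Note that Pattern.quote() was added in Java 5, so it does not appear in the 1.4.2 documentation linked in the original question. The class name QuoteDemo and the helper matchesLiterally are invented for the example.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: treats every character of literal as an ordinary
    // character, much like Perl's quotemeta, and matches it against input.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        String fromFile = "(hello)"; // stands in for a line read from a file

        // Unquoted, the parentheses form a group, so the pattern matches
        // "hello" and not "(hello)".
        System.out.println(Pattern.matches(fromFile, "(hello)")); // false
        System.out.println(Pattern.matches(fromFile, "hello"));   // true

        // Quoted, the parentheses are matched as ordinary characters.
        System.out.println(matchesLiterally(fromFile, "(hello)")); // true

        // The same applies to other metacharacters such as the dot.
        System.out.println(matchesLiterally("a.c", "abc")); // false
        System.out.println(matchesLiterally("a.c", "a.c")); // true
    }
}
```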

--
John Bollinger
jobo...@indiana.edu
