
Matching parentheses with Regular Expressions


James

Jul 3, 2008, 9:12:55 PM
I'm trying to use a regex to match/replace a word in parentheses.
The regular expression

private static final Pattern java_proc =
Pattern.compile("(java)");

does not work, because parentheses are treated as groupings.

Using "\" to designate the parentheses as literal characters does
not work either, and I'm not sure why:

private static final Pattern java_proc = Pattern.compile("\(java\)");

I searched for and read a related post here, but it did not
help. Either I have a different problem than the one discussed there,
or I just don't understand the post.

What am I doing wrong? Thanks, Alan

James

Jul 3, 2008, 9:23:19 PM
OK, I finally found the note about using double backslashes in front of
parentheses. So, now, why won't the following regular expression
pattern compile?

private static final Pattern java_proc = Pattern.compile("\\\\.+\\Process\\(java\\)\\");

The error says:

java.lang.ExceptionInInitializerError
Caused by: java.util.regex.PatternSyntaxException: Unknown character property name {r} near index 6
\\.+\Process\(java\)\
      ^

This does not make sense to me.

I'm trying to match text of the form (example):

\\GOLLY\Process(java)\% Processor Time

Thanks, Alan


Joshua Cranmer

Jul 3, 2008, 9:31:52 PM
James wrote:
> OK, I finally found the note about using double backslashes in front of
> parentheses. So, now, why won't the following regular expression
> pattern compile?
>
> private static final Pattern java_proc = Pattern.compile("\\\\.+\\Process\\(java\\)\\");
>
> The error says:
>
> java.lang.ExceptionInInitializerError
> Caused by: java.util.regex.PatternSyntaxException: Unknown character property name {r} near index 6
> \\.+\Process\(java\)\
>       ^

This is what the regex engine is seeing. Don't forget that '\' is also a
metacharacter in regexes, so \Process is read as \P followed by a
property name, not as a backslash followed by "Process". To match a
literal '\' you have to write '\\\\' in the Java source, which the regex
engine sees as '\\' and matches as a single '\'. So the regex you're
probably trying to compile is:
"\\\\{2}.+\\\\Process\\(java\\)\\\\"
(The {2} is so that you don't have to type in 8 backslashes.)
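
To make the two layers of escaping concrete, here is a minimal sketch. The class and method names (EscapeLayers, hasJavaInParens, hasCounterPrefix) are made up for illustration; only java.util.regex.Pattern is real API, and the sample text is the counter path quoted earlier in the thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class EscapeLayers {
    // Java source "\\(java\\)" is the regex \(java\): literal parentheses.
    static final Pattern PARENS = Pattern.compile("\\(java\\)");

    // Java source "\\\\" is the regex \\, which matches one literal backslash.
    static final Pattern COUNTER =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\)\\\\");

    static boolean hasJavaInParens(String s) {
        return PARENS.matcher(s).find();
    }

    static boolean hasCounterPrefix(String s) {
        return COUNTER.matcher(s).find();
    }

    public static void main(String[] args) {
        // This Java literal is the text \\GOLLY\Process(java)\% Processor Time
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println(sample);
        System.out.println(hasJavaInParens(sample));
        System.out.println(hasCounterPrefix(sample));
    }
}
```

Printing the sample string is a cheap way to check what survives the Java compiler before the regex engine ever sees it.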


--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

James

Jul 3, 2008, 9:44:50 PM
Thank you.

I have one last remaining problem. The full data I'm working with,
in CSV format, looks like this:

"(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\% Processor Time"

I want to match on

\\GOLLY\Process(java)\

so I can replace it.

The regular expression

\\\\{2}.+\\\\Process\\(java\\).

matches, but it matches too much of it:


\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\

How can I get it to only match the part I want?

Thanks again, Alan

Joshua Cranmer

Jul 3, 2008, 9:52:10 PM
James wrote:
> The regular expression
>
> \\\\{2}.+\\\\Process\\(java\\).
>
> matches, but it matches too much of it:

In that case, you probably want this regex:
\\\\{2}[^\\\\]+\\\\Process\\(java\\)
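
A small sketch of why the negated class helps, assuming the CSV line from the previous message. The class name CounterPrefixDemo, its helpers, and the replacement text "JVM" are hypothetical. The greedy .+ is free to run across the earlier field, while [^\\\\]+ (regex [^\\]+) cannot cross a backslash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns from the thread.
class CounterPrefixDemo {
    // Greedy: .+ may swallow earlier fields before reaching \Process(java).
    static final Pattern GREEDY =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\)");

    // Negated class: the host name part may not contain a backslash.
    static final Pattern TIGHT =
            Pattern.compile("\\\\{2}[^\\\\]+\\\\Process\\(java\\)");

    // Returns the first match of p in line, or null if there is none.
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Replaces every tight match; quoteReplacement keeps any \ or $
    // in the replacement text literal.
    static String replacePrefix(String line, String replacement) {
        return TIGHT.matcher(line)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String line = "\"\\\\GOLLY\\Memory\\% Committed Bytes In Use\","
                + "\"\\\\GOLLY\\Process(java)\\% Processor Time\"";
        System.out.println(firstMatch(GREEDY, line));
        System.out.println(firstMatch(TIGHT, line));
        System.out.println(replacePrefix(line, "JVM"));
    }
}
```

Matcher.quoteReplacement is only needed when the replacement text may itself contain backslashes or dollar signs, which is likely with counter paths like these.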

Arved Sandstrom

Jul 3, 2008, 10:22:18 PM
"James" <jalan...@verizon.net> wrote in message
news:34ef4eab-fc10-4809...@27g2000hsf.googlegroups.com...

Double backslash your pattern: \\(java\\)

AHS


Roedy Green

Jul 4, 2008, 12:23:49 AM
On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James
<jalan...@verizon.net> wrote, quoted or indirectly quoted someone
who said :

> private static final Pattern java_proc = Pattern.compile("\(java\)");

It gets complicated because you have both Java and regex escape
quoting.

See http://mindprod.com/jgloss/regex.html#QUOTING

--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

shakah

Jul 4, 2008, 8:04:16 AM

FWIW, you could avoid a little of the backslash escape mess
by using single-char character classes, e.g.:
Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]") ;
// ...outside of a Java string that'd be [\]{2}[^\]+[\]Process[(]java[)]

Mark Space

Jul 4, 2008, 2:36:12 PM

You also might get rid of some of those backslashes by substituting
another character, then using replace() on the string before compiling it.

final static String PATTERN = "``{2}.+``Process`(java`)";

String myRegex = PATTERN.replace("`", "\\" );
System.out.println( myRegex );

Result:

\\{2}.+\\Process\(java\)


It just makes things more readable. Putting `, %, or # in the string and
then replacing that character with \ before compiling it as a regex can
save your eyes.

Incidentally, I wonder if Sun could be convinced to add this themselves.
Maybe add a new operator/keyword altogether. Say # introduces new
keywords or operators and is followed by the keyword or operator. This
would allow Sun to add new keywords or operators without breaking any
existing code. So #s might give us new string constants. Let's say '
then means a string like a Unix shell string, where escaping is ignored.

String regex = #s'\\{2}.+\\Process\(java\)';

Would give that literal string, without the need to escape the
backslashes. Easier for regex at least. Other types of flags besides '
could be introduced too. `,$,@,%,= might do the same thing, just use a
different character as a string terminator, in case you want a ' to be
part of the string. """ might introduce a "here-is" operator. Etc.

Just thinking out loud....


Roedy Green

Jul 4, 2008, 2:50:01 PM
On Fri, 04 Jul 2008 11:36:12 -0700, Mark Space
<mark...@sbc.global.net> wrote, quoted or indirectly quoted someone
who said :

>You also might get rid of some of those backslashes by substituting
>another character, then using replace() on the string before compiling it.

Other ideas:

1. Use Quoter to insert \ quoting, both for regex and Java strings.
see http://mindprod.com/applet/quoter.html

2. implement one or more of my regex student projects
http://mindprod.com/project/regexutility.html
http://mindprod.com/project/regexcomposer.html
http://mindprod.com/project/regexdebugger.html
http://mindprod.com/project/regexproofreader.html

3. use \Q ... \E
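
As a rough illustration of option 3, here is a sketch using \Q ... \E and the related Pattern.quote method from java.util.regex. The class name QuoteDemo and its field names are made up for this example. Note that this only removes the regex layer of escaping: a backslash in the text still has to be doubled in Java source.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; Pattern.compile and Pattern.quote are real API.
class QuoteDemo {
    // Everything between \Q and \E is taken literally by the regex engine.
    // In Java source the markers themselves are written "\\Q" and "\\E".
    static final Pattern WITH_Q_E = Pattern.compile("\\Q(java)\\E");

    // Pattern.quote wraps arbitrary text so that it matches literally.
    // The text here is \Process(java), one backslash at the Java level.
    static final Pattern WITH_QUOTE =
            Pattern.compile(Pattern.quote("\\Process(java)"));

    public static void main(String[] args) {
        // This Java literal is the text \\GOLLY\Process(java)\% Processor Time
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println(WITH_Q_E.matcher(sample).find());
        System.out.println(WITH_QUOTE.matcher(sample).find());
    }
}
```

Quoted and unquoted pieces can be mixed in one pattern, so the literal parts can be quoted while a part such as the host name stays a normal regex.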

Mark Space

Jul 4, 2008, 3:05:48 PM
Roedy Green wrote:

> 3. use \Q ... \E

OK, that's cool. It only works with regex, but it's darn handy for
them. Thanks!

James

Jul 5, 2008, 3:48:44 PM
shakah,

The statement

Pattern JAVA_PROC = Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]");

compiles, but throws an exception at run time:

run:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 30
[\]{2}[^\]+[\]Process[(]java[)]
                              ^

All: Thank you for your suggestions.

Roedy Green

Jul 5, 2008, 4:31:51 PM
On Sat, 5 Jul 2008 12:48:44 -0700 (PDT), James
<jalan...@verizon.net> wrote, quoted or indirectly quoted someone
who said :

>[\]{2}[^\]+[\]Process[(]java[)]
> ^

Outside a character class, ( and ) both need escapes; inside [ ] they
are literal. If that is a Java literal, you also need to escape \ twice:
once for Java and once for the regex.

see http://mindprod.com/jgloss/regex.html#QUOTING

Joshua Cranmer

Jul 5, 2008, 4:34:18 PM
James wrote:
> Exception in thread "main" java.util.regex.PatternSyntaxException:
> Unclosed character class near index 30
> [\]{2}[^\]+[\]Process[(]java[)]

You still have to escape the backslashes here: as written, each
backslash escapes the ] that was meant to close its character class.
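
A sketch of the character-class version with the backslashes escaped for both layers. The class name CharClassFix and the compiles helper are hypothetical. Each [\\\\] in Java source is the regex [\\], a class holding one backslash; the parentheses can stay unescaped because they are literal inside [ ].

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class for the corrected character-class pattern.
class CharClassFix {
    // Regex seen by the engine: [\\]{2}[^\\]+[\\]Process[(]java[)]
    static final Pattern JAVA_PROC =
            Pattern.compile("[\\\\]{2}[^\\\\]+[\\\\]Process[(]java[)]");

    // Reports whether a regex compiles, instead of letting the exception escape.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The version from the thread, which fails at run time.
        System.out.println(compiles("[\\]{2}[^\\]+[\\]Process[(]java[)]"));
        // This Java literal is the text \\GOLLY\Process(java)\% Processor Time
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println(JAVA_PROC.matcher(sample).find());
    }
}
```

So the character-class trick saves escaping for the parentheses but not for the backslashes.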

Roedy Green

Jul 6, 2008, 2:59:58 AM
On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James
<jalan...@verizon.net> wrote, quoted or indirectly quoted someone
who said :

> I'm trying to use regex to match/replace a word in parentheses.
>The regular expression

An aside, you can't use a regex to tell if ( ) are nested and
balanced correctly to arbitrary depth.

For that you need a parser.

See http://mindprod.com/jgloss/parser.html
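
For the narrow case of a single kind of bracket, a depth counter is already enough to check nesting, and it is the simplest thing that goes beyond what a regex can do. This is only a sketch: the class name ParenCheck is made up, and anything richer (several bracket types, quoting, a real grammar) would need a proper parser as noted above.

```java
// Hypothetical helper: checks that ( and ) are balanced to any depth.
class ParenCheck {
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' with no matching '('
                }
            }
        }
        return depth == 0; // leftover '(' means unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("Process(java)"));
        System.out.println(isBalanced("((a)(b))"));
        System.out.println(isBalanced(")("));
    }
}
```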
