Regular expressions are one of those things that I understand but don't use often enough to have really mastered. I use them even less in my Java programming. Today I found myself banging my head into my cubicle wall trying to parse out a file that looked something like this:
----START
something=foo
anotherthing=bar
----END
----START
something=baz
anotherthing=ran outta foo words
----END
I needed to parse this out and create objects representing each section. Initially I started trying to read this in line by line, keeping track of where I was in relation to the markers, concatenating string buffers, etc. Then I realized how clumsy that approach was, and how much simpler a regular expression would make it.
Brushing aside the mental cobwebs, I looked up a couple of references for the Java API and studied up on my friends Pattern and Matcher. I had trouble finding any examples for my specific case, where what I wanted to match spanned multiple lines. At first I thought I'd found the answer with the promising-sounding Pattern.MULTILINE argument to Pattern.compile(), but that only changes how ^ and $ match. Without that option, those anchors match only at the beginning or end of the text being parsed; with it, they also match at newline boundaries within the text.
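To make the effect of Pattern.MULTILINE concrete, here is a minimal sketch. The class name MultilineDemo and the helper findAll are made up for illustration; only Pattern, Matcher, and the MULTILINE flag come from the Java API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
public class MultilineDemo {
    // Collects every match of the given pattern in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "something=foo\nanotherthing=bar";

        // Default mode: ^ anchors only at the very start of the input.
        System.out.println(findAll(Pattern.compile("^\\w+"), input));
        // prints [something]

        // MULTILINE: ^ also anchors right after each newline.
        System.out.println(findAll(Pattern.compile("^\\w+", Pattern.MULTILINE), input));
        // prints [something, anotherthing]
    }
}
```

Note that neither call lets a single match span the newline, which is why MULTILINE alone was not enough for the START/END blocks.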
Turned out what I was looking for was the Pattern.DOTALL argument. By default, the dot operator does not match newlines; with this argument it does. An alternative is to prefix the regex pattern with (?s). It has the same effect, and the mnemonic stands for "single-line" mode, which is what it's called in Perl.
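A quick way to see the difference is to run the same pattern against a two-line string in all three configurations. This is only a sketch; DotallDemo and matchesWhole are names invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
public class DotallDemo {
    // True if the whole input matches the regex under the given flags (0 means no flags).
    static boolean matchesWhole(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "something=foo\nanotherthing=bar";

        // Default: the dot stops at the newline, so the match fails.
        System.out.println(matchesWhole("some.*bar", 0, input));              // prints false

        // DOTALL flag: the dot matches the newline too.
        System.out.println(matchesWhole("some.*bar", Pattern.DOTALL, input)); // prints true

        // Embedded (?s) flag: same effect without the second argument.
        System.out.println(matchesWhole("(?s)some.*bar", 0, input));          // prints true
    }
}
```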
So, to extract the relevant sections from the input above, you can do the following:
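import java.util.regex.Matcher;
import java.util.regex.Pattern;

// DOTALL lets the dot in (.*?) match the newlines inside a section
Pattern pattern = Pattern.compile("----START\n(.*?)\n----END\n", Pattern.DOTALL);
Matcher matcher = pattern.matcher(theInput);
while (matcher.find()) {
    System.out.println(matcher.group(1) + "\n");
}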
That will print out the following, with a blank line after each section:
something=foo
anotherthing=bar
something=baz
anotherthing=ran outta foo words
You can get the same result with the alternate method, using the embedded (?s) flag in the Pattern declaration:
Pattern pattern = Pattern.compile("(?s)----START\n(.*?)\n----END\n");
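Since the original goal was to build an object per section, here is one way that last step might look. It is a sketch under a few assumptions: each section becomes a Map of key to value, lines are separated by "\n" only (a file with Windows line endings would need "\r?\n"), and the names SectionParser and parse are hypothetical. The trailing newline after ----END is made optional so a file without a final newline still matches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser class, not part of the original post.
public class SectionParser {
    // DOTALL lets (.*?) span lines; the final \n? tolerates a missing trailing newline.
    private static final Pattern SECTION =
            Pattern.compile("----START\n(.*?)\n----END\n?", Pattern.DOTALL);

    // Returns one map per START/END block, with keys in file order.
    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String line : matcher.group(1).split("\n")) {
                int eq = line.indexOf('=');
                if (eq > 0) { // skip lines that are not key=value
                    fields.put(line.substring(0, eq), line.substring(eq + 1));
                }
            }
            sections.add(fields);
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "----START\nsomething=foo\nanotherthing=bar\n----END\n"
                + "----START\nsomething=baz\nanotherthing=ran outta foo words\n----END\n";
        for (Map<String, String> section : parse(input)) {
            System.out.println(section);
        }
    }
}
```

A Map keeps the sketch generic; a small class with something and anotherthing fields would work just as well once the keys are known.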
For further reference, have a look here.
--
Posted By eric to ericasberry.com at 10/29/2008 07:17:00 PM