Regex one-liner to find several multi-line blocks of text in a single file
Thomas Smith  
Newsgroups: perl.beginners
From: theitsm...@gmail.com (Thomas Smith)
Date: Thu, 1 Nov 2012 00:44:08 -0700
Local: Thurs, Nov 1 2012 3:44 am
Subject: Regex one-liner to find several multi-line blocks of text in a single file
Hi,

I'm trying to search a file for several matching blocks of text. A sample
of what I'm searching through is below.

What I want to do is match "##### START block #####" through to the next
"##### END block #####" and repeat that throughout the file without
matching any of the text that falls between each matched block (that is,
the "ok: some text" lines should not be matched). Here is the one-liner I'm
using:

perl -p -e '/^##### START block #####.*##### END block #####$/s' file.txt

I've tried a few variations of this but with the same result--a match is
being made from the first "##### START block #####" to the last "##### END
block #####", and everything in between... I believe that the ".*",
combined with the "s" modifier, in the regex is causing this match to be
made.

What I'm not sure how to do is tell Perl to search from START to the next
END and then start the search pattern over again with the next START-END
match.

How might I go about achieving this?

Thank you,

~ Tom

----- Example Text -----

##### START block #####
#

A block of text.

#
##### END block #####

ok: some text

##### START block #####
#

A block of text.

#
##### END block #####

ok: some text


 
Paul Johnson  
Newsgroups: perl.beginners
From: p...@pjcj.net (Paul Johnson)
Date: Thu, 1 Nov 2012 10:42:42 +0100
Local: Thurs, Nov 1 2012 5:42 am
Subject: Re: Regex one-liner to find several multi-line blocks of text in a single file

perl -ne 'print if /##### START block #####/ .. /##### END block #####/' file.txt

--
Paul Johnson - p...@pjcj.net
http://www.pjcj.net


 
Jim Gibson  
Newsgroups: perl.beginners
From: jimsgib...@gmail.com (Jim Gibson)
Date: Thu, 1 Nov 2012 08:08:43 -0700
Local: Thurs, Nov 1 2012 11:08 am
Subject: Re: Regex one-liner to find several multi-line blocks of text in a single file

On Nov 1, 2012, at 12:44 AM, Thomas Smith wrote:

The '*' is what's called a "greedy" quantifier, which means it matches as many characters as possible. When the regular expression engine encounters '.*', it immediately matches as many characters as it can. Since your regular expression includes the 's' modifier, this includes newlines as well. Because there are more pattern characters after the '.*', the engine then backtracks: it gives back characters from the end of the substring matched by '.*', one at a time, until the rest of the pattern also matches. If '.*' gives back every character and the rest still does not match, the attempt fails at that starting position.

The result of all this is that for your pattern, the last '##### END block #####' substring is the one that will be matched, and the '.*' pattern will match everything between the first '##### START block #####' and the last '##### END block #####'.

The way to fix this is to make the '*' quantifier "non-greedy" by putting a '?' after it, giving '.*?'. With that pattern, the RE engine matches as few characters as possible, so each START block pairs up with the first END block that follows it. A 'g' modifier tells the RE engine to resume searching after each match for the next match in the string.


 