Subpatterns in Grep?
From: jmichel <jmi.mig...@gmail.com>
Date: Fri, 5 Oct 2012 08:18:25 -0700 (PDT)
Local: Fri, Oct 5 2012 11:18 am
Subject: Subpatterns in Grep?

I have a file consisting of groups of lines (unknown number of lines in
each group).
Each line begins with a 6-digit number, followed by an unknown sequence of
words and numbers.
Consecutive lines starting with the same number form a group.
My problem is to combine lines from each group into a single line, keeping
only the first occurrence of the distinctive number.
I have been able to "find" groups using the pattern
(\d{6})(.+)(?:\r\1(.+))+
However, this does not appear to store the expressions matching the inner
parentheses into separate variables.
Is there a way to achieve the desired replacement using grep?

Thanks in advance


 
From: Patrick Woolsey <pwool...@barebones.com>
Date: Fri, 5 Oct 2012 14:01:12 -0400
Local: Fri, Oct 5 2012 2:01 pm
Subject: Re: Subpatterns in Grep?
At 08:18 -0700 10/05/2012, jmichel wrote:

>I have a file consisting of groups of lines (unknown number of lines in
>each group).
>Each line begins with a 6-digit number, followed by an unknown sequence of
>words and numbers.
>Consecutive lines starting with the same number form a group.
>My problem is to combine lines from each group into a single line, keeping
>only the first occurrence of the distinctive number.
>I have been able to "find" groups using the pattern
>(\d{6})(.+)(?:\r\1(.+))+

>However, this does not appear to store the expressions matching the inner
>parentheses into separate variables.

>Is there a way to achieve the desired replacement using grep?

If I understand the task correctly: although your pattern should match
all such groups of lines, I don't see any way to restructure the matched
text in a single step.

(A relatively easy brute force solution would be to concatenate all
matching line pairs, then rinse & repeat. :)
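Outside BBEdit's Find dialog, a single step is possible in Perl: the /e modifier evaluates the replacement as code, so each whole group can be matched once and then rewritten. This is a minimal sketch, assuming \n line breaks (in place of the \r used in this thread's BBEdit patterns) and that a group consists only of adjacent lines; merge_groups and the sample text are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge each run of adjacent lines that share a 6-digit
# prefix into one line, keeping only the first copy of the prefix.
#   ^(\d{6})        the prefix at the start of a line ($1)
#   (.*)            the rest of that first line ($2)
#   ((?:\n\1.*)*)   all following lines that start with the same prefix ($3)
# /m lets ^ match at every line start; /e evaluates the replacement as code,
# which strips each line break plus repeated prefix from the continuation lines.
sub merge_groups {
    my ($text) = @_;
    $text =~ s{^(\d{6})(.*)((?:\n\1.*)*)}{
        my ($prefix, $first, $rest) = ($1, $2, $3);
        $rest =~ s/\n\Q$prefix\E//g;
        $prefix . $first . $rest;
    }gme;
    return $text;
}

print merge_groups("123456 alpha\n123456 beta\n123456 gamma\n654321 delta\n");
# prints "123456 alpha beta gamma" and "654321 delta" on separate lines
```

Saving the captures into lexicals first matters: the inner substitution resets $1, $2 and $3.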

As to your question about storage:

Though the contents of that inner subpattern (.+) are being captured N
times (where N is the number of lines within the match), only the last
instance matched will be stored and available by reference to that
subpattern.

   [ As an aside for anyone else who may be wondering, this part of
     the pattern (?: ) consists of `non-capturing parentheses` which
     do not themselves store matched text. ]

For example, if you apply the following search & replace patterns:

Find:      (?:(\d{6})\r)+
Replace:   \1

to this text:

111222
333444
555666

the result will be:

555666
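The same behaviour is easy to confirm outside BBEdit. This is a minimal sketch in Perl, whose regex engine follows the same rule, with \n standing in for \r; the variable names are illustrative. The second match shows one common workaround: a global match in list context returns one capture per match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "111222\n333444\n555666\n";

# The capture inside the repeated (?: )+ group is overwritten on each
# iteration, so only the last line it matched survives.
my ($last) = $text =~ /(?:(\d{6})\n)+/;
print "last capture: $last\n";     # prints "last capture: 555666"

# To keep every instance, match repeatedly with /g in list context instead:
# each successful match contributes its own capture to the list.
my @all = $text =~ /^(\d{6})$/mg;
print "all captures: @all\n";      # prints "all captures: 111222 333444 555666"
```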

Regards,

 Patrick Woolsey
==
Bare Bones Software, Inc.             <http://www.barebones.com/>


 
From: John Delacour <johndelac...@gmail.com>
Date: Sat, 6 Oct 2012 00:25:47 +0100
Local: Fri, Oct 5 2012 7:25 pm
Subject: Re: Subpatterns in Grep?

On 5 Oct 2012, at 16:18, jmichel <jmi.mig...@gmail.com> wrote:

> I have a file consisting of groups of lines (unknown number of lines in each group).
> Each line begins with a 6-digit number, followed by an unknown sequence of words and numbers.
> Consecutive lines starting with the same number form a group.
> My problem is to combine lines from each group into a single line, keeping only the first occurrence of the distinctive number.
> I have been able to "find" groups using the pattern
> (\d{6})(.+)(?:\r\1(.+))+
> However, this does not appear to store the expressions matching the inner parentheses into separate variables.
> Is there a way to achieve the desired replacement using grep?

Using regular expressions, yes, but you need a routine.  If you put a file containing
this Perl script in ~/Library/Application Support/BBEdit/Text Filters, it will do what
you want.  Open the Text Filters palette from the Window menu and you will see the
filter.  Double-click it or click Run or, if it's a frequent task, assign a shortcut to the
script.

Save this as ???.pl

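#!/usr/bin/perl
use strict;
use warnings;

my %hash;
my $six_digits     = "[0-9]{6}";
my $remaining_text = ".*";
my $delimiter      = "";    # or ", " for example

# For each input line that starts with 6 digits, append the rest of the
# line to the text already collected for that number.
# Lines sharing a number are merged even when they are not adjacent;
# lines without a 6-digit prefix are dropped.
while (<>) {
    if (/^($six_digits)($remaining_text)/) {
        $hash{$1} .= $2;    # append the text after the 6 digits
    }
}

# Print one line per number, in ascending numeric order.
for (sort { $a <=> $b } keys %hash) {
    print "$_$delimiter$hash{$_}\n";
}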

#JD
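The filter above collects lines in a hash, so lines with the same number are merged wherever they occur and the output is sorted. If the original order and the "consecutive lines" rule from the question matter, a streaming variant can be sketched by comparing each line's prefix with the previous one. Assumptions: merge_consecutive is a hypothetical name, lines without a prefix are passed through unchanged (unlike the script above), and the sample data is embedded rather than read from the text filter's input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant: merge only runs of adjacent lines that share a
# 6-digit prefix, keep the input order, and pass other lines through.
sub merge_consecutive {
    my @out;
    my $current;    # prefix of the group being built (undef if none)
    for my $line (@_) {
        chomp(my $copy = $line);
        my ($prefix, $rest) = $copy =~ /^([0-9]{6})(.*)/;
        if (defined $prefix && defined $current && $prefix eq $current) {
            $out[-1] .= $rest;     # same prefix as the previous line: append
        } else {
            push @out, $copy;      # start a new output line
            $current = $prefix;    # undef for lines without a prefix
        }
    }
    return map { "$_\n" } @out;
}

# As a BBEdit text filter this would be: print merge_consecutive(<>);
print merge_consecutive("123456 a\n", "123456 b\n", "999999 c\n", "123456 d\n");
# prints "123456 a b", "999999 c" and "123456 d" on separate lines
```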

 
From: jmichel <jmi.mig...@gmail.com>
Date: Fri, 5 Oct 2012 23:50:22 -0700 (PDT)
Local: Sat, Oct 6 2012 2:50 am
Subject: Re: Subpatterns in Grep?

Thanks for these explanations. They confirm what I suspected.
Assuming that the number of lines in one group can never exceed, say, 15 or
so, could one circumvent the difficulty by explicitly repeating the search
pattern a sufficient number of times?
Then the problem would be to ensure a match also in the case when the
number of lines is smaller. Any idea on how that could be achieved? Could
conditional matching help (I am not familiar with those "advanced
features")?
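The idea of repeating the subpattern can be sketched by making each extra copy optional with ?, so the same pattern also matches shorter groups; no conditional construct is needed for that. This sketch is in Perl, assuming \n line breaks and, to keep it short, groups of at most four lines (a 15-line bound would need 14 optional copies and matching references in the replacement); merge_up_to_four is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One required line plus three optional continuation copies: matches groups
# of 1 to 4 adjacent lines with the same prefix. A longer group leaves its
# remaining lines for a second run.
sub merge_up_to_four {
    my ($text) = @_;
    no warnings 'uninitialized';    # copies that don't take part leave $3..$5 undef
    $text =~ s/^(\d{6})(.*)(?:\n\1(.*))?(?:\n\1(.*))?(?:\n\1(.*))?/$1$2$3$4$5/gm;
    return $text;
}

print merge_up_to_four("123456 a\n123456 b\n654321 c\n");
# prints "123456 a b" and "654321 c" on separate lines
```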

On Friday, 5 October 2012 20:01:21 UTC+2, Patrick Woolsey wrote:


 
From: jmichel <jmi.mig...@gmail.com>
Date: Fri, 5 Oct 2012 23:57:39 -0700 (PDT)
Local: Sat, Oct 6 2012 2:57 am
Subject: Re: Subpatterns in Grep?

This sounds amazingly powerful and flexible. Thanks a lot. I will try it
asap.
The only problem is that I will need to learn Perl if I want to be able to
write such scripts…

On Saturday, 6 October 2012 01:25:58 UTC+2, eremita wrote:


 
From: Patrick Woolsey <pwool...@barebones.com>
Date: Sat, 6 Oct 2012 11:04:32 -0400
Local: Sat, Oct 6 2012 11:04 am
Subject: Re: Subpatterns in Grep?
At 23:50 -0700 10/05/2012, jmichel wrote:

>Thanks for these explanations. They confirm what I suspected.
>Assuming that the number of lines in one group can never exceed, say, 15
>or so, could one circumvent the difficulty by explicitly repeating the
>search pattern a sufficient number of times?

Yes, and please see below.

>Then the problem would be to ensure a match also in the case when the
>number of lines is smaller. Any idea on how that could be achieved? Could
>conditional matching help (I am not familiar with those "advanced
>features")?

To do this, just modify your existing pattern to find successive pairs of
matching lines and combine their contents:

Find:       (\d{6})(.+)(?:\r\1(.+))

Replace:    \1\2\3

and then repeatedly apply Replace All until all line pairs which start with
the same numeric prefix have been consolidated to single lines.

(E.g. for groups of 16 lines or fewer, this will take at most 4 passes of
Replace All; for groups of 64 lines or fewer, 6 passes; etc.)
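The repeat-until-done procedure can be checked in Perl by looping the substitution until it reports zero replacements. This is a minimal sketch, assuming \n line breaks and adding ^ with /m so each match starts at a line beginning (the BBEdit pattern above has no anchor); merge_by_passes is hypothetical. Each pass merges disjoint pairs, so a group of n lines shrinks to ceil(n/2) lines per pass, which is where the pass counts come from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the pairwise pattern repeatedly until nothing changes, counting passes.
sub merge_by_passes {
    my ($text) = @_;
    my $passes = 0;
    # s///g returns the number of replacements made; 0 (false) ends the loop.
    while ($text =~ s/^(\d{6})(.+)\n\1(.+)/$1$2$3/gm) {
        $passes++;
    }
    return ($text, $passes);
}

my $group = join '', map { "123456 line$_\n" } 1 .. 16;
my ($merged, $passes) = merge_by_passes($group);
print $merged;                # one line: "123456 line1 line2 ... line16"
print "passes: $passes\n";    # prints "passes: 4"
```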

PS: John Delacour's text filter is a much nicer general solution; the only
advantage of the above is that it doesn't require knowledge of Perl.

Regards,

 Patrick Woolsey


 
From: jmichel <jmi.mig...@gmail.com>
Date: Mon, 8 Oct 2012 02:19:38 -0700 (PDT)
Local: Mon, Oct 8 2012 5:19 am
Subject: Re: Subpatterns in Grep?

This sounds fine for my type of use.
Thanks again.

On Saturday, 6 October 2012 17:04:43 UTC+2, Patrick Woolsey wrote:



 