Replace every Nth occurrence with grep?


flybynight

Feb 4, 2014, 6:34:14 PM
to textwr...@googlegroups.com
Sorry if this shows up twice. I thought I posted this earlier, but then I didn't see it…

I have a mailing list that came to me as one big column with line breaks between each "field" as well as between each "record." The good news is that it is pretty consistent - 4 lines per record. So I need something to replace the 1st, 2nd, and 3rd \r (line break) with a \t (tab), but leave the 4th \r alone, then repeat the pattern over and over. 
I would change it from something like this:

John Doe
Company A
123 Main Street
Anytown, NY 10010
Bob Smith
Company B
456 Scenic Drive
Cityville, CA 90210
etc, etc...

to something like this:

John Doe     Company A     123 Main Street     Anytown, NY 10010
Bob Smith     Company B     456 Scenic Drive     Cityville, CA 90210
etc, etc...

For entering it into the forum, I put in 5 spaces instead of a tab.

I would think this would be a grep thing, but I'm no expert and I haven't been able to find anything quite like this in the online tutorials and examples.

Any help would be greatly appreciated!
Thanks
-Shawn

Kendall Conrad

Feb 4, 2014, 11:52:19 PM
to textwr...@googlegroups.com
Here's one way with grep:

Find: (?:(.*)\r)(?:(.*)\r)(?:(.*)\r)(.*?\r)
Replace w/: \1\t\2\t\3\t\4

The ?: portions make it so they're not numeric matches, so the \1 in the replacement only deals with the pieces we really care about.
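
For anyone who wants to try the pattern outside the editor, here is a minimal Python sketch of the same find/replace. It assumes the text uses \n line endings where the editor's grep dialog uses \r; the function name join_records and the sample data are made up for illustration. Note that the pattern requires a line break after the fourth line, so a final record with no trailing line break is left untouched.

```python
import re

# Same structure as the Find pattern above, with \n standing in for \r.
# Groups 1-3 capture the first three lines without their line breaks;
# group 4 captures the fourth line together with its line break.
PATTERN = re.compile(r"(?:(.*)\n)(?:(.*)\n)(?:(.*)\n)(.*?\n)")


def join_records(text):
    """Turn each run of four lines into one tab-separated line."""
    return PATTERN.sub(r"\1\t\2\t\3\t\4", text)


sample = (
    "John Doe\nCompany A\n123 Main Street\nAnytown, NY 10010\n"
    "Bob Smith\nCompany B\n456 Scenic Drive\nCityville, CA 90210\n"
)
print(join_records(sample))
```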

-Kendall

Christopher Stone

Feb 5, 2014, 2:28:09 AM
to TextWrangler-Talk
On Feb 04, 2014, at 22:52, Kendall Conrad <ange...@gmail.com> wrote:
Find: (?:(.*)\r)(?:(.*)\r)(?:(.*)\r)(.*?\r)
Replace w/: \1\t\2\t\3\t\4

The ?: portions make it so they're not numeric matches, so the \1 in the replacement only deals with the pieces we really care about.
______________________________________________________________________

Hey Kendall,

"Not numeric matches" fails to clearly convey what you mean and sounds like you're trying to avoid matching digits.  Better to use the terminology found in the manual:

"Sometimes, however, parentheses are needed only for clustering, not capturing. TextWrangler now supports non-capturing parentheses, using the syntax:

     (?:PATTERN)

That is, if an open parenthesis is followed by “?:”, the subpattern matched by that pair of parentheses is not counted when computing the backreferences. For example, if the text “red king” is matched against the pattern:

     (?:(red|white) (king|queen))"
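
To see the numbering in practice, here is a small sketch in Python, whose re module uses the same (?:...) syntax. The helper name capture_groups is made up for illustration; the second call wraps the same pattern in ordinary capturing parentheses for comparison.

```python
import re


def capture_groups(pattern, text):
    """Return the numbered captures of a match at the start of text."""
    match = re.match(pattern, text)
    return match.groups() if match else None


# The outer (?:...) only clusters, so just the two inner groups are numbered.
print(capture_groups(r"(?:(red|white) (king|queen))", "red king"))
# ('red', 'king')

# With a capturing outer group, the whole phrase becomes \1 and the
# inner groups shift to \2 and \3.
print(capture_groups(r"((red|white) (king|queen))", "red king"))
# ('red king', 'red', 'king')
```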

--
Best Regards,
Chris

Christopher Stone

Feb 5, 2014, 4:24:18 AM
to TextWrangler-Talk
On Feb 04, 2014, at 17:34, flybynight <flyby...@mac.com> wrote:
I have a mailing list that came to me as one big column with line breaks between each "field" as well as between each "record." The good news is that it is pretty consistent - 4 lines per record.
______________________________________________________________________

Hey Shawn,

Just in case this is something you need to do on a regular basis (and for fun), I've written a couple of text filters and an AppleScript to do the job.

Keep the principles in mind, as they're easily adapted to other tasks.

--
Best Regards,
Chris

-------------------------------------------------------------------------------------------

Very Basic Perl Text-Filter:

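#! /usr/bin/env perl
use strict; use warnings;
my $cntr = 1;                # position within the current 4-line record
while (<>) {
    chomp;
    print;
    if ($cntr < 4) {
        print "\t";          # lines 1-3: tab instead of a line break
        $cntr++;
    } else {
        print "\n";          # line 4: end the record
        $cntr = 1;
    }
}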

Basic Perl Text-Filter Using an Array:

#! /usr/bin/env perl
use v5.010; use strict; use warnings;
my (@reco, $cntr);
$cntr = 1;
$, = "\t";                   # output field separator used by say
while (<>) {
    push @reco, $_;
    if ($cntr == 4) {
        chomp @reco;
        say @reco;           # fields joined by $, plus a trailing newline
        @reco = ();
        $cntr = 1;
    } else {
        $cntr++;
    }
}
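
The array idea carries over to other languages. Here is a rough Python sketch of the same approach, collecting lines into a record and emitting it once it is full. The name join_in_chunks and its parameters are made up for illustration. Like the Perl version above, it silently drops a trailing record that has fewer than four lines.

```python
def join_in_chunks(lines, size=4, sep="\t"):
    """Collect lines into records of `size` fields and join each with `sep`."""
    record = []
    joined = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == size:
            joined.append(sep.join(record))
            record = []  # start the next record
    # Any leftover partial record is discarded here.
    return joined


sample = [
    "John Doe\n", "Company A\n", "123 Main Street\n", "Anytown, NY 10010\n",
    "Bob Smith\n", "Company B\n", "456 Scenic Drive\n", "Cityville, CA 90210\n",
]
for row in join_in_chunks(sample):
    print(row)
```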


A Little Tersification:

#! /usr/bin/env perl
use strict; use warnings;
while (<>) {
    chomp; print;
    # $. is the current input line number
    if ($. % 4 != 0) {
        print "\t";
    } else {
        print "\n";
    }
}
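
The line-number test generalizes to "every Nth line break" for any N. Here is a hedged Python sketch of that idea. The name join_every_nth and its parameters are made up for illustration. As with the Perl filter above, an incomplete final record ends with a separator instead of a line break.

```python
def join_every_nth(text, n=4, sep="\t"):
    """Keep every nth line break and replace the others with `sep`."""
    pieces = []
    for number, line in enumerate(text.splitlines(), start=1):
        pieces.append(line)
        # Same test as `$. % 4` in the Perl filter, with n as a parameter.
        pieces.append("\n" if number % n == 0 else sep)
    return "".join(pieces)


print(join_every_nth("a\nb\nc\nd\ne\nf\ng\nh\n"), end="")
```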

AppleScripting the Regular Expression (with basic error-checking):

-------------------------------------------------------------------------------------------

try
  tell application "BBEdit"
    replace "(^.+)\\n(^.+)\\n(^.+)\\n(^.+)" using "\\1\\t\\2\\t\\3\\t\\4" searching in ¬
      text of front text window options {search mode:grep, case sensitive:true}
  end tell
on error e number n
  set e to e & return & return & "Num: " & n
  tell me to set dDlg to display dialog e with title "ERROR!" buttons {"Cancel", "Copy", "OK"} default button "OK"
  if button returned of dDlg = "Copy" then set the clipboard to e
end try

-------------------------------------------------------------------------------------------

flybynight

Feb 5, 2014, 11:51:12 AM
to textwr...@googlegroups.com
Kendall,
Thank you so much! This worked perfectly!

Chris - thank you for your time as well. Glad you consider scripting fun. It's awesome when you can get it to do things like this. Such an amazing time saver!

Laters,
-Shawn

Kendall Conrad

Feb 5, 2014, 4:10:35 PM
to textwr...@googlegroups.com, listm...@suddenlink.net
Thanks Chris, I was experiencing a total brain fart when I wrote that part. I couldn't remember the "non-capturing" terminology and just left the clumsy wording.

Glad the regex at least worked for you, Shawn, so not everything I wrote came out bad.

-Kendall