Replace every Nth occurrence with grep?

flybynight

Feb 4, 2014, 6:34:14 PM

to textwr...@googlegroups.com

Sorry if this shows up twice. I thought I posted this earlier, but then I didn't see it…

I have a mailing list that came to me as one big column with line breaks between each "field" as well as between each "record." The good news is that it is pretty consistent - 4 lines per record. So I need something to replace the 1st, 2nd, and 3rd \r (line break) with a \t (tab), but leave the 4th \r alone, then repeat the pattern over and over.

I would change it from something like this:

John Doe

Company A

123 Main Street

Anytown, NY 10010

Bob Smith

Company B

456 Scenic Drive

Cityville, CA 90210

etc, etc...

to something like this:

John Doe     Company A     123 Main Street     Anytown, NY 10010

Bob Smith     Company B     456 Scenic Drive     Cityville, CA 90210

etc, etc...

For entering it into the forum, I put in 5 spaces instead of a tab.

I would think this would be a grep thing, but I'm no expert and I haven't been able to find anything quite like this in the online tutorials and examples.

Any help would be greatly appreciated!

Thanks

-Shawn

Kendall Conrad

Feb 4, 2014, 11:52:19 PM

to textwr...@googlegroups.com

Here's one way with grep:

Find: (?:(.*)\r)(?:(.*)\r)(?:(.*)\r)(.*?\r)
Replace w/: \1\t\2\t\3\t\4

The ?: portions make it so they're not numeric matches, so the \1 in the replacement only deals with the pieces we really care about.
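
For readers following along outside TextWrangler, the same find-and-replace can be sketched with Python's `re` module. This is only an illustration and not part of the original answer: it assumes `\n` line endings instead of `\r`, and the name `join_records` is made up for the example.

```python
import re

# Same shape as the TextWrangler pattern, with \n in place of \r.
# Each (?:...) clusters a field with its line break without being numbered,
# so groups 1-3 are the first three fields and group 4 is the last field
# together with its trailing line break.
PATTERN = re.compile(r"(?:(.*)\n)(?:(.*)\n)(?:(.*)\n)(.*?\n)")

def join_records(text: str) -> str:
    """Join each run of four lines into one tab-separated line."""
    return PATTERN.sub(r"\1\t\2\t\3\t\4", text)

sample = (
    "John Doe\nCompany A\n123 Main Street\nAnytown, NY 10010\n"
    "Bob Smith\nCompany B\n456 Scenic Drive\nCityville, CA 90210\n"
)
print(join_records(sample), end="")
```

Because the pattern needs four complete lines to match, a leftover partial record at the end of the text is left untouched.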

-Kendall

Christopher Stone

Feb 5, 2014, 2:28:09 AM

to TextWrangler-Talk

On Feb 04, 2014, at 22:52, Kendall Conrad <ange...@gmail.com> wrote:

Find: (?:(.*)\r)(?:(.*)\r)(?:(.*)\r)(.*?\r)
Replace w/: \1\t\2\t\3\t\4

The ?: portions make it so they're not numeric matches, so the \1 in the replacement only deals with the pieces we really care about.

______________________________________________________________________

Hey Kendall,

"Not numeric matches" fails to clearly convey what you mean and sounds like you're trying to avoid matching digits. Better to use the terminology found in the manual:

"Sometimes, however, parentheses are needed only for clustering, not capturing. TextWrangler now supports non-capturing parentheses, using the syntax:

(?:PATTERN)

That is, if an open parenthesis is followed by “?:”, the subpattern matched by that pair of parentheses is not counted when computing the backreferences. For example, if the text “red king” is matched against the pattern:

(?:(red|white) (king|queen))"
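
The effect on group numbering can be checked with a small sketch. It uses Python's `re` module rather than TextWrangler's grep engine, on the assumption that both treat `(?:...)` the same way for this pattern; the variable names are illustrative only.

```python
import re

# The outer (?:...) only clusters; it is not counted as a group,
# so numbering starts at the first capturing parenthesis inside it.
non_capturing = re.compile(r"(?:(red|white) (king|queen))")

# With plain parentheses the outer pair becomes group 1 and shifts the rest.
capturing = re.compile(r"((red|white) (king|queen))")

m1 = non_capturing.fullmatch("red king")
m2 = capturing.fullmatch("red king")

print(m1.groups())  # ('red', 'king')
print(m2.groups())  # ('red king', 'red', 'king')
```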

--

Best Regards,

Chris

Christopher Stone

Feb 5, 2014, 4:24:18 AM

to TextWrangler-Talk

On Feb 04, 2014, at 17:34, flybynight <flyby...@mac.com> wrote:

I have a mailing list that came to me as one big column with line breaks between each "field" as well as between each "record." The good news is that it is pretty consistent - 4 lines per record.

______________________________________________________________________

Hey Shawn,

Just in case this is something you need to do on a regular basis (and for fun), I've written a couple of text-filters and an AppleScript to do the job.

Keep the principles in mind, as they're easily adapted to other tasks.

--

Best Regards,

Chris

-------------------------------------------------------------------------------------------

Very Basic Perl Text-Filter:

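#! /usr/bin/env perl
use strict; use warnings;

my $cntr;
$cntr = 1;                  # position of the current line within its record

while (<>) {
    chomp;
    print;
    if ($cntr < 4) {
        print "\t";         # lines 1-3: separate from the next field with a tab
        $cntr++;
    } else {
        print "\n";         # line 4: end the record and start counting again
        $cntr = 1;
    }
}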

Basic Perl Text-Filter Using an Array:

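#! /usr/bin/env perl
use v5.010; use strict; use warnings;

my (@reco, $cntr);
$cntr = 1;
$, = "\t";                  # output field separator used by say/print

while (<>) {
    push @reco, $_;
    if ($cntr == 4) {
        chomp @reco;        # strip the line break from all four fields
        say @reco;          # prints the fields joined by $, plus a newline
        @reco = ();
        $cntr = 1;
    } else {
        $cntr++;
    }
}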
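
The collect-then-print idea in the array-based filter can also be sketched in Python. This is a rough equivalent rather than a translation: `join_records` and `fields_per_record` are hypothetical names, and unlike the Perl filter this sketch emits an incomplete trailing record instead of dropping it.

```python
from typing import Iterable, Iterator, List

def join_records(lines: Iterable[str], fields_per_record: int = 4) -> Iterator[str]:
    """Collect lines into records and yield each record as one tab-separated row."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\r\n"))   # same job as chomp
        if len(record) == fields_per_record:
            yield "\t".join(record)
            record = []
    # Assumption for this sketch: keep a short final record rather than lose it.
    if record:
        yield "\t".join(record)

sample = [
    "John Doe\n", "Company A\n", "123 Main Street\n", "Anytown, NY 10010\n",
    "Bob Smith\n", "Company B\n", "456 Scenic Drive\n", "Cityville, CA 90210\n",
]
for row in join_records(sample):
    print(row)
```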

A Little Tersification:

#! /usr/bin/env perl
use strict; use warnings;

while (<>) {
    chomp; print;
    if ($. % 4 != 0) {      # $. is the current input line number
        print "\t";
    } else {
        print "\n";         # every 4th line ends the record
    }
}
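
The modulo trick generalizes to "every Nth line" and does not depend on Perl. A minimal Python sketch, assuming the input is already split into lines; `join_every_nth` is a hypothetical name and `enumerate` stands in for Perl's `$.` line counter:

```python
from typing import Iterable

def join_every_nth(lines: Iterable[str], n: int = 4) -> str:
    """Follow every nth line with a newline and every other line with a tab."""
    out = []
    for number, line in enumerate(lines, start=1):  # 1-based, like Perl's $.
        out.append(line.rstrip("\r\n"))
        out.append("\n" if number % n == 0 else "\t")
    return "".join(out)

sample = ["John Doe\n", "Company A\n", "123 Main Street\n", "Anytown, NY 10010\n"]
print(join_every_nth(sample), end="")
```

As with the terse Perl filter, an incomplete final record ends with a trailing tab rather than a newline.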

AppleScripting the Regular Expression (with basic error-checking):

-------------------------------------------------------------------------------------------

try
    tell application "BBEdit"
        replace "(^.+)\\n(^.+)\\n(^.+)\\n(^.+)" using "\\1\\t\\2\\t\\3\\t\\4" searching in ¬
            text of front text window options {search mode:grep, case sensitive:true}
    end tell
on error e number n
    set e to e & return & return & "Num: " & n
    tell me to set dDlg to display dialog e with title "ERROR!" buttons {"Cancel", "Copy", "OK"} default button "OK"
    if button returned of dDlg = "Copy" then set the clipboard to e
end try

-------------------------------------------------------------------------------------------

flybynight

Feb 5, 2014, 11:51:12 AM

to textwr...@googlegroups.com

Kendall,

Thank you so much! This worked perfectly!

Chris - thank you for your time as well. Glad you consider scripting fun. It's awesome when you can get it to do things like this. Such an amazing time saver!

Laters,

-Shawn

Kendall Conrad

Feb 5, 2014, 4:10:35 PM

to textwr...@googlegroups.com, listm...@suddenlink.net

Thanks Chris, I was experiencing a total brain fart when I wrote that part. I couldn't remember the "non-capturing" terminology and just left the clumsy wording.

Glad the regex at least worked for you, Shawn, so not everything I wrote came out bad.

-Kendall
