CONTEST: Range Searching

Tom Christiansen

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

We're lately been stricken with questions about what comes down to
variations on the same essential theme: how to search a stream for matches
and then produce as output some number of lines before or after the match
as well as the matched line. This has to work efficiently for an input
stream of unlimited length.

Perl's two range operators lend themselves to simple solutions for some
of these kinds of things, like

# print lines between PAT1 and PAT2
while (<>) {
print if /PAT1/ .. /PAT2/;
}

But that's not good for the general case of catching things before or
counting things afterwards

So here's the challenge: devise idiomatic solutions to the
following problems.

1) Write a "patfore" program that prints out up to N lines
before the match as well as the match itself. Here's
the usage message:

patfore [-B N] pattern [files ...]

2) Write a "pataft" program that prints out up to N lines
after the match as well as the match itself. Here's
the usage message:

pataft [-A N] pattern [files ...]

3) Write a "patba" program that prints out up to
X lines before the match and Y lines after the match.

patba [-A X] [-B T] pattern [files ...]

or both N before and after:

patba [-C N] pattern [files ...]

For Extra Credit:
Provide alternate solutions that also coalesce with overlapping
ranges. For example, if you ask for 2 lines before and after, and
lines 2, 4, and 5 all contain matches, the output should comprise
lines [1-7] rather three separate output blocks showing lines.
[1-4], [2-6], and [3-7].

Solutions will be judged upon these overall criteria:

Objective:

* correctness: does it actually do the right thing? if not,
nothing else matters. :-)
* space efficiency: don't use more space than minimally needed
* time efficiency: is your solution fast?
* test coverage: do you include test data to check all border cases?

Subjective:

* conciseness: keep it short. no frills. just a few lines.
* clarity: is this understandable to a native perl speaker?
* idiom: does this look like natural perl? does it use cool
perl features absent in other languages?
* creativity: is your solution cleverly different from those
of others?

To see what I mean by "idiomatic", here's an example of something I just
provided that is of a similar nature to these problems:

# Print all line pairs where the first one ends
# in YIN and the next starts with YANG.
while (<DATA>) {
$n = /YIN$/ ... $n or next;
$pair .= $_;
1 - $n || next;
/^YANG/ && print $pair;
$pair = '';
}

If you wanted to be more obvious, you could certainly write that as:

while (<DATA>) {
$n = /YIN$/ ... $n or next;
$pair .= $_;
next if $n == 1;
print $pair if /^YANG/;
$pair = '';
}

The point of that demo is that these problems shouldn't take zillions of
lines to code up, and also that I consider range operators plenty obvious
to native perl speakers and potentially useful to this problem domain.
You don't *have* to use them, and some solutions don't so clearly call
for them as did this one, but do please feel free to employ them if you're
so inclined. (I expect a lot of solutions using a circular @buff array.)

The prize is that winners will have their solutions published in the FAQ,
and if it's cool enough, perhaps even featured in the Perl Journal.

Please post all solutions and discussion to Usenet.

Good luck, and thanks for helping save the world!

--tom
--
"Without knowing what I am and why I am here, life is impossible."
- Leo Tolstoy

Abigail

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

Tom Christiansen (tch...@mox.perl.com) wrote on MMCCVII September
MCMXCIII in <URL:news:37e1...@cs.colorado.edu>:

<>
<> So here's the challenge: devise idiomatic solutions to the
<> following problems.
<>
<> 1) Write a "patfore" program that prints out up to N lines
<> before the match as well as the match itself. Here's
<> the usage message:
<>
<> patfore [-B N] pattern [files ...]

If the match appears on line N + 3, does that mean one has to print
lines 1 .. N, and line N + 3, or lines 3 .. N + 3?

What if there's no match at all?

Abigail
--
sub camel (^#87=i@J&&&#]u'^^s]#'#={123{#}7890t[0.9]9@+*`"'***}A&&&}n2o}00}t324i;
h[{e **###{r{+P={**{e^^^#'#i@{r'^=^{l+{#}H***i[0.9]&@a5`"':&^;&^,*&^$43##@@####;
c}^^^&&&k}&&&}#=e*****[]}'r####'`=437*{#};::'1[0.9]2@43`"'*#==[[.{{],,,1278@#@);
print+((($llama=prototype'camel')=~y|+{#}$=^*&[0-9]i@:;`"',.| |d)&&$llama."\n");

-----------== Posted via Newsfeeds.Com, Uncensored Usenet News ==----------
http://www.newsfeeds.com The Largest Usenet Servers in the World!
------== Over 73,000 Newsgroups - Including Dedicated Binaries Servers ==-----

Michael de Beer

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

Any restrictions on system calls?
If not, this could be a very short script ;)

-Michael

Don Blaheta

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

Quoth Tom Christiansen:

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.
>
> patba [-A X] [-B T] pattern [files ...]

This isn't consistent with itself or the previous two items; I assumed
-A to indicate "after" and -B "before" as with the previous two.

Here are the three while loops comprising the brunt of my programs:

# pataft
while (<>) {
$aftleft = $aft + 1 if /$pat/o;
print and $aftleft-- if $aftleft;
}

#patfore
while (<>) {
shift @forelines if @forelines > $fore;
push @forelines, $_;
print @forelines and @forelines = () if /$pat/o;
}

#patba
while (<>) {
shift @forelines if @forelines > $fore;
push @forelines, $_;
print and $aftleft-- and @forelines = () if $aftleft;
print @forelines and $aftleft = $aft if /$pat/o;
}

Each one starts with the same header:
#!/usr/local/bin/perl -w
use Getopt::Std;

getopt('ABC', \%opt);
$fore = $opt{B} || $opt{C} || 3;
$aft = $opt{A} || $opt{C} || 3;
$pat = shift;

And here's a test file to pipe through them (using pattern "foo"):
a
b
foo
foo
c
foo
d
e
f
g
foo
h
i
j
k
l
m
n
foo
o

That tests the cases where the pattern lines are adjacent, where the
second pattern lies in the "aft" region of the first, where it lies
outside the "aft" region but its "fore" region overlaps it, and where
they do not interfere at all.

> * correctness: does it actually do the right thing? if not,
> nothing else matters. :-)
> * space efficiency: don't use more space than minimally needed

> * test coverage: do you include test data to check all border cases?

Well, I meet these three, at least. :)

> * time efficiency: is your solution fast?

Hard to say until I have something to test it against. I think I at
least avoided anything excruciatingly _slow_.

--
-=-Don Blaheta-=-=-d...@cs.brown.edu-=-=-<http://www.cs.brown.edu/~dpb/>-=-
To iterate is human, to recurse divine.

Lewis Perin

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

Tom Christiansen <tch...@mox.perl.com> writes:

> [...]

> # Print all line pairs where the first one ends
> # in YIN and the next starts with YANG.
> while (<DATA>) {
> $n = /YIN$/ ... $n or next;
> $pair .= $_;
> 1 - $n || next;
> /^YANG/ && print $pair;
> $pair = '';
> }

Perhaps it was deliberate, but the above code doesn't snag the second
pair if it overlaps with the first, as in

YIN
YANGYIN
YANG

Cheers, Lew

Kragen Sitaker

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

In article <37e1...@cs.colorado.edu>,

Tom Christiansen <tch...@mox.perl.com> wrote:
>So here's the challenge: devise idiomatic solutions to the
>following problems.
>
> 1) Write a "patfore" program that prints out up to N lines
> before the match as well as the match itself. Here's
> the usage message:
>
> patfore [-B N] pattern [files ...]

#!/usr/bin/perl -w
use strict;
# Find a pattern and print lines before the pattern. By Kragen Sitaker,
# 1999-09-16.

# Number of lines to output.
my $n = 4;
my $pattern;

arg: while (@ARGV) {
$ARGV[0] eq '-B' and do {
shift;
$n = 1 + shift;
next arg;
};
do {
$pattern = shift;
last arg;
};
}

# This is probably a lousy way to determine whether we ran out of args.
die "Usage: patfore [-B N] pattern [files ...]\n" if not defined $pattern;

my @lines = ('') x $n;
while (<>) {
push @lines, $_;
shift @lines;
if (/$pattern/o) {
print @lines;
# If the next line contains the pattern again, we don't
# want to output this line again.
@lines = ('') x $n;
}
}

> 2) Write a "pataft" program that prints out up to N lines
> after the match as well as the match itself. Here's
> the usage message:
>

> pataft [-A N] pattern [files ...]

#!/usr/bin/perl -w
use strict;
# Print out lines matching a pattern and lines following them.
# Kragen Sitaker, 1999-09-16.

# Number of lines to print.
my $after = 4;
my $pattern;

arg: while (@ARGV) {
$ARGV[0] eq '-A' and do {
shift;
$after = 1 + shift;
next arg;
};
do {
$pattern = shift;
last arg;
};
}

# This is probably still a lousy way to determine whether we ran out of args.
die "Usage: pataft [-A N] pattern [files ...]\n" if not defined $pattern;

my $linestoprint = 0;
while (<>) {
$linestoprint = $after if /$pattern/o;
print if $linestoprint;
--$linestoprint if $linestoprint;
}

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.
>
> patba [-A X] [-B T] pattern [files ...]
>

> or both N before and after:
>
> patba [-C N] pattern [files ...]

This program is *much* worse than the above programs. I am much less
confident in its correctness.

#!/usr/bin/perl -w
use strict;
# Print out lines matching a pattern, a certain number of lines preceding
# the match lines, and a certain number of lines following them.
# Kragen Sitaker, 1999-09-16.

# Number of lines to print starting at the match and continuing on.
my $after = 4;
# Number of lines to print up to and including the match.
my $before = 4;

my $pattern;
# write sensible output? Default is no.
# sensible output inserts a line saying ... in every place lines
# from the input file was left out, and prepends each output line copied
# from the input file with a space. It's not the default because it wasn't
# in tchrist's spec.
# It doesn't affect the choice of which lines to output, just how to output
# them.
my $sensible = undef;

arg: while (@ARGV) {
$ARGV[0] eq '-A' and do {
shift;
$after = 1 + shift;
next arg;
};
$ARGV[0] eq '-B' and do {
shift;
$before = 1 + shift;
next arg;
};
$ARGV[0] eq '-C' and do {
shift;
$after = $before = 1 + shift;
next arg;
};
$ARGV[0] eq '-s' and do {
shift;
$sensible = 'yes!';
next arg;
};
do {
$pattern = shift;
last arg;
};
}

# This is probably even now a lousy way to determine whether we ran out
# of args.
die "Usage: patba [-s] [-A X] [-B T] [-C N] pattern [files ...]\n"
if not defined $pattern;

# Every line goes exactly one place: either printed out or pushed on the
# preceding lines list.

my $linestoprint = 0;
# The first line of @preceding_lines is never printed. It is there only
# to determine if what we're printing out is contiguous with the last
# output text.
my @preceding_lines = ('') x ($before + 1);
while (<>) {
if ($linestoprint) {
print $sensible ? (" ", $_) : $_;
--$linestoprint;
} else {
push @preceding_lines, ($sensible ? " $_" : $_);
shift @preceding_lines;
}
if (/$pattern/o) {
# $preceding_lines[0] will be '' if @preceding_lines is
# not full.
print "...\n" if $sensible and $preceding_lines[0];
print @preceding_lines[1..$#preceding_lines];
# If the next line contains the pattern again, we don't
# want to output this line again.
@preceding_lines = ('') x ($before + 1);
$linestoprint = $after - 1;
}
}

# Did we read any lines we didn't print?
print "...\n" if $sensible and $preceding_lines[$#preceding_lines];

>For Extra Credit:
> Provide alternate solutions that also coalesce with overlapping
> ranges.

Oops, sorry -- didn't see that. I wouldn't want to try to use these
programs without that.

> * correctness: does it actually do the right thing? if not,
> nothing else matters. :-)

Well, I think that what I was hoping to do is the right thing. I'm
certain my patfore and pataft do what I was hoping to do. My patba
appears to do the right thing, but I have to go home now, so I am
posting this now :)

> * space efficiency: don't use more space than minimally needed

patba and pataft are probably perfect here. patba uses more space than
minimally needed, but not a huge amount -- it keeps around an extra
line of lookbehind when, with just a few more lines of code, it could
just keep a boolean. And it prepends a space to every line in
'sensible' mode.

> * time efficiency: is your solution fast?

Dunno. :)

> * test coverage: do you include test data to check all border cases?

Here's patba.test, which includes only input, not expected output. And
I don't think it's really complete, although it's not obvious to me
what I'm missing. (Well, I know I'm missing the non-sensible cases,
but I'm confident those are at least as correct as the sensible ones.)

#!/bin/sh
../patba
read foo
../patba -C 3
read foo
../patba -s -C 0 pattern patba | less
../patba -s -C 1 pattern patba | less
../patba -s -C 2 pattern patba | less

patfore and pataft I tested by hand.

> * conciseness: keep it short. no frills. just a few lines.

True of patfore and pataft. Probably not true of patba. You'll
probably think $sensible is a frill, but I think it's a necessity to
understand the output.

> * clarity: is this understandable to a native perl speaker?

I've never met one. And don't you mean Perl, not perl?

> * idiom: does this look like natural perl? does it use cool
> perl features absent in other languages?

I don't know. :)

> * creativity: is your solution cleverly different from those
> of others?

Probably not.
--
<kra...@pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
Thu Sep 16 1999
53 days until the Internet stock bubble bursts on Monday, 1999-11-08.
<URL:http://www.pobox.com/~kragen/bubble.html>

Tom Christiansen

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

[courtesy cc of this posting mailed to cited author]

In comp.lang.perl.misc, Michael de Beer <made...@igc.apc.org> writes:
:Any restrictions on system calls?

Ah, I don't explain you doing an

exec 'gnugrep', @ARGV;

It's the algorithm(s) that I want good but concise examples of, because
people will use this in other situations than simply greppish examples of.
I was only showing the -A, -B, and -C flags because I thought they'd
trigger associations with the samely-named gnugrep options.

--tom
--
"Even egotists are allowed to have opinions." --Larry Wall

Tom Christiansen

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

[courtesy cc of this posting mailed to cited author]

In comp.lang.perl.misc, abi...@delanet.com writes:
:If the match appears on line N + 3, does that mean one has to print

:lines 1 .. N, and line N + 3, or lines 3 .. N + 3?
:
:What if there's no match at all?

Sorry that I missed your posting. My killfile seems to have zapped my
own stuff this time. Odd.

What I think I mean is that the -B3 should act the way gnugrep's does:
when you find a match, pretty much you print the matching line and the
three lines previous to it. I'm just trying to get good examples
of the general algorithms involved here to add to the FAQ so that
people can use them in various programs.

--tom
--
My opinions may have changed, but not the fact that I am right.
-- Ashleigh Brilliant

Tom Christiansen

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

[courtesy cc of this posting mailed to cited author]

In comp.lang.perl.misc,
d...@cs.brown.edu (Don Blaheta) writes:
:Quoth Tom Christiansen:
:> 3) Write a "patba" program that prints out up to

:> X lines before the match and Y lines after the match.
:>
:> patba [-A X] [-B T] pattern [files ...]

:
:This isn't consistent with itself or the previous two items; I assumed

:-A to indicate "after" and -B "before" as with the previous two.

Yes, that's right. I typed it wrong. I'm just emulating gnu grep.
Think of it as

patba [ [ [-A X] [-B Y] ] | [-C N] ] pattern [files ...]

but that's a bit harder to explain.

--tom
--
If you still want to do this in light of the above disadvantages, the method
is left as an exercise to the reader. It'll void your Apache warrenty,
though, and you'll lose all accumulated Unix guru points.
-- FAQ for Apache

Tom Christiansen

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

[courtesy cc of this posting mailed to cited author]

In comp.lang.perl.misc, Lewis Perin <pe...@panix.com> writes:
:Perhaps it was deliberate, but the above code doesn't snag the second

:pair if it overlaps with the first, as in
:
:YIN
:YANGYIN
:YANG

No, it wasn't really deliberate. My test cases weren't good enough
to catch that.

--tom
--
"Politics is a pendulum whose swings between anarchy and tyranny are fueled by
perennially rejuvenated illusions."
- Albert Einstein

Jamie McCarthy

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

Tom Christiansen wrote:

> For Extra Credit:
> Provide alternate solutions that also coalesce with overlapping

> ranges. For example, if you ask for 2 lines before and after, and
> lines 2, 4, and 5 all contain matches, the output should comprise
> lines [1-7] rather three separate output blocks showing lines.
> [1-4], [2-6], and [3-7].

The extra credit problem is IMHO the only fun one so that's what
I'll try. Also it's the most sensible, the first three problems
produce unexpected (to a human) output.

> Solutions will be judged upon these overall criteria:
>
> Objective:
>

> * correctness: does it actually do the right thing? if not,
> nothing else matters. :-)

> * space efficiency: don't use more space than minimally needed

> * time efficiency: is your solution fast?

> * test coverage: do you include test data to check all border cases?

I notice readability is not on the list, heh. I believe this
solution is correct but I haven't put together truly rigorous
test cases so I can't be sure. It's pretty space-efficient,
printing and forgetting unneeded data, and I suspect time-
efficiency is OK (though being I/O-bound is likely of course).

> Subjective:

>
> * conciseness: keep it short. no frills. just a few lines.

> * clarity: is this understandable to a native perl speaker?

> * idiom: does this look like natural perl? does it use cool
> perl features absent in other languages?

> * creativity: is your solution cleverly different from those
> of others?

The core code is just a few lines, please pardon the command
line arguments and whatnot. Opinions may differ as to how
"native" my coding style is: if I look too much like a
reformed Pascal programmer I can scuzz it up a bit! :-)

#!/usr/bin/perl -w

# Parse the command-line switches.

$after = $before = 0; # Defaults.
while (@ARGV and $ARGV[0] =~ /^-([ABC])$/) {
SWITCH: {
$1 eq 'A' and $after = $ARGV[1];
$1 eq 'B' and $before = $ARGV[1];
$1 eq 'C' and $before = $after = $ARGV[1];
}
shift, shift;
}

# Parse the remaining command-line argument(s).

die "need a regex" unless @ARGV;
$regex = shift @ARGV;

# Set constants and globals.

$prelude = ">>>\n"; # Or the empty string;
$postlude = "<<<\n"; # whatever you prefer.
@buff = ( );
$match = $skimmed = 0;

# The main loop. The key is to track the last-matched
# line number in $match and of course the current line
# number in $. -- the difference between them tells us
# whether we're done matching and thus need to dump the
# context of the match(es) to STDOUT.

while (defined($line = <>)) {
push @buff, $line;
$match = $. if $line =~ /$regex/o;
if ($match) {
if ($. - $match > $before+$after) {
&dump;
} elsif ($. - $match > $before) {
# One of the design goals is to run in the smallest
# space possible. That means immediately printing
# any lines we know for a fact can be printed, and
# shifting them off the buffer to free up their RAM.
# The beauty is that we always know how near to the
# _end_ of the buffer to dump, so we're free to skim
# values off its _front_ if appropriate.
print $prelude if !$skimmed++;
print shift @buff;
}
}
# If we're not matching and we're saving too much context,
# we can shift off one line of that context.
shift @buff if !$match and $#buff >= $before;
}

# After we run off the end of the last file, if we are still
# matching, it can be a little tricky to properly print context
# around the last lines (consider the case where the last match
# was less than $before+$after but more than $after lines ago).
# The simplest way to handle it is to pretend we have already
# "read" lines containing the empty string, up to the point
# where we would normally dump the buffer. That way there is
# no special case.

if ($match) {
push @buff, ("") x (1+$before+$after+$match-$.);
&dump;
}

# Dump the contents of the circular buffer to screen.

sub dump {
print
$skimmed ? "" : $prelude,
@buff[0..$#buff-$before-1],
$postlude;
@buff = ($before ? @buff[-$before, -1] : ());
$match = $skimmed = 0;
}

__END__

Sample runs:

$ ./patba.pl -C 0 '^Aa|^Zu|perl' /usr/dict/words
>>>
Aarhus
Aaron
<<<
>>>
improperly
<<<
>>>
properly
<<<
>>>
superlative
superlatively
superlatives
<<<
>>>
Zulu
Zulus
Zurich
<<<

$ cat /usr/dict/words | ./patba.pl -C 2 '^Aa|^Zu|perl'
>>>
Aarhus
Aaron
Ababa
aback
<<<
>>>
impromptu
improper
improperly
impropriety
improve
<<<
>>>
propensity
proper
properly
properness
propertied
<<<
>>>
superiority
superiors
superlative
superlatively
superlatives
supermarket
supermarkets
<<<
>>>
Zoroaster
Zoroastrian
Zulu
Zulus
Zurich
<<<

--
Jamie McCarthy

Ken Pizzini

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

On 16 Sep 1999 08:52:46 -0700, Tom Christiansen <tch...@mox.perl.com> wrote:
>So here's the challenge: devise idiomatic solutions to the
>following problems.

Can we abbreviate the solutions a bit --- omit the command-line
parsing as being boring? While essential to an actual utility
the command-line parsing tends to clutter newsgroup postings
and obscure the stark beauty of any truly elegant solution.
(I am not making the claim that my code below qualifies as
"truly elegant", though I think my solution to (1) comes close.)

> 1) Write a "patfore" program that prints out up to N lines
> before the match as well as the match itself. Here's
> the usage message:
>
> patfore [-B N] pattern [files ...]

push @lines, $_;
print splice(@lines, 0, $N) if /pattern/;
shift @lines if @lines >= $N;

> 2) Write a "pataft" program that prints out up to N lines
> after the match as well as the match itself. Here's
> the usage message:
>
> pataft [-A N] pattern [files ...]

/pattern/ and $end = $. + $N;
print if $. <= $end;

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.
>
> patba [-A X] [-B T] pattern [files ...]
>

> or both N before and after:
>

> patba [-C N] pattern [files ...]

Assume that the command-line parsing of -C N sets $X=$T=$N .

push @lines, $_;
if (($matched = /pattern/) || $. <= $end) {
print @lines;
@lines = ();
$end = $matched && $. + $X;
} else {
shift @lines if !$end && @lines >= $T;
}

>For Extra Credit:
> Provide alternate solutions that also coalesce with overlapping
> ranges. For example, if you ask for 2 lines before and after, and
> lines 2, 4, and 5 all contain matches, the output should comprise
> lines [1-7] rather three separate output blocks showing lines.
> [1-4], [2-6], and [3-7].

The above solutions all coalesce; they become more cumbersome if
they are altered to avoid coalescing.

--Ken Pizzini

Kevin Reid

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

Tom Christiansen <tch...@mox.perl.com> wrote:

> We're lately been stricken with questions about what comes down to
> variations on the same essential theme: how to search a stream for matches
> and then produce as output some number of lines before or after the match
> as well as the matched line. This has to work efficiently for an input
> stream of unlimited length.

<snip>

> So here's the challenge: devise idiomatic solutions to the
> following problems.
>

> 1) Write a "patfore" program that prints out up to N lines
> before the match as well as the match itself.

> 2) Write a "pataft" program that prints out up to N lines
> after the match as well as the match itself.
>

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.

It seems to me that 3) does everything 1) and 2) do.

Therefore, here is my single solution:

#!perl -w

use Getopt::Std;
getopts('mB:A:') and $pat = shift and @ARGV or usage();

foreach ($opt_A, $opt_B, $opt_m) {$_ ||= 0}
$pat =~ s!^/(.*)/$!$1!;

foreach $IN (@ARGV) {
open IN or die "Can't open $IN: $!";

LINE: while (push(@lindex, tell IN), defined($_ = <IN>)) {
shift @lindex if $#lindex > $opt_B;
if (/$pat/o) {
print "---\n" if $noncontiguous;

seek(IN, $lindex[0], 0);
for (my $i = 0; $i < $opt_A + @lindex; $i++) {
my $y = <IN>; last if !defined $y;
print(($opt_m ? ($i == $#lindex ? '+ ' : ' ') : '') . $y);
}
@lindex = ();
$noncontiguous = 0;
} else {$noncontiguous = 1}
}
}

sub usage {print <<"END" and exit}
$0 [-m] [-B lines] [-A lines] /pattern/ [files...]

-m Print indication of matched line
-B n Print n lines before the match
-A n Print n lines after the match

Slashes on the pattern are optional.

END

__END__

> Solutions will be judged upon these overall criteria:

<snip>

> Subjective:
> * conciseness: keep it short. no frills. just a few lines.

Nope - plenty of frills here.

--
Kevin Reid: | Macintosh:
"I'm me." | Think different.

Tom Christiansen

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

[courtesy cc of this posting mailed to cited author]

In comp.lang.perl.misc,
snow...@long-lake.nihongo.org (Benjamin Franz) writes:
:if (0 == @file_list) {
: push (@file_list,'-')
:}
:$last_n++;
:while (my $file_name = shift @file_list) {
: my $open_name = $file_name;
: if ($file_name eq '-') {
: $open_name = "<&STDIN";
: $file_name = '';

Is there some reason why normal <ARGV> isn't what you want?
You seem to be emulating what it already does.

unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV) || do {
warn "cannot open $ARGV: $!\n";
next;
};
while (<ARGV>) {
... # code for each line
}
}

--tom
--
If you want capitalism, go to Russia!

bl...@my-deja.com

unread,

Sep 16, 1999, 3:00:00 AM9/16/99

to

In article <37e1...@cs.colorado.edu>,
Tom Christiansen <tch...@mox.perl.com> wrote:

> So here's the challenge: devise idiomatic solutions to the
> following problems.
>
> 1) Write a "patfore" program that prints out up to N lines

> before the match as well as the match itself. Here's
> the usage message:

system("/gnu/egrep -B linesbefore whatever");

> 2) Write a "pataft" program that prints out up to N lines

> after the match as well as the match itself. Here's
> the usage message:

system("/gnu/egrep -A linesafter whatever");

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.

system("/gnu/egrep -B linesbefore -A linesafter whatever");

> Provide alternate solutions that also coalesce with overlapping
> ranges. For example, if you ask for 2 lines before and after, and
> lines 2, 4, and 5 all contain matches, the output should comprise
> lines [1-7] rather three separate output blocks showing lines.
> [1-4], [2-6], and [3-7].

Not quite sure what GNU egrep does there. OK this was a silly post
but semi seriously if you don't mind the toolbox approach...

Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.

Kragen Sitaker

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

In article <7rrrc9$d...@cocoa.brown.edu>, Don Blaheta <d...@cs.brown.edu> wrote:
> # pataft
> while (<>) {
> $aftleft = $aft + 1 if /$pat/o;
> print and $aftleft-- if $aftleft;
> }

Very nice! Exactly the same method I used.

> #patfore
> while (<>) {
> shift @forelines if @forelines > $fore;
> push @forelines, $_;
> print @forelines and @forelines = () if /$pat/o;
> }

Also very nice; slightly different approach than my code, but very
similar.

Both of these programs avoid printing the same line more than once.

> #patba
> while (<>) {
> shift @forelines if @forelines > $fore;
> push @forelines, $_;
> print and $aftleft-- and @forelines = () if $aftleft;
> print @forelines and $aftleft = $aft if /$pat/o;
> }

This is the one I had the most trouble with, mostly avoiding the
overlap.

What happens here if $aft is 0 and we get two sequential input lines
with the pattern?

> use Getopt::Std;
>
> getopt('ABC', \%opt);
> $fore = $opt{B} || $opt{C} || 3;
> $aft = $opt{A} || $opt{C} || 3;
> $pat = shift;

Damn. I need to learn more about the standard modules!

I notice you're not using strict.

I also notice you picked the same default number of lines as I did!

>> * time efficiency: is your solution fast?
>

>Hard to say until I have something to test it against. I think I at
>least avoided anything excruciatingly _slow_.

I'll test it against mine when I have the time :)

Benjamin Franz

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

[reposted because my ISP doesn't have clp.moderated marked
as a moderated group]

In article <37e1...@cs.colorado.edu>,
Tom Christiansen <tch...@mox.perl.com> wrote:
>
>So here's the challenge: devise idiomatic solutions to the
>following problems.
>
> 1) Write a "patfore" program that prints out up to N lines
> before the match as well as the match itself. Here's
> the usage message:
>

> patfore [-B N] pattern [files ...]

Only have the free time for one of them.
Not as elegant as some of the others, but it works,
is fast, and is conservative with resources.

#!/usr/bin/perl -w

use strict;
use Getopt::Long;

my $opts = {};
GetOptions($opts,'B=i');
my $last_n = defined ($opts->{'B'}) ? $opts->{'B'} : 0;

my $pattern = shift;
if (not defined $pattern) {
die <<"USAGE";
patfor [-B N] pattern [files ...]
USAGE
}

my @file_list = @ARGV;

if (0 == @file_list) {
push (@file_list,'-')
}
$last_n++;
while (my $file_name = shift @file_list) {
my $open_name = $file_name;
if ($file_name eq '-') {
$open_name = "<&STDIN";
$file_name = '';

} else {
$file_name .= ':';
}
if (not open (THE_FILE,$open_name)) {
warn("WARNING: Unable to open '$open_name' ($!) - skipping\n");
next;
}
my @circular_buffer = ();
my $buffer_pointer = -1;
my $buffer_n = 0;
my $last_line = 0;
while (<THE_FILE>) {
chomp;
$buffer_pointer++;
$buffer_pointer %= $last_n;
$circular_buffer[$buffer_pointer] = "$file_name\[$.] $_\n";
$buffer_n++ if ($buffer_n != $last_n);
if (m/$pattern/o) {
my $buffer_offset = 1 - $buffer_n;
if (($. + $buffer_offset) <= $last_line) {
$buffer_offset = $last_line - $. + 1;
}
my $start_at = $buffer_pointer + $buffer_offset;
print join('',@circular_buffer[$start_at..$buffer_pointer]);
$last_line = $.;
}
}
close(THE_FILE) || die ("Failed to close handle for file '$open_name': $!\n");
}

------------------------------------------------------------

The following is not runnable code - it just looks like it.
It is actually the test set data. The words 'TEST' and 'test'
are the targets. Their distribution allows the testing
of first occurance before buffer fill, adjecent lines,
non-adjecent but overlapping ranges, first occurance
with buffer filled, second occurance with buffer fill,
second occurance before buffer full, etc. Running
./patfore -B test N < testfile and varying N from 0
to 8 is _very_ effective in finding broken edge cases.

#!/usr/bin/perl -w
use strict;
use Image::Size;

my $test_file = 'hibunny.jpg';
open (TESTFILE,$test_file) or die ("Unable to open '$test_file': $!\n");
$/ = undef;
binmode TESTFILE;
my $data = <TESTFILE>;
close (TESTFILE);

print _identify_data($data,$test_file),"\n";

Benjamin Franz

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

In article <37e1...@cs.colorado.edu>,
Tom Christiansen <tch...@mox.perl.com> wrote:
>So here's the challenge: devise idiomatic solutions to the
>following problems.
>
> 1) Write a "patfore" program that prints out up to N lines
> before the match as well as the match itself. Here's
> the usage message:
>
> patfore [-B N] pattern [files ...]

Sorry, only have time to solve one of them. Not as elegant
as some of the others, but it works and is conservative
with resources. It also collates ranges and processes
STDIN as an option.

#!/usr/bin/perl -w

my @file_list = @ARGV;

--
Benjamin Franz

Don Blaheta

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Quoth Kragen Sitaker:

> In article <7rrrc9$d...@cocoa.brown.edu>, Don Blaheta <d...@cs.brown.edu> wrote:
> > #patba
> > while (<>) {
> > shift @forelines if @forelines > $fore;
> > push @forelines, $_;
> > print and $aftleft-- and @forelines = () if $aftleft;
> > print @forelines and $aftleft = $aft if /$pat/o;
> > }
>
> This is the one I had the most trouble with, mostly avoiding the
> overlap.
>
> What happens here if $aft is 0 and we get two sequential input lines
> with the pattern?

Ah, hadn't tried an aft of 0. I'll check it out next time I'm at my
regular node.

> I notice you're not using strict.

I did in my test program, and it worked except that the variables needed
to be declared, e.g. with
use vars qw(@forelines $fore $aftleft $aft $pat %opt)

I left it out of the submitted program to avoid visual clutter.

> >> * time efficiency: is your solution fast?
> >
> >Hard to say until I have something to test it against. I think I at
> >least avoided anything excruciatingly _slow_.
>
> I'll test it against mine when I have the time :)

<grin>

--
-=-Don Blaheta-=-=-d...@cs.brown.edu-=-=-<http://www.cs.brown.edu/~dpb/>-=-

QUE SERA SERF
Life is feudal

Uri Guttman

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

>>>>> "DB" == Don Blaheta <d...@cs.brown.edu> writes:

DB> Here are the three while loops comprising the brunt of my programs:

DB> while (<>) {
DB> shift @forelines if @forelines > $fore;
DB> push @forelines, $_;
DB> print @forelines and @forelines = () if /$pat/o;

DB> while (<>) {
DB> shift @forelines if @forelines > $fore;
DB> push @forelines, $_;
DB> print and $aftleft-- and @forelines = () if $aftleft;
DB> print @forelines and $aftleft = $aft if /$pat/o;

well, what happens when you have multiple files and a match is at the
end of the previous file? i think you will print the next N lines from
the next file since you have a single global <> loop. and if that is all
your loop will be, you could out the @ARGV stuff in a BEGIN block and
wrap the code in a -n loop.

i am working on version which will do all 3, handle the
EOF problem as well as have a -O option to do overlaps, all in a single
program. wish me luck and tuits.

uri

--
Uri Guttman ----------------- SYStems ARCHitecture and Software Engineering
u...@sysarch.com --------------------------- Perl, Internet, UNIX Consulting
Have Perl, Will Travel ----------------------------- http://www.sysarch.com
The Best Search Engine on the Net ------------- http://www.northernlight.com
"F**king Windows 98", said the general in South Park before shooting Bill.

Rick Delaney

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Tom Christiansen wrote:
>
> 2) Write a "pataft" program that prints out up to N lines
> after the match as well as the match itself. Here's
> the usage message:
>

> pataft [-A N] pattern [files ...]

Okay, I'll take the easy one.

perl -ne 'print if ($c=0, /pat/) .. $c++ == 3' files

If one really wanted to save this then something like this would do.

#!/usr/local/bin/perl -s

$A &&= shift;
$pat = shift;
die <<USAGE unless @ARGV;
pataft [-A N] pattern [files ...]
USAGE

while(<>){
print if ($c = 0, /$pat/o) .. $c++ == $A;
}
__END__

Of course we probably shouldn't encourage skipping -w (though I see no
need for 'use strict' for something so short).

Adding -w means changing

$A &&= shift;
to
$A = $A ? shift : 0;

> For Extra Credit:

> Provide alternate solutions that also coalesce with overlapping
> ranges. For example, if you ask for 2 lines before and after, and
> lines 2, 4, and 5 all contain matches, the output should comprise
> lines [1-7] rather three separate output blocks showing lines.
> [1-4], [2-6], and [3-7].

A slight rearrangement after USAGE give this alternate solution:

$c = $A + 1;
while(<>){
$c = 0 if /$pat/o;
print if $c++ <= $A;
}

, which is pretty much like the other posters' solutions.

But I don't like it because it doesn't make a nice one-liner as above.
I guess I still have grep.

--
Rick Delaney
rick.d...@home.com

Philip 'Yes, that's my address' Newton

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Kragen Sitaker wrote:
>
> In article <7rrrc9$d...@cocoa.brown.edu>, Don Blaheta <d...@cs.brown.edu> wrote:
> >

> > $fore = $opt{B} || $opt{C} || 3;
> > $aft = $opt{A} || $opt{C} || 3;
>

> I also notice you picked the same default number of lines as I did!

Perhaps comes from diff(1), which prints three lines of context with the
-c flag, and has the -C n flag for 'n' lines of context if you don't
want three.

Cheers,
Philip

Neko

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

On 16 Sep 1999 08:52:46 -0700, Tom Christiansen <tch...@mox.perl.com> wrote:

> 1) Write a "patfore" program that prints out up to N lines
> before the match as well as the match itself. Here's
> the usage message:
>

> patfore [-B N] pattern [files ...]

Two versions. The first will output each match. The second will coalesce
overlaps. Both actually store more (maybe 100) lines than necessary, trading
a bit of space for a bit of time (less shifting of arrays). They are still
space efficient though in that they won't slurp whole files.

#!/usr/bin/perl
# patfore [-B N] pattern [files ...]
# Each match by itself.

$b = @ARGV && $ARGV[0] eq '-B'
? (shift, shift)[1]
: 3;
$max = $b > 100 ? $b : 100;

$re = shift;
defined $re or die "usage: patfore [-B N] pattern [files ...]\n";

while (<>) {
print "Match #", ++$i, " [\n", @lines[-$b..-1], "$_]\n\n" if /$re/o;
push @lines, $_;
@lines = @lines[-$b..-1] if @lines > $max;
}

__END__

#!/usr/bin/perl
# patfore [-B N] pattern [files ...]
# Will coalesce overlaps.

$b = @ARGV && $ARGV[0] eq '-B'
? (shift, shift)[1]
: 3;
$max = $b > 100 ? $b : 100;

$re = shift;
defined $re or die "usage: patfore [-B N] pattern [files ...]\n";

while (<>) {
@lines = @lines[-$b..-1] if @lines > $max;
unless (/$re/o) {
push @lines, $_;
next;
}
print "Match #", ++$i, " [\n", @lines[-$b..-1], $_;
COAL: for (1..$b) {
last if eof;
push @lines, scalar <>;
if ($lines[-1] =~ /$re/o) {
print @lines[-$_..-1];
goto COAL;
}
}
print "]\n\n";
}

__END__

> 2) Write a "pataft" program that prints out up to N lines
> after the match as well as the match itself. Here's
> the usage message:
>
> pataft [-A N] pattern [files ...]

Again two versions. The first uses a seek() without checking for errors, so
it's not so robust. Also, if there's any overlap, it ends up reading the
same lines twice. It would have been better to keep an array of lines
around, but I'm sufficiently lazy right now not to care.

#!/usr/bin/perl
# pataft [-A N] pattern [files ...]
# Each match by itself.

$a = @ARGV && $ARGV[0] eq '-A'
? (shift, shift)[1]
: 3;

$re = shift;
defined $re or die "usage: pataft [-A N] pattern [files ...]\n";

while (<>) {
/$re/o or next;
$tell = tell ARGV;
print "Match #", ++$i, " [\n$_";
for (1..$a) {
last if eof;
print scalar <>;
}
print "]\n\n";
seek ARGV, $tell, 0;
}

__END__

#!/usr/bin/perl
# pataft [-A N] pattern [files ...]
# Will coalesce overlaps.

$a = @ARGV && $ARGV[0] eq '-A'
? (shift, shift)[1]
: 3;

$re = shift;
defined $re or die "usage: pataft [-A N] pattern [files ...]\n";

while (<>) {
/$re/o or next;
print "Group #", ++$i, " [\n$_";
COAL: for (1..$a) {
last if eof;
$line = <>;
print $line;
goto COAL if $line =~ /$re/o;
}
print "]\n\n";
}

__END__

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.
>

> patba [-A X] [-B T] pattern [files ...]
>
> or both N before and after:
>
> patba [-C N] pattern [files ...]

#!/usr/bin/perl
# patba [ [ [-A X] [-B Y] ] | [-C N] ] pattern [files ...]
# Each match by itself.

while (@ARGV && $ARGV[0] =~ /-[ABC]/) {
/A/ ? ($a) :
/B/ ? ($b) :
($a, $b) = (shift) x 2 for shift;
}
defined or $_ = 3 for $a, $b;
$b++;
$max = $b > 100 ? $b : 100;

$re = shift;
die "usage: patba [ [ [-A X] [-B Y] ] | [-C N] ] pattern [files ...]\n"
unless defined $re;

while (<>) {
push @lines, $_;

@lines = @lines[-$b..-1] if @lines > $max;
/$re/o or next;
$tell = tell ARGV;
print "Match #", ++$i, " [\n", @lines[-$b..-1];
for (1..$a) {
last if eof;
print scalar <>;
}
print "]\n\n";
seek ARGV, $tell, 0;
}

__END__

#!/usr/bin/perl
# patba [ [ [-A X] [-B Y] ] | [-C N] ] pattern [files ...]
# Will coalesce overlaps.

while (@ARGV && $ARGV[0] =~ /-[ABC]/) {
/A/ ? ($a) :
/B/ ? ($b) :
($a, $b) = (shift) x 2 for shift;
}
defined or $_ = 3 for $a, $b;
$max = $b > 100 ? $b : 100;

$re = shift;
die "usage: patba [ [ [-A X] [-B Y] ] | [-C N] ] pattern [files ...]\n"
unless defined $re;

while (<>) {
@lines = @lines[-$b..-1] if @lines > $max;
unless (/$re/o) {
push @lines, $_;
next;
}
print "Group #", ++$i, " [\n", @lines[-$b..-1], $_;
COAL: for (1..$a) {
last if eof;
$line = <>;
print $line;
goto COAL if $line =~ /$re/o;
}
for (1..$b) {
last if eof;
push @lines, scalar <>;
if ($lines[-1] =~ /$re/o) {
print @lines[-$_..-1];
goto COAL;
}
}
print "]\n\n";
}

__END__

>Solutions will be judged upon these overall criteria:
>

> Objective:
>
> * correctness: does it actually do the right thing? if not,
> nothing else matters. :-)
> * space efficiency: don't use more space than minimally needed

> * time efficiency: is your solution fast?

> * test coverage: do you include test data to check all border cases?

They are correct unless they aren't. When printing the previous N lines,
more space is used than absolutely necessary, but it means I don't have to
shift a perfect array for every line. I used a modified version of Don
Blaheta's test data (it adds a 'foo' at the very start and very end):

foo

a
b
foo
foo
c
foo
d
e
f
g
foo
h
i
j
k
l
m
n
foo
o

p
q
r
s
t
u
v
w
x
y
z
foo

> Subjective:
>
> * conciseness: keep it short. no frills. just a few lines.

> * clarity: is this understandable to a native perl speaker?
> * idiom: does this look like natural perl? does it use cool
> perl features absent in other languages?
> * creativity: is your solution cleverly different from those
> of others?

Some of it is more concise than others, but none measured to just a few
lines. Those who've walked the desert should have no trouble understanding
the code, though a few may spit on me for using goto(). It's Perl written in
Perl so perhaps it's idiomatic. I doubt it's very creative or clever though.

--
Neko | t...@chocobo.org | Will hack Perl for a moogle stuffy! =^.^=

Rick Delaney

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Tom Christiansen wrote:
>
> pataft [-A N] pattern [files ...]

To which I said,

perl -ne 'print if ($c=0, /pat/) .. $c++ == 3' files

and then I complained that I didn't have a nice extra credit one-liner.
(I'm not following-up to that post since I don't see it).

I should point out that there are still one-liners for this, even if not
as nice.

perl -ne 'print if /pat/ .. (/pat/ ? $c = 0 : ++$c) == 3' files

I didn't really like this since /pat/ appears in the code twice and is
actually tested twice for some lines.

perl -ne 'print if ($c=0, /pat/) ... (/pat/ ? $c = 0 : ++$c) == 3' \
files

/pat/ is only tested once for each line here but now it's just getting
ugly.

--
Rick Delaney
rick.d...@home.com

Benjamin Franz

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

In article <37e1...@cs.colorado.edu>,
Tom Christiansen <tch...@mox.perl.com> wrote:
>

>In comp.lang.perl.misc, snow...@long-lake.nihongo.org (Benjamin Franz) writes:

>:if (0 == @file_list) {

>: push (@file_list,'-')
>:}
>:$last_n++;
>:while (my $file_name = shift @file_list) {
>: my $open_name = $file_name;
>: if ($file_name eq '-') {
>: $open_name = "<&STDIN";
>: $file_name = '';
>

>Is there some reason why normal <ARGV> isn't what you want?
>You seem to be emulating what it already does.

No particular reason. Old habits from the fact that I don't
write a lot of command line option programs currently.

--
Benjamin Franz

lt lindley

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

I thought I had submitted this last night. But I was falling asleep
and so it may have been a dream. Or maybe I tried to post it to
moderated without looking. Followups set to clpm only.

#!/usr/lib/lprgs/perl -w
use strict;

# Print a pattern and up to N lines of text before the pattern.

# Assumptions:
# 1) the pattern can match across a line boundary!
# 2) Memory is cheap. This is the barbarian "all memory should be free!"
# entry. Not some namby pamby space conservation algorithm.
# Seriously, memory is way too cheap to worry about like we used to do.
# 3) No reasonable way that I can see to get the bonus with the
# regexp approach. I think it is possible with a lookahead assertion
# and some code in the regexp that sets pos(), but it is too horrid
# to contemplate for long.

# Interesting code moved to the top. Not normally how I would have
# layed this out.

my (%opts, $USAGE);
my $pattern = parse_args();

# Should I assume that when the user gives a space char that they
# really mean \s and that it includes newline? Assume so unless
# they are sophisticated enough to tell me to leave their sacred pattern
# alone. Otherwise, I'm going to expect unsophisticated users who
# provide the string "black ice" and expect it to match when one line
# ends with "black" and the next line starts with "ice". Therefore,
# I mangle the user supplied pattern by default.
$pattern =~ s/ +/\\s/g unless exists $opts{S};

$pattern = qr{
(
(?:.*\n){0,$opts{B}} # Match up to -B (default 4) arbitrary lines
(?m: # ^ and $ match on newline for this part
^.* # 0 or more chars begining of this line
(?s-x:$pattern) # Whatever they want to match
# where . and \s match newline too
# but space has meaning within $pattern
#HEY! Fix perlre to say that \s matches
#newline just like . does under /s!!!!
#That is expected, but not obvious!!!!
.*$ # Match to end of line containing $pattern
)
\n # Throw in final newline of line containing
# pattern -- /m modifier dropped it.
)}xo;

sub find_pattern_in_file {
my ($fh, $file_name) = @_;
print "\n\n$file_name --\n" if (defined $file_name);
# Screw memory requirements. Memory is cheap and modern books aren't
# that long.
local $/;
my $string = <$fh>;
print $1, "\n--\n" while ($string =~ /$pattern/g);
}

# The rest of this just handles the argument processing and
# file manipulation.
if (@ARGV == 0) {
if (exists $opts{D}) {
# Specail test case
find_pattern_in_file(\*DATA);
} else {
# No file name provided as an argument
find_pattern_in_file(\*STDIN);
}
} else {
foreach my $file_name (@ARGV) {
# Rather than simply using magic <ARGV> processing,
# I want to print the file name.
local *FH;
open(FH, $file_name)
or die "Failed to open $file_name: $!\n\t$USAGE";
find_pattern_in_file(\*FH, $file_name);
}
}
exit 0;

sub parse_args {
$USAGE = "patfore [-B N] [-S] pattern [files ...| -D]
\twhere N is number of lines before pattern to display and
\t-S indicates a strict pattern that should not be white space pampered
\tand -D says use the testing data at the bottom of the source";

use Getopt::Std;

die $USAGE unless getopts('DSB:', \%opts);
$opts{B} = 4 unless exists $opts{B}; # Number of lines before pat to print
die $USAGE unless @ARGV >= 1; # There must be at leat one pattern arg
shift @ARGV; # Leftover after args processing
}
__END__
This is line 1
and line 2
and line 3
and yet a line 4
and a line 5
and 6 don't you know
and a line 7
and 8 is the last.
--
// Lee.Lindley /// Programmer shortage? What programmer shortage?
// @bigfoot.com /// Only *cheap* programmers are in short supply.
//////////////////// 50 cent beers are in short supply too.

Don Blaheta

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Quoth Kragen Sitaker:

> In article <7rrrc9$d...@cocoa.brown.edu>, Don Blaheta <d...@cs.brown.edu> wrote:

> > #patba
> > while (<>) {

> > shift @forelines if @forelines > $fore;

> > push @forelines, $_;

> > print and $aftleft-- and @forelines = () if $aftleft;

> > print @forelines and $aftleft = $aft if /$pat/o;
> > }
>

> This is the one I had the most trouble with, mostly avoiding the
> overlap.
>
> What happens here if $aft is 0 and we get two sequential input lines
> with the pattern?

Okay, I finally had time to check this out. You're right that an $aft
of zero breaks it; but note that A) the way I got my CL options, $aft
_can't_ be zero, but this is okay because B) if you want an $aft of
zero, why not just use patfore?

However, even I'm not convinced by my own reasoning here, so here's a
new version of patba that should work with an $aft of zero. Credit to
whoever pointed out C<splice>; it solved an ugliness problem that had
been bugging me. :)

while (<>) {

shift @forelines if @forelines > $fore;

push @forelines, $_ unless $aftleft;

print and $aftleft-- if $aftleft;

print splice @forelines, 0 and $aftleft = $aft if /$pat/o;
}

Of course, you'd need to fix the CL options to accept 0 as valid, too:

getopt('ABC', \%opt);
$C = exists($opt{C})? $opt{C} : 3;
$fore = exists($opt{B})? $opt{B} : $C;
$aft = exists($opt{A})? $opt{A} : $C;
$pat = shift;

> Damn. I need to learn more about the standard modules!

I so love the standard modules. :) Talk about never (read: seldom ;)
having to reinvent the wheel!

--
-=-Don Blaheta-=-=-d...@cs.brown.edu-=-=-<http://www.cs.brown.edu/~dpb/>-=-

"I myself have never been able to find out precisely what feminism is: I
only know that people call me a feminist whenever I express sentiments
that differentiate me from a doormat..." --Rebecca West

Don Blaheta

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Quoth Uri Guttman:

> well, what happens when you have multiple files and a match is at the
> end of the previous file? i think you will print the next N lines from
> the next file since you have a single global <> loop. and if that is all
> your loop will be, you could out the @ARGV stuff in a BEGIN block and
> wrap the code in a -n loop.

True enough, but easy to fix; after the while loop put a continue block
that resets the context counters. For those keeping track, that makes
my patba code now

while (<>) {
shift @forelines if @forelines > $fore;
push @forelines, $_ unless $aftleft;
print and $aftleft-- if $aftleft;
print splice @forelines, 0 and $aftleft = $aft if /$pat/o;

} continue {
@forelines = (), $aftleft = 0 if eof;
}

with credit to Ken Pizzini for thinking of C<splice> and Uri for
reminding me about the eof problem.

> i am working on version which will do all 3, handle the
> EOF problem as well as have a -O option to do overlaps, all in a single
> program. wish me luck and tuits.

I decided that *not* coalescing overlaps would actually be quite a bit
harder and not as useful, so I just skipped it. But may you have much
luck and many tuits. Big round ones, with maybe a few squarish ones
thrown in for flavour. :)

--
-=-Don Blaheta-=-=-d...@cs.brown.edu-=-=-<http://www.cs.brown.edu/~dpb/>-=-

"Piece of cake, of the have-it-and-eat-it-too variety."
--_Programming Perl_

lt lindley

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Sorry to follow up on myself, but it was only a few small changes
to make this into the patba program instead of just patfore.

#!/usr/lib/lprgs/perl -w
use strict;

# Program: patba

# Print a pattern and up to N lines of text before the pattern.

# and up to N lines of text after the pattern.

# Assumptions:
# 1) the pattern can match across a line boundary!
# 2) Memory is cheap. This is the barbarian "all memory should be free!"
# entry. Not some namby pamby space conservation algorithm.
# Seriously, memory is way too cheap to worry about like we used to do.
# 3) No reasonable way that I can see to get the bonus with the
# regexp approach.

# Interesting code moved to the top. Not normally how I would have
# layed this out.

my (%opts, $USAGE);
my $pattern = parse_args();

# Should I assume that when the user gives a space char that they
# really mean \s and that it includes newline? Assume so unless
# they are sophisticated enough to tell me to leave their sacred pattern
# alone. Otherwise, I'm going to expect unsophisticated users who
# provide the string "black ice" and expect it to match when one line
# ends with "black" and the next line starts with "ice". Therefore,
# I mangle the user supplied pattern by default.
$pattern =~ s/ +/\\s/g unless exists $opts{S};

$pattern = qr{
(
(?:.*\n){0,$opts{B}} # Match up to -B arbitrary lines
# Interesting that this does nothing if
# $opts{B} is 0!

(?m: # ^ and $ match on newline for this part
^.* # 0 or more chars begining of this line
(?s-x:$pattern) # Whatever they want to match
# where . and \s match newline too
# but space has meaning within $pattern
#HEY! Fix perlre to say that \s matches
#newline just like . does under /s!!!!
#That is expected, but not obvious!!!!
.*$ # Match to end of line containing $pattern
)
\n # Throw in final newline of line containing

# pattern -- m modifier dropped it.
(?:.*\n){0,$opts{A}} # Match up to -A arbitrary lines
)}xo;

sub parse_args {
$USAGE = "patfore [-A X] [-B T] [-C N] [-S] pattern [files ...| -D]
\twhere T is number of lines before pattern to display and
\twhere X is number of lines after pattern to display *OR*
\twhere N is number of lines both before and after pattern to display

\t-S indicates a strict pattern that should not be white space pampered
\tand -D says use the testing data at the bottom of the source";

use Getopt::Std;

die $USAGE unless getopts('DSA:B:C:', \%opts);
$opts{B} = 0 unless exists $opts{B}; # Number of lines before pat to print
$opts{A} = 0 unless exists $opts{A}; # Number of lines after pat to print
$opts{B} = $opts{A} = $opts{C} if (exists $opts{C}); # Overrides both A and B
# There must be at leat one pattern arg and at least A or B must be set
# I should probably check for reasonable values of A and B here, but if
# somebody wants to try "-2" lines before the pattern, that's there problem.
die $USAGE unless (@ARGV >= 1 && ($opts{A} > 0 || $opts{B} > 0));
shift @ARGV; # return pattern

Jamie McCarthy

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Since everyone else seems to be taking the "no frills" part of
the contest seriously, I might as well too. Here's my little
program minus its spiffy features, comments, and initialization.
This version doesn't have the "prelude" and "postlude" options,
nor does it truly run in minimal space.

while (defined($line = <>)) {
push @buff, $line;
$match = $. if $line =~ /$regex/o;

&dump if $match and $. - $match > $before+$after;

shift @buff if !$match and $#buff >= $before;
}

if ($match) {
push @buff, ("") x (1+$before+$after+$match-$.);
&dump;
}

sub dump {
print @buff[0..$#buff-$before-1];

@buff = ($before ? @buff[-$before, -1] : ());
$match = $skimmed = 0;
}

The "skimming" of the first version has been taken out to make the
code smaller. This makes only a minor difference in normal
operation but in pathological cases it can add up. I tried this
test case with both versions:

$ ./patba.pl -A 99999 '^Aa|^Zu|perl' /usr/dict/words

They both outputted the same 400K results file (the entire contents
of /usr/dict/words), but Linux reports a run size difference of
over 3 MB (9200K VmSize for the first version and 12940K for the
no-frills).

--
Jamie McCarthy

Abigail

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

Tom Christiansen (tch...@mox.perl.com) wrote on MMCCVII September
MCMXCIII in <URL:news:37e1...@cs.colorado.edu>:
@@
@@ So here's the challenge: devise idiomatic solutions to the
@@ following problems.

For exercises 1) through 3), it is assumed we are only interested
in the first match; the specification isn't clear what to do when
multiple lines match. Furthermore, when given multiple files, it
is assumed you want them to look like one big file, so, for instance
for peraft, if there's a match on the last line of the first file,
lines of the second file will be printed. This is done in the spirit
of KISS, if you don't want this behaviour, just call the programs
multiple times.

@@ 1) Write a "patfore" program that prints out up to N lines
@@ before the match as well as the match itself. Here's
@@ the usage message:
@@
@@ patfore [-B N] pattern [files ...]

#!/opt/perl/bin/perl -w

use strict;

sub usage () {
die "$0 [-B N] pattern [files .... ]\n";
exit 1;
}

# Parse arguments.

my $lines = 0;
if (@ARGV && $ARGV [0] eq '-B') {
usage if @ARGV < 3;
$lines = $ARGV [1];
usage if !length $lines or $lines =~ /\D/;
splice @ARGV, 0, 2;
}

usage unless @ARGV; # Need a pattern.

my $pattern = shift;
my @buffer; # Circular buffer. push/shift would be inefficient.
$buffer [$lines - 1] = undef if $lines; # Preallocate.

while (<>) { # Should read from files if @ARGV, from stdin otherwise.
if (/$pattern/o) {
print grep {defined} @buffer [$. % $lines .. $lines - 1,
0 .. $. % $lines - 1] if $lines;
print;
last;
}
else {
$lines and $buffer [$. % $lines] = $_;
}
}

__END__

Once we've done one program, all others look remarkably the same.
No surprise that all others were made by copying and then changing
a few lines.

@@ 2) Write a "pataft" program that prints out up to N lines
@@ after the match as well as the match itself. Here's
@@ the usage message:
@@
@@ pataft [-A N] pattern [files ...]

#!/opt/perl/bin/perl -w

use strict;

sub usage () {
die "$0 [-A N] pattern [files .... ]\n";
exit 1;
}

# Parse arguments.

my $lines = 0;
if (@ARGV && $ARGV [0] eq '-A') {
usage if @ARGV < 3;
$lines = $ARGV [1];
usage if !length $lines or $lines =~ /\D/;
splice @ARGV, 0, 2;
}

usage unless @ARGV; # Need a pattern.

my $pattern = shift;

while (<>) { # Should read from files if @ARGV, from stdin otherwise.
if (/$pattern/o) {
print;
print while ($lines -- && defined ($_ = <>));
last;
}
}

__END__

@@ 3) Write a "patba" program that prints out up to
@@ X lines before the match and Y lines after the match.
@@
@@ patba [-A X] [-B T] pattern [files ...]
@@
@@ or both N before and after:
@@
@@ patba [-C N] pattern [files ...]

This of course is just combining patfore and pataft.

#!/opt/perl/bin/perl -w

use strict;

sub usage () {
die "$0 [[-A X] [-B Y] | -C Z] pattern [files .... ]\n";
exit 1;
}

# Parse arguments.

my $prelines = my $postlines = 0;
# This allows for repeted arguments, or -C mixed with -A or -B. Blah.
while (@ARGV && $ARGV [0] =~ /^-[ABC]$/) {
usage if @ARGV < 3;
if ($ARGV [1] ne '-B') {
$prelines = $ARGV [1];
usage if !length $prelines or $prelines =~ /\D/;
}
if ($ARGV [1] ne '-A') {
$postlines = $ARGV [1];
usage if !length $postlines or $postlines =~ /\D/;
}
splice @ARGV, 0, 2;
}

usage unless @ARGV; # Need a pattern.

my $pattern = shift;
my @buffer; # Circular buffer. push/shift would be inefficient.
$buffer [$prelines - 1] = undef if $prelines; # Preallocate.

while (<>) { # Should read from files if @ARGV, from stdin otherwise.
if (/$pattern/o) {
print grep {defined} @buffer [$. % $prelines .. $prelines - 1,
0 .. $. % $prelines - 1] if $prelines;
print;
print while $postlines -- && defined ($_ = <>);
last;
}
else {
$prelines and $buffer [$. % $prelines] = $_;
}
}

__END__

@@ For Extra Credit:
@@ Provide alternate solutions that also coalesce with overlapping
@@ ranges. For example, if you ask for 2 lines before and after, and
@@ lines 2, 4, and 5 all contain matches, the output should comprise
@@ lines [1-7] rather three separate output blocks showing lines.
@@ [1-4], [2-6], and [3-7].

This definition is a bit vague. In the following program, it is assumed
to be meant "print 2 lines from the first match, to 2 lines after the
last match".

#!/opt/perl/bin/perl -w

use strict;

sub usage () {
die "$0 [[-A X] [-B Y] | -C Z] pattern [files .... ]\n";
exit 1;
}

# Parse arguments.

my $prelines = my $postlines = 0;
# This allows for repeted arguments, or -C mixed with -A or -B. Blah.
while (@ARGV && $ARGV [0] =~ /^-[ABC]$/) {
usage if @ARGV < 3;
if ($ARGV [0] ne '-B') {
$prelines = $ARGV [1];
usage if !length $prelines or $prelines =~ /\D/;
}
if ($ARGV [0] ne '-A') {
$postlines = $ARGV [1];
usage if !length $postlines or $postlines =~ /\D/;
}
splice @ARGV, 0, 2;
}

usage unless @ARGV; # Need a pattern.

my $pattern = shift;
my @buffer; # Circular buffer. push/shift would be inefficient.
$buffer [$prelines - 1] = undef if $prelines; # Preallocate.

while (<>) { # Should read from files if @ARGV, from stdin otherwise.
if (/$pattern/o) {
print grep {defined} @buffer [$. % $prelines .. $prelines - 1,
0 .. $. % $prelines - 1] if $prelines;
print;
# Now we have to scan the rest of file(s) for god knows how
# many other matches. We have to remember everything, because
# there can be a match on the last line. *sigh*.
my @rest = <>;
my $i = @rest;
while (-- $i >= 0 && $rest [$i] !~ /$pattern/o) {}
# $i should now be index of last match, -1 if not present.
print grep {defined} @rest [0 .. $i + $postlines];
last;
}
else {
$prelines and $buffer [$. % $prelines] = $_;
}
}

__END__

@@ Solutions will be judged upon these overall criteria:
@@
@@ Objective:
@@
@@ * correctness: does it actually do the right thing? if not,
@@ nothing else matters. :-)

Well, I didn't write a correctness proof....

@@ * space efficiency: don't use more space than minimally needed

It uses a lot more space that it minimally needs. Any solution that reads
in a line and then prints it uses more space than it should. Using reads,
reading one character, and only keeping it while it's needed for the
pattern matching would be more efficient. Furthermore, the programs
that keep lines just in case they might be printed (patfore, patba)
don't check whether the input is seekable; if it is, remembering lines
is a waste of memory.

@@ * time efficiency: is your solution fast?

No. It's written in Perl. ;-) But with the right assumptions, it's linear
in the amount of lines.

@@ * test coverage: do you include test data to check all border cases?

None at all.

@@ Subjective:
@@
@@ * conciseness: keep it short. no frills. just a few lines.

Command line parsing takes up the few lines before getting to the meat.

@@ * clarity: is this understandable to a native perl speaker?

Well, it has comments....

@@ * idiom: does this look like natural perl? does it use cool
@@ perl features absent in other languages?

regexes, <>, grep {BLOCK}.

@@ * creativity: is your solution cleverly different from those
@@ of others?

Extremely doubtful. I cannot think of any significant other solution that
isn't inefficient or totally unclear.

Abigail
--
perl -MNet::Dict -we '(Net::Dict -> new (server => "dict.org")
-> define ("foldoc", "perl")) [0] -> print'

-----------== Posted via Newsfeeds.Com, Uncensored Usenet News ==----------
http://www.newsfeeds.com The Largest Usenet Servers in the World!
------== Over 73,000 Newsgroups - Including Dedicated Binaries Servers ==-----

Ken Pizzini

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

On 16 Sep 1999 21:59:57 GMT, Ken Pizzini <k...@halcyon.com> wrote:
> push @lines, $_;
> if (($matched = /pattern/) || $. <= $end) {
> print @lines;
> @lines = ();
> $end = $matched && $. + $X;
> } else {
> shift @lines if !$end && @lines >= $T;

Oops --- last minute edit broke this script; this line should read:
shift @lines if $. > $end && @lines >= $T;

> }
--Ken Pizzini

Ilya Zakharevich

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

[A complimentary Cc of this posting was sent to Tom Christiansen
<tch...@mox.perl.com>],
who wrote in article <37e1...@cs.colorado.edu>:

> 3) Write a "patba" program that prints out up to

> X lines before the match and Y lines after the match.
>

> patba [-A X] [-B T] pattern [files ...]
>

> or both N before and after:
>

> patba [-C N] pattern [files ...]

Here is YAO. It takes one file only, and matches for /\b\w\b/ with 3
lines of context before and a fter the match.
Changing it to correspond to the above API is left as an exersize to
the reader.

Ilya

#!/usr/bin/perl -w
use strict;

package TH;

# Array: 0: FH, 1: buffer, 2: pre, 3: pre + 1 + post, 4: count, 5: closed, 6: last-output
sub TIEHANDLE {
shift;
local *T; # Requires *T below, not \*T
open T, shift or die;
my ($pre, $post, @buf) = (shift, shift);
my $me = bless [*T, \@buf, $pre, $pre + $post + 1, 0, 0, -1 ];
$me->fill for 1..$post;
$me;
}
sub fill {
my $self = shift;
return if $self->[5];
my $in = readline $self->[0];
$self->[5] = 1, (close $self->[0] or die), return unless defined $in;
$self->[1][$self->[4]++ % $self->[3]] = $in;
}
sub READLINE {
my $self = shift;
$self->fill;
return if $self->[6] >= $self->[4] - 1;
$self->[1][++$self->[6] % $self->[3]]
}
sub report {
my $self = shift;
my $f = $self->[6] - $self->[2];
my $l = $f + $self->[3];
$f = 0 if $f < 0;
$l = $self->[4]-1 if $l >= $self->[4];
grep defined, map $self->[1][ $_ % $self->[3] ], $f..$l;
}

package main;

tie *FH, 'TH', shift, 3, 3;
/\b\w\b/ and print "vvvvvv\n",(tied *FH)->report, "^^^^^^\n" while <FH>;

Tom Christiansen

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

[courtesy cc of this posting mailed to cited author]

In comp.lang.perl.moderated, bl...@my-deja.com writes:
:system("/gnu/egrep -B linesbefore whatever");
:system("/gnu/egrep -A linesafter whatever");
:system("/gnu/egrep -B linesbefore -A linesafter whatever");

That doesn't do anything to teach people algorithms, nor is
it applicable in non-grep situations.

--tom
--
"The computer programmer is a creator of universes for which he alone is
responsible. Universes of virtually unlimited complexity can be created in
the form of computer programs." --Joseph Weizenbaum

Uri Guttman

unread,

Sep 17, 1999, 3:00:00 AM9/17/99

to

>>>>> "DB" == Don Blaheta <d...@cs.brown.edu> writes:

DB> True enough, but easy to fix; after the while loop put a continue block
DB> that resets the context counters. For those keeping track, that makes
DB> my patba code now

DB> while (<>) {
DB> shift @forelines if @forelines > $fore;
DB> push @forelines, $_ unless $aftleft;
DB> print and $aftleft-- if $aftleft;
DB> print splice @forelines, 0 and $aftleft = $aft if /$pat/o;
DB> } continue {
DB> @forelines = (), $aftleft = 0 if eof;
DB> }

what is the purpose of the continue block? you never do anything to
execute it like using next. you could just put that code in the loop
body.

then you could wrap the entire block in a -n loop and save the while
statement. of course, then your init stuff has to be in a BEGIN block.

lt lindley

unread,

Sep 18, 1999, 3:00:00 AM9/18/99

to

In a fit of machoistic tenacity, I force-fit a kludge onto the regular
expression so that it can appear to reap the bonus points. I tested
this a bit and it passes my tests.

Even though it doesn't meet Tom's other contest critera, I hope
it will be included in the inevitable "benchmark" comparisons
and other "completeness" torture tests.

Also note that this solution is the only one I have seen so far that
will match a pattern that crosses line boundaries.

#!/usr/lib/lprgs/perl -w
use strict;

# Print a pattern and up to N lines of text before the pattern.
# and up to N lines after the pattern, including continuation
# of the lines printed if pattern appears in the "after" lines

# Assumptions:
# 1) the pattern can match across a line boundary!
# 2) Memory is cheap. This is the barbarian "all memory should be free!"
# entry. Not some namby pamby space conservation algorithm.
# Seriously, memory is way too cheap to worry about like we used to do.
#

# Interesting code moved to the top. Not normally how I would have
# layed this out.

my (%opts, $USAGE);
use vars qw/$continue_match/;

my $pattern = parse_args();

# Should I assume that when the user gives a space char that they
# really mean \s and that it includes newline? Assume so unless
# they are sophisticated enough to tell me to leave their sacred pattern
# alone. Otherwise, I'm going to expect unsophisticated users who
# provide the string "black ice" and expect it to match when one line
# ends with "black" and the next line starts with "ice". Therefore,
# I mangle the user supplied pattern by default.

$pattern =~ s/ +/\\s+/g unless exists $opts{S};

use re 'eval';
#$pattern = qr{
# (
# (?:.*\n){0,$opts{B}} # Match up to -B arbitrary lines
# (?m: # ^ and $ match on newline for this part
# ^.* # 0 or more chars begining of this line
# (?s-x:$pattern) # Whatever they want to match
# # where . and \s match newline too
# # but space has meaning within $pattern
# #HEY! Fix perlre to say that \s matches
# #newline just like . does under /s!!!!
# #That is expected, but not obvious!!!!
# .*$ # Match to end of line containing $pattern
# )
# \n # Throw in final newline of line containing
# # pattern -- m modifier dropped it.
# # Give trailing lines that don't contain pattern
# (?m:
# (?!^.*(?s-x:$pattern).*$)
# .*\n
# ){0,$opts{A}} # Well, up to -A lines anyway
# ) # End of what is returned in $1
#
# # Now set a flag if $pattern appears in the next line
# (?(?= # If pos lookahead is true
# (?m:
# ^.*(?s-x:$pattern).*$
# )
# )
# (?{$continue_match = 1}) # Then set this flag for the printing function
# )
#}xo;

# Well, there seems to be a bug with (?s-x:), so I must get rid
# of all the readability drek. Good luck reading it. Or
# just take my work that is the same as above except for the "-x" stuff.
$pattern = qr{((?:.*\n){0,$opts{B}}(?m:^.*(?s:$pattern).*$)\n(?m:(?!^.*(?s:$pattern).*$).*\n){0,$opts{A}})(?(?=(?m:^.*(?s:$pattern).*$))(?{$continue_match=1}))}o;

sub find_pattern_in_file {
my ($fh, $file_name) = @_;
print "\n\n$file_name --\n" if (defined $file_name);
# Screw memory requirements. Memory is cheap and modern books aren't
# that long.
local $/;
my $string = <$fh>;

while ($string =~ /$pattern/g) {
print $1;
# If at this point in the matching, the next
# line (or lines if the user pattern can cross line
# boundaries) contains $pattern,
# then don't print a separator. Just let it continue
# printing the lines together.
# The $continue_match var is set in the regexp
$continue_match ? $continue_match = 0 : print "\n--\n" ;

# Bug. Get "\n--\n" after last match.
}
}

# The rest of this just handles the argument processing and
# file manipulation.
if (@ARGV == 0) {
if (exists $opts{D}) {
# Specail test case
find_pattern_in_file(\*DATA);
} else {
# No file name provided as an argument
find_pattern_in_file(\*STDIN);
}
} else {
foreach my $file_name (@ARGV) {
# Rather than simply using magic <ARGV> processing,
# I want to print the file name.
local *FH;
open(FH, $file_name)
or die "Failed to open $file_name: $!\n\t$USAGE";
find_pattern_in_file(\*FH, $file_name);
}
}
exit 0;

sub parse_args {
$USAGE = "patfore [-A X] [-B T] [-C N] [-S] pattern [files ...| -D]
\twhere T is number of lines before pattern to display and

\twhere X is number of lines after pattern to display and

\twhere N is number of lines both before and after pattern to display
\t-S indicates a strict pattern that should not be white space pampered
\tand -D says use the testing data at the bottom of the source";

use Getopt::Std;

die $USAGE unless getopts('DSA:B:C:', \%opts);
$opts{B} = 0 unless exists $opts{B}; # Number of lines before pat to print
$opts{A} = 0 unless exists $opts{A}; # Number of lines after pat to print
$opts{B} = $opts{A} = $opts{C} if (exists $opts{C}); # Overrides both A and B
# There must be at leat one pattern arg and at least A or B must be set

die $USAGE unless (@ARGV >= 1 && ($opts{A} > 0 || $opts{B} > 0));
shift @ARGV; # return pattern
}
__END__
This is line 1
and line 2
and line 3
and yet a line 4
and a line 5

and 6 don't you know MATCH
and a line 7
and 8 was the last. MATCH
but now there are 9
and yet still more is 10
insert some more here
because I needed more room
to match before and after
and too see if it really works. MATCH

Steven Tolkin

unread,

Sep 18, 1999, 3:00:00 AM9/18/99

to

The contest is fun, produces a lot of creative output, and solves a real
need (an improved version of the old cgrep (context grep) from the first
Camel that I still use). But I am concerned that it still seems to
emphasize the mentality of cleverness (which seems to cause obscurity).

Tom Christiansen (who will get this as email too) wrote:
....

> Solutions will be judged upon these overall criteria:

....
> Subjective:

>
> * conciseness: keep it short. no frills. just a few lines.

> * clarity: is this understandable to a native perl speaker?

> * idiom: does this look like natural perl? does it use cool

> perl features absent in other languages?

> * creativity: is your solution cleverly different from those

> of others?

I suggest that these subjective criteria miss a few that are more
important:

* extensibility: How easy is it to add functionality, or add robustness.
Adding robustness might include complete error checking of the options,
e.g for non-negative integers, warning if the file does not seem to be
Text, etc. Extensibility in this case might consist of accepting a list
of patterns to match and having each have an associated before and after
count. Or simply accepting many of the grep options, e.g. -i for case
insensitive match, or -v for inverting the sense of the match, etc.

* reusability: Does the code reuse existing modules? Presumably these
are "battle tested". Does the code contribute a new module that could
be reused by others?

On the topic of clarity I would like to suggest that there are ways to
assess this in a quasi-objective manner. The technique of bebugging
(sic) consists of deliberately distorting the program by adding,
removing, or moving a syntactic construct: character, token, line, etc.
and then testing how long it takes a programmer to make a repair.
This is a surrogate measure for the main pragmatic goal of clarity:
increasing the ability of competent Perl programmer to find and
correctly fix a bug, or add a feature.

A main benefit of using idioms is that it can sometimes contribute to
clarity. Idioms can also contribute to performance. But too often
idioms cause perplexity.

What am I belaboring the obvious? Because I believe the Perl community
must try hard to overcome Perl's reputation of being hard to
understand. APL is the protoype of a language that encouraged
"one-liners" and look at its fate. (OK, the use of a "creative"
character set also played a role.)

I love Perl; it is the highest level language available for solving
practical programming problems, is truly portable, etc. But "image"
*is* important in the real world. Perl is not being used in some
projects due to its reputation. That is a fact. We should be
emphasizing Perl's suitability for quickly writing programs
that are correct and efficient.

Hopefully helpfully yours,
Steve
--
Steven Tolkin tol...@mediaone.net work: steve....@fmr.com
Fidelity Investments 82 Devonshire St. R24D Boston MA 02109
617-563-0516
There is nothing so practical as a good theory. Comments are by me,
not Fidelity Investments, its subsidiaries or affiliates.

Chris Nandor

unread,

Sep 18, 1999, 3:00:00 AM9/18/99

to

In article <37E3AFC7...@mediaone.net>, Steven Tolkin
<tol...@mediaone.net> wrote:

# The contest is fun, produces a lot of creative output, and solves a real
# need (an improved version of the old cgrep (context grep) from the first
# Camel that I still use). But I am concerned that it still seems to
# emphasize the mentality of cleverness (which seems to cause obscurity).

Why is that bad? It is an exercise. Consider it research.

# I suggest that these subjective criteria miss a few that are more
# important:

I disagree.

# * extensibility: How easy is it to add functionality, or add robustness.

This is not an exercise to write complete programs for distribution to the
world to use. Note that Tom did not even ask for documentation, which is
essential for finished code for such distribution. Adding to it is not
important here.

# * reusability: Does the code reuse existing modules? Presumably these

See above.

# What am I belaboring the obvious? Because I believe the Perl community
# must try hard to overcome Perl's reputation of being hard to
# understand.

I don't. I think we shouldn't give a damn. I sure don't. Of course,
many here will disagree with me. But I don't give a damn. :)

# I love Perl; it is the highest level language available for solving
# practical programming problems, is truly portable, etc. But "image"
# *is* important in the real world.

I am in the real world, and it does not matter to me. Are you saying that
I am not in the real world, or that it does matter to me, and I am lying
to myself (or to you)? :)

# Perl is not being used in some
# projects due to its reputation. That is a fact.

It is also a fact with every other language in existence, from C to Python
to AppleScript to Ada. In light of that, I consider your fact
irrelevant. Also, consider that Perl is easily one of the top five
languages in demand these days (most likely in the top three with Java and
C, and apparently gaining in popularity, still), and I just don't see
cause for worry.

Of course, it is your right to worry about such things. But it is my
right to respond. :)

# We should be
# emphasizing Perl's suitability for quickly writing programs
# that are correct and efficient.

We should be emphasizing whatever we feel like emphasizing at the moment
we emphasize it. This newsgroup is not comp.lang.perl.advocacy.

--
Chris Nandor mailto:pu...@pobox.com http://pudge.net/
%PGPKey = ('B76E72AD', [1024, '0824090B CE73CA10 1FF77F13 8180B6B6'])

Richard Proctor

unread,

Sep 18, 1999, 3:00:00 AM9/18/99

to

Ok, here is my offering...

Simple, does all 3 jobs before/after/span handles overlapping ranges
has a few frills (Highlighting the matched line, and line numbering - delete
if not wanted)

#!perl
# patfind
# written by Richard Proctor

use Getopt::Std;

sub usage {print <<END; exit}
$0 [-m] [-N] [-B lines] [-A lines] [-C lines] /pattern/ [files...]

-m Print indication of matched line
-N Print Line Numbers
-B n Print n lines before the match
-A n Print n lines after the match
-C n Print n lines before and after the match

Slashes on the pattern are optional.
END

getopts('mNB:A:C:') and $pat = shift and @ARGV or usage;
$opt_A = $opt_B = $opt_C if $opt_C;
$pat =~ s!^/(.*)/$!$1!;

while (<>) {
$_ = "$.\t$_" if $opt_N;
if (/$pat/) {
print "\n";
print @Past;
@Past=();
print ">>>>\t" if $opt_m;
print;
$ToDo = $opt_A;
}
elsif ($ToDo) {
$ToDo--;
print;
}
elsif ($opt_B) {
push @Past,$_;
shift @Past if ($#Past == $opt_B);
}
}

--
Ric...@waveney.demon.co.uk

Jamie McCarthy

unread,

Sep 18, 1999, 3:00:00 AM9/18/99

to

Chris Nandor wrote:

> # I am concerned that it still seems to

> # emphasize the mentality of cleverness (which seems to cause obscurity).
>
> Why is that bad? It is an exercise. Consider it research.

OK, here's another answer then:

@l=<>;for$i(0..$#l){map{$m{$_}++}($i-$b..$i+$a)if$l[$i]=~/$r/o}print@l[grep{$m{$_}}(0..$#l)]

It's horrible for space (processes the whole input before beginning
to print) and not terrible for time. Here's the full script,
including initialization (214 bytes long, 211 if you lose the "-w"):

#!/usr/bin/perl -w
while(@ARGV&&$ARGV[0]=~/^-([ABC])$/){$1ne'B'and$a=$ARGV[1];$1ne'A'and$b=$ARGV[1];shift;shift}$r=shift;@l=<>;for$i(0..$#l){map{$m{$_}++}($i-$b..$i+$a)if$l[$i]=~/$r/o}print@l[grep{$m{$_}}(0..$#l)]

Sample runs:

$ ./patba4.pl -B 1 -A 3 perl /usr/dict/words
improper
improperly
impropriety
improve
improved
proper
properly
properness
propertied
properties
superiors
superlative
superlatively
superlatives
supermarket
supermarkets
supermini

$ ./patba4.pl -C 2 '^Aa|perl' /usr/dict/words
Aarhus
Aaron
Ababa
aback
impromptu
improper
improperly
impropriety
improve
propensity
proper
properly
properness
propertied
superiority
superiors
superlative
superlatively
superlatives
supermarket
supermarkets

--
Jamie McCarthy

Uri Guttman

unread,

Sep 18, 1999, 3:00:00 AM9/18/99

to

>>>>> "RP" == Richard Proctor <Ric...@waveney.demon.co.uk> writes:

RP> Simple, does all 3 jobs before/after/span handles overlapping
RP> ranges has a few frills (Highlighting the matched line, and line
RP> numbering - delete if not wanted)

RP> while (<>) {
RP> $_ = "$.\t$_" if $opt_N;
RP> if (/$pat/) {
RP> print "\n";
RP> print @Past;
RP> @Past=();
RP> print ">>>>\t" if $opt_m;
RP> print;
RP> $ToDo = $opt_A;
RP> }
RP> elsif ($ToDo) {
RP> $ToDo--;
RP> print;
RP> }
RP> elsif ($opt_B) {
RP> push @Past,$_;
RP> shift @Past if ($#Past == $opt_B);
RP> }
RP> }

this is broken in several ways. it does not handle overlaps as you claim
and it has the same bugs as other solutions with late matches and
multiple files.

on this input:

abc
def
abghi
jkl
mno
pqr
stu

and this command:

perl grep1.pl -A 2 ab grep.dat

i get this output:

abc
def

abghi
jkl
mno

which is not what tom asked for regarding overlaps. it should output

abc
def
abghi
jkl
mno

so sorry, try again. merging the overlap and non-overlap logic is not
trivial. i am close to a very elegant solution. but i will test it to
death before i publish it here.

lt lindley

unread,

Sep 19, 1999, 3:00:00 AM9/19/99

to

In comp.lang.perl.misc Uri Guttman <u...@sysarch.com> wrote:
:> ... but i will test it to

:>death before i publish it here.

Chicken. :-)

I would be grateful if you would publish your test suite. That
is of almost as much interest as your solution.

[[I don't post in c.l.p.moderated. Followups set]]

Uri Guttman

unread,

Sep 19, 1999, 3:00:00 AM9/19/99

to

>>>>> "ll" == lt lindley <l...@rgsun40.viasystems.com> writes:

ll> In comp.lang.perl.misc Uri Guttman <u...@sysarch.com> wrote:
ll> :> ... but i will test it to
ll> :>death before i publish it here.

ll> Chicken. :-)

your father was a hamster and your mother smells of elderberries.

ll> I would be grateful if you would publish your test suite. That
ll> is of almost as much interest as your solution.

no test suite. i just understnad the problem space very well and look
for all the boundary conditions that are missed by many. several of the
'solutions' to this competition fell apart just by my giving their code
a good look at how they handle boundaries. then a simple test case will
exploit the bug and expose it.

as i said, the combined version which supports overlapped ranges and
plain ranges is not simple. there are many different combination of
states to keep track of. i am working on simplifying the logic to reduce
the special cases and showcase the commonality of the two grep variations
more than their differences. this is what i gather tom wants to see and
i will deliver. too bad he has me killfiled right now.

just to illustrate my design process, i have written the scaffold of the
script with ARGV processing, a usage sub, a print sub (which will handle
various options when printing a match range, and some simple common code
in the main loop. the loop is actually a -n loop as well. but the main
logic to handle the grep ranges is only in my head. i have been
pondering it for two days and i keep solving little boundary issues and
simplifying the logic. i am almost done with it an i may code it
tomorrow and test it out. i will create a couple of data files and grep
lines in all boundary cases, lines early or late in a file, various
values of before and after line counts, and all of those with/without
overlapping ranges enabled. this is how a clean and bug free design is
done, not by coding, debugging, and recoding until it seems to work. it
has to work in my head first and then in the program. it was my early
programming on punch cards and with long batch turnaround times (2+
hours sometimes) that trained me to use my head more and debuggers less.

Yitzchak Scott-Thoennes

unread,

Sep 19, 1999, 3:00:00 AM9/19/99

to

Posted and mailed.

In article <1dy873h.1oe...@imac.loc>, kpr...@ibm.net (Kevin Reid) wrote:
>foreach $IN (@ARGV) {
> open IN or die "Can't open $IN: $!";
>
> LINE: while (push(@lindex, tell IN), defined($_ = <IN>)) {
> shift @lindex if $#lindex > $opt_B;
> if (/$pat/o) {
> print "---\n" if $noncontiguous;
>
> seek(IN, $lindex[0], 0);
> for (my $i = 0; $i < $opt_A + @lindex; $i++) {

Needs some work in this loop. If the pattern recurs within $opt_A
lines, you won't catch it.

> my $y = <IN>; last if !defined $y;
> print(($opt_m ? ($i == $#lindex ? '+ ' : ' ') : '') . $y);
> }
> @lindex = ();
> $noncontiguous = 0;
> } else {$noncontiguous = 1}
> }

Better clear @lindex here, or you might use positions from one file to
seek in the next.

>}

Kragen Sitaker

unread,

Sep 19, 1999, 3:00:00 AM9/19/99

to

In article <37E3AFC7...@mediaone.net>,
Steven Tolkin <tol...@mediaone.net> wrote:
>* reusability: Does the code reuse existing modules? Presumably these

>are "battle tested". Does the code contribute a new module that could
>be reused by others?

Reusing existing modules is a stupid criterion. Reuse is a means, not
an end. It is a means to shorter, more reliable, faster programs.
Treating it as an end in itself will give us longer, less reliable,
slower programs.

Reusability is an impossible criterion; it does not inhere in the
reused code itself, but in the relationship between the code and its
reusers. It is impossible to evaluate reusability in the absence of
potential reusers.
--
<kra...@pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
Sun Sep 19 1999
50 days until the Internet stock bubble bursts on Monday, 1999-11-08.
<URL:http://www.pobox.com/~kragen/bubble.html>

Richard Proctor

unread,

Sep 19, 1999, 3:00:00 AM9/19/99

to

In article <x7hfkra...@home.sysarch.com>, Uri Guttman

<URL:mailto:u...@sysarch.com> wrote:
> >>>>> "RP" == Richard Proctor <Ric...@waveney.demon.co.uk> writes:
>
> RP> Simple, does all 3 jobs before/after/span handles overlapping
> RP> ranges has a few frills (Highlighting the matched line, and line
> RP> numbering - delete if not wanted)
>
>

> this is broken in several ways. it does not handle overlaps as you claim
> and it has the same bugs as other solutions with late matches and
> multiple files.

Ok I accept their was a bug on late matches on multiple files - fixed.

>
> which is not what tom asked for regarding overlaps. it should output
>

On the overlap case I think it is a case of interpretation, but the modified
version behaves as you describe.

> so sorry, try again. merging the overlap and non-overlap logic is not

> trivial. i am close to a very elegant solution. but i will test it to

> death before i publish it here.

Here is published take 2, it resets info between files, only throws the
newline if there is a gap in the output, and prints file name before the
first match in a file, if there are multiple files :

#!perl
# patfind
# written by Richard Proctor

use Getopt::Std;

sub usage {print <<END; exit}
$0 [-m] [-N] [-B lines] [-A lines] [-C lines] /pattern/ [files...]

-m Print indication of matched line
-N Print Line Numbers
-B n Print n lines before the match
-A n Print n lines after the match
-C n Print n lines before and after the match

Slashes on the pattern are optional.
END

getopts('mNB:A:C:') and $pat = shift and @ARGV or usage;
$opt_A = $opt_B = $opt_C if $opt_C;
$pat =~ s!^/(.*)/$!$1!;

foreach $File (@ARGV) {
@Past = ();
$NamePrinted = $ToDo = $Gap = 0;
open (IN,$File) || die "$File would not open $!";
while (<IN>) {

$_ = "$.\t$_" if $opt_N;

if (/$pat/) {
print "\n$File:\n" unless ($NamePrinted++ && $#ARGV);
print "\n" if $Gap;
print @Past;
@Past=();
$Gap = 0;

print ">>>>\t" if $opt_m;
print;
$ToDo = $opt_A;
}
elsif ($ToDo) {
$ToDo--;
print;
}
elsif ($opt_B) {
push @Past,$_;

$Gap = 1, shift @Past if ($#Past == $opt_B);
}
else { $Gap = 1 };
}
close (IN);
}

--

Ric...@waveney.demon.co.uk

Yitzchak Scott-Thoennes

unread,

Sep 19, 1999, 3:00:00 AM9/19/99

to

Posted and mailed.

In article <slrn7u2pe...@pulsar.halcyon.com>,

k...@halcyon.com (Ken Pizzini) wrote:
>> patfore [-B N] pattern [files ...]
>

> push @lines, $_;
> print splice(@lines, 0, $N) if /pattern/;
> shift @lines if @lines >= $N;

You are only printing $N-1 lines before a pattern match.
Try:
push @lines, $_;
print splice(@lines,0) if /pattern/;
shift @lines if @lines > $N;

In article <slrn7u3u6...@pulsar.halcyon.com>,

k...@halcyon.com (Ken Pizzini) wrote:
>On 16 Sep 1999 21:59:57 GMT, Ken Pizzini <k...@halcyon.com> wrote:
>> push @lines, $_;
>> if (($matched = /pattern/) || $. <= $end) {
>> print @lines;
>> @lines = ();
>> $end = $matched && $. + $X;
>> } else {
>> shift @lines if !$end && @lines >= $T;

Again, this should be @lines > $T, shoudn't it?

>
>Oops --- last minute edit broke this script; this line should read:
> shift @lines if $. > $end && @lines >= $T;
>
>> }

But in the else, '$. > $end' is always true.

You can also do away with $matched by setting $end in the if.
Combining this with options stuff shamelessly stolen from Kevin Reid, I get:

#!perl -wn
use strict;
use vars qw/$opt_A $opt_B $opt_C $end $pat @lines/;
use Getopt::Std;

BEGIN {
getopts('C:B:A:') and $pat = shift or usage();
foreach ($opt_A, $opt_B) {$_ ||= $opt_C || 0 unless defined $_; }
$pat =~ s#^/(.*)/$#$1#;
print $opt_A, $opt_B, "\n";
}

push @lines, $_;
if (/$pat/o and $end = $. + $opt_A or $end and $. <= $end) {
print splice(@lines,0);
} else {
shift @lines if @lines > $opt_B;
}

$end = @lines = () if eof;

sub usage {print <<"END" and exit 1}
$0 [-B lines] [-A lines] [-C lines] /pattern/ [files...]

Peter G. Martin

unread,

Sep 20, 1999, 3:00:00 AM9/20/99

to

Uri Guttman wrote:

> so sorry, try again. merging the overlap and non-overlap logic is not
> trivial. i am close to a very elegant solution. but i will test it to
> death before i publish it here.
>

Hey, don't hold back too long : last time someone said something
like that, it took about 300 years to find a margin big enough to
fit it into... :=)

--
--peterm Peter G. Martin -- Tech. Writer
http://www.zeta.org.au/~peterm +61 2 9818 5094
Mobile: 0408 249 113 pet...@zeta.org.au

Joe Smith

unread,

Sep 20, 1999, 3:00:00 AM9/20/99

to

In article <ant19084...@waveney.demon.co.uk>,
Richard Proctor <Ric...@waveney.demon.co.uk> wrote:
>getopts('mNB:A:C:') and $pat = shift and @ARGV or usage;

> $_ = "$.\t$_" if $opt_N;

Your solution fails on this test:

zcat http_access_log.gz | ./perlpat -N -C 2 404

1) Since no file names are specified on the command line, it is supposed
to read STDIN and not print a usage message.
2) I'm looking for lines with "404 not found", not line 404 of the input.

-Joe
--
INWAP.COM is Joe Smith, Sally Smith and our cat Murdock.
(The O'Hallorans and their cats moved to http://www.tyedye.org/ Nov-98.)
See http://www.inwap.com/ for PDP-10, "ReBoot", "Shadow Raiders"/"War Planets"

lt lindley

unread,

Sep 20, 1999, 3:00:00 AM9/20/99

to

Hmm. This is the 3rd time I've played with myself in the same
thread. Maybe I need to work on socialization. ;->

I've pushed the all-regexp solution as far as I care to take it.
There is a flaw when a line containing the end of one matching string
also contains the beginning of another matching string.

I might be able to use additional negative lookahead assertions to
prevent chewing up the end of a line that is needed to match the next
pattern, or maybe even chaining multiple m//g, but I think the
effort has crossed the threshold where what I learn from it can
exceed the cost. Ouch. I wish I hadn't had that last thought
about chaining multiple m//g; I'm starting to get another idea..

I learned quite a bit from the attempt so far though.

Invoke as follows to see advantages and flaws:
perl -w patba.pl -DC2 MATCH
perl -w patba.pl -DC2 'MATCH and'

#!/usr/lib/lprgs/perl -w
use strict;
# Author: Lee Lindley

# Print a pattern and up to N lines of text before the pattern, and
# up to N lines after the pattern. pattern can match across a line
# boundary at the cost of missing matching strings that begin on the
# same line as the end of a previous match.

my (%opts, $USAGE);
my $pre_pattern_markup = '*'; # Could be <b> ... </b>
my $post_pattern_markup = '*';
my $after_matched_lines = "--\n";

my $pattern = parse_args();

#my $full_pattern = qr{
# ((?:.*\n){0,$opts{B}}?) # Match up to -B arbitrary lines ($1)

# (?m: # ^ and $ match on newline for this part

# (^.*?) # 0 or more chars begining of this line($2)
# ((?s-x)$pattern)# Whatever they want to match ($3)

# # where . and \s match newline too
# # but space has meaning within $pattern

# (.*$) # Match to end of line containing $pattern($4)
# ) # which is where I could miss beginning of
# # another match. Oh well.

# \n # Throw in final newline of line containing
# # pattern -- m modifier dropped it.
# # Give trailing lines that don't contain pattern

# ((?m:

# (?!^.*(?s-x:$pattern).*$)
# .*\n

# ){0,$opts{A}}) # up to -A lines anyway ($5)
#}xo;

# Well, there seems to be a bug with (?-x:), so I must get rid

# of all the readability drek. Good luck reading it. Or

# just take my word that is the same as above except for the "-x" stuff.
my $full_pattern = qr{((?:.*\n){0,$opts{B}}?)(?m:(^.*?)((?s)$pattern)(.*$))\n((?m:(?!^.*(?s:$pattern).*$).*\n){0,$opts{A}})}o;

sub find_pattern_in_file {
my ($fh, $file_name) = @_;

print "\n\n$file_name\n", $after_matched_lines
if (defined $file_name);
local $/;
my $string = <$fh>; # suck it all in
while ($string =~ /$full_pattern/go) {
print $1 if defined $1; # lines before pat
print $2 if defined $2; # part of line before pat
print $pre_pattern_markup, $3, $post_pattern_markup;
print $4 if defined $4; # part of line after pat
print "\n";
print $5 if defined $5; # lines after pat
# If the next line contains the pattern, then
# don't print the string showing end of a match set
print $after_matched_lines
unless $string =~ /\G(?=.*(?s:$pattern).*$)/mgco;
}
}

# The rest of this just handles the argument processing and
# file manipulation.
if (@ARGV == 0) {
if (exists $opts{D}) {

# Special test case
find_pattern_in_file(\*DATA, "__DATA__");

} else {
# No file name provided as an argument
find_pattern_in_file(\*STDIN);
}
} else {
foreach my $file_name (@ARGV) {
# Rather than simply using magic <ARGV> processing,
# I want to print the file name.
local *FH;
open(FH, $file_name)
or die "Failed to open $file_name: $!\n\t$USAGE";
find_pattern_in_file(\*FH, $file_name);
}
}
exit 0;

sub parse_args {
# I write strings this way because I used to write them this
# way in C (well with \n\ at the ends of the lines)
# and I just can't seem to kick the habit. :-)
$USAGE = "patfore [-B T] [-A X] [-C N] [-S] [-D] pattern [files ...]

\twhere T is number of lines before pattern to display and
\twhere X is number of lines after pattern to display and
\twhere N is number of lines both before and after pattern to display
\t-S indicates a strict pattern that should not be white space pampered
\tand -D says use the testing data at the bottom of the source";

use Getopt::Std;

die $USAGE unless getopts('DSA:B:C:', \%opts);
$opts{B} = 0 unless exists $opts{B}; # Number of lines before pat to print
$opts{A} = 0 unless exists $opts{A}; # Number of lines after pat to print
$opts{B} = $opts{A} = $opts{C} if (exists $opts{C}); # Overrides both A and B
# There must be at leat one pattern arg and at least A or B must be set

# If they want to give me a negative number or a float or
# something else that isn't a number, then let them burn for now.

die $USAGE unless (@ARGV >= 1 && ($opts{A} > 0 || $opts{B} > 0));

my $pattern = shift @ARGV;

# Should I assume that when the user gives a space char that they
# really mean \s and that it includes newline? Assume so unless
# they are sophisticated enough to tell me to leave their sacred pattern
# alone. Otherwise, I'm going to expect unsophisticated users who
# provide the string "black ice" and expect it to match when one line
# ends with "black" and the next line starts with "ice". Therefore,
# I mangle the user supplied pattern by default.
$pattern =~ s/ +/\\s+/g unless exists $opts{S};

return $pattern;
}

__END__
This is line 1 MATCH

and line 2
and line 3
and yet a line 4
and a line 5
and 6 don't you know MATCH

and a line 7 MATCH MATCH

Don Blaheta

unread,

Sep 20, 1999, 3:00:00 AM9/20/99

to

Quoth Uri Guttman:

> >>>>> "DB" == Don Blaheta <d...@cs.brown.edu> writes:
> DB> } continue {
> DB> @forelines = (), $aftleft = 0 if eof;
> DB> }
>
> what is the purpose of the continue block? you never do anything to
> execute it like using next. you could just put that code in the loop
> body.

*smacks forehead* That's what I get from cribbing too directly from the
examples in the docs. ;)

--
-=-Don Blaheta-=-=-d...@cs.brown.edu-=-=-<http://www.cs.brown.edu/~dpb/>-=-

I haven't lost my mind -- it's backed up on tape somewhere.

Chris Nandor

unread,

Sep 20, 1999, 3:00:00 AM9/20/99

to

In article <37E41015...@mccarthy.org>, Jamie McCarthy
<ja...@mccarthy.org> wrote:

# Chris Nandor wrote:
#
# > # I am concerned that it still seems to
# > # emphasize the mentality of cleverness (which seems to cause obscurity).
# >
# > Why is that bad? It is an exercise. Consider it research.
#
# OK, here's another answer then:

To quote Spinal Tap, there's a fine line between stupid and clever. :)

Neko

unread,

Sep 20, 1999, 3:00:00 AM9/20/99

to

On 16 Sep 1999 08:52:46 -0700, Tom Christiansen <tch...@mox.perl.com> wrote:

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.
>
> patba [-A X] [-B T] pattern [files ...]
>
> or both N before and after:
>
> patba [-C N] pattern [files ...]
>

>For Extra Credit:

> Provide alternate solutions that also coalesce with overlapping

> ranges. For example, if you ask for 2 lines before and after, and

> lines 2, 4, and 5 all contain matches, the output should comprise

> lines [1-7] rather three separate output blocks showing lines.

> [1-4], [2-6], and [3-7].

Below is my latest incarnation of 'patba'. The -O switch turns on merging of
overlapping ranges. I did not include a 'patfore' or 'pataft' this time
since 'patba' is a superset (I can selectively delete from 'patba' to get the
other two if necessary). It correctly handles matches (if there are 10 lines
and 10 matches, it returns 10 matches if no -O), overlaps, and file
boundaries.

It uses Abigail's "circular buffer" because I know a good idea when I see
one. I've used the scalar comma operator here more than I usually would -- I
get the feeling others will find it more obscure than clear (anyone have an
opinion on this?). There's also a goto() instead of if/else because this was
originally two programs that got pushed together -- I also think the goto()
labels add clarity that if/else wouldn't.

#!/usr/bin/perl -w
#
# patba [ [ [-A X] [-B Y] ] | [-C N] ] [ -O ] pattern [files ...]

use strict;

my ($a, $b); # Number of lines to print after and before each match.
my $o; # Merge overlaps.
my $c = 0; # Current state: + => after, - => before, 0 => normal
my $i = 0; # Count number of matches.

while (@ARGV && $ARGV[0] =~ /-([ABCO])/) {
$1 eq 'O' and $o = shift, next;
$1 eq 'A' ? ($a) :
$1 eq 'B' ? ($b) :
($a, $b) = (shift) x 2 for shift;
}
defined && /^\d+$/ or $_ = 3 for $a, $b;

@ARGV or die <<USAGE;
Usage: patba [ [ [-A X] [-B Y] ] | [-C N] ] [ -O ] pattern [files ...]
-A X print X lines after match
-B Y print Y lines before match
-C N print N lines before and after match
-O merge overlapping matches together
USAGE

my $re = shift; # Pattern to match against.
my @lines = ('') x $b; # Store last $b lines.

goto $o ? 'OVERLAP' : 'SEPARATE';

OVERLAP:

while (<>) {
if (/$re/o) {
print '==== ', ++$i, ' ====', "\n" unless $c;
print @lines[$. % $b .. $b - 1, 0 .. $. % $b - 1] if $b;
print, $c = $a, next;
}
$c = --$c || -$b, print, next if $c > 0;
@lines = ('') x $b if $c < 0 && $c++ == $b;
$lines[$. % $b] = $_ if $b;
} continue {
$c = 0, @lines = ('') x $b, close ARGV if eof;
}

exit;

SEPARATE:

my $n = 0; # Like $. -- current line number.
my @cache; # Lines pushed back into <>.
my @freeze_lines; # Remember @lines.
my @freeze_cache; # Remember lines for new @cache.

while (defined($_ = @cache ? shift @cache : <>)) {
$n++;
unless ($c) {
/$re/o or next;
print "==== ", ++$i, " ====\n";
print @lines[$n % $b .. $b - 1, 0 .. $n % $b - 1] if $b;
print, $c = $a, next;
}
print;
if (@freeze_cache) {
push @freeze_cache, $_;
} else {
@freeze_cache = $_, @freeze_lines = @lines if /$re/o;
}
unless (--$c) {
@cache = splice @freeze_cache, 0 or next;
@lines = @freeze_lines, $n -= @cache, $_ = shift @cache, redo;
}
} continue {
$lines[$n % $b] = $_ if $b;
$c = 0, @lines = ('') x $b, close ARGV if eof;
}

exit;

__END__
--
Neko | t...@chocobo.org | Will hack Perl for a moogle stuffy! =^.^=

Yitzchak Scott-Thoennes

unread,

Sep 20, 1999, 3:00:00 AM9/20/99

to

In article <r+R53gzk...@efn.org>,

stho...@efn.org (Yitzchak Scott-Thoennes) wrote:
>Combining this with options stuff shamelessly stolen from Kevin Reid, I get:

Please disregard the earlier posting of a test version.
Here's what I meant to post:

#!perl -wn
use strict;
use vars qw/$opt_A $opt_B $opt_C $end $pat @lines/;
use Getopt::Std;

BEGIN {
getopts('C:B:A:') and $pat = shift or usage();
foreach ($opt_A, $opt_B) {$_ ||= $opt_C || 0 unless defined $_; }
$pat =~ s#^/(.*)/$#$1#;

sub usage {print <<"END" and exit 1}

$0 [-B lines] [-A lines] [-C lines] /pattern/ [files...]

-B n Print n lines before the match
-A n Print n lines after the match
-C n Print n lines before and after the match

Slashes on the pattern are optional.
END

}

if (/$pat/o and $end = $. + $opt_A or $end and $. <= $end) {
print splice(@lines,0), $_;
} else {
push @lines, $_;

lt lindley

unread,

Sep 21, 1999, 3:00:00 AM9/21/99

to

Perl regular expressions (and extensions) share some traits with
certain classes of mind altering substances. As you increase the
doseage, you get some conciousness expanding affects. rexen can be
lovely. As you dig deeper, it seems like you can do almost
anything. But then the dark side starts to show. You spend more and
more time exploring the possibilities. You try to do even more with
them. The cycle builds. And soon you are lost. You alternate
between furiously banging away at the keyboard to set up a new trial
of your latest vision, then staring vacantly at the screen as you try
to tweak the model in your head to dance to the rexen rules.

Beware the rexen. Just say no.

This (hopefully last, but I'm not promising) version will match
across lines. It will place markup text (currently just a '*')
before and after the matched text. It can do this even if the
beginning of the next match occurs on the same line as the end
of the previous match.

Invoke as follows to see some advantages:

perl -w patba.pl -DC2 MATCH
perl -w patba.pl -DC2 'MATCH and'

perl -w patba.pl -DC2 'MATCH.*?9'

#!/usr/lib/lprgs/perl -w
use strict;
# Author: Lee.L...@bigfoot.com

#use re 'debug';

# Print a pattern and up to N lines of text before the pattern, and
# up to N lines after the pattern. pattern can match across a line

# boundary. Mark the matched text in the output.

my (%opts, $USAGE);

# Strings used to prettify the output

my $pre_pattern_markup = '*'; # Could be <b> ... </b>
my $post_pattern_markup = '*';
my $after_matched_lines = "--\n";

my $pattern = parse_args(); # drudgery hidden for now

# Build a pattern that will find a match and capture it along
# with the desired lines that proceed the match
my $full_pattern = qr{
( # Capture "pre" lines in $1
(?:^ # Bind to beginning of line
# cuts down on backtracking? seems to help
# The captured line cannot contain $pattern
# or actually, the start of $pattern since
(?!.*(?s:$pattern)) # the ?s: lets $pattern cross line boundary
.*\n # Actual stuff captured in $1
){0,$opts{B}} # But only up to -B lines
) # End of "pre" lines captured in $1

# A ? minimal modifier seemed more natural like {0,$opts{B}}?,
# but negative lookahead assertion turned out to be more efficient.
# That might not be the case for really long lines, but it was
# in my testing.

( # Capture part of line before $pattern in $2
^.*? # Either the min set of chars prior to a
| # matching string beginning at start of line
\G.*? # Or same thing starting where matching left off
# I couldn't see why the former was required
) # and tried various methods of priming \G.
# None worked.

# Now capture the actual matching string in $3
# Allow . and \s to match newline and make
# spaces count within $pattern. But beware
# that there is a bug with -x in 5.005_03,
# so no space or '#' can follow it for now
((?s-x)$pattern)}xmo;

# Build a pattern that MAY grab the end of the line containing the
# last match along with the desired lines that follow it, but without
# stepping on the potential next match. This one will be chained off of the
# prior $full_pattern.
my $after_pattern = qr{
\G # Bind to position last match left off
( # Capture it in $1 so can print it
(?: # Group so I can count them
(?!.*(?s-x:$pattern)) # I don't want to see $pattern in these
.*\n # What we actually capture
){0,$opts{A}} # Well, up to -A lines anyway
)
}xo;

sub find_pattern_in_file {
my ($fh, $file_name) = @_;
print "\n\n$file_name\n", $after_matched_lines if (defined $file_name);
local $/;

my $string = <$fh>; # Suck in the file. Hope we have enough
# memory and swap space.

while ($string =~ /$full_pattern/go) {
print $1 if defined $1; # lines before pat
print $2 if defined $2; # part of line before pat
print $pre_pattern_markup, $3, $post_pattern_markup;

# We have now printed the first part of a line that
# contains the pattern. There is always at a minimum
# a newline left over at this point. But there could
# also be the start of another pattern. If the start
# of another pattern occurs on this line, we let the next
# loop iteration print it instead
print $1 if $string =~ /$after_pattern/gco;
# The above may have printed the end of the line containing
# the pattern and up to -A lines after the pattern.
#
# If the next line (or the rest of this line) contains
# start of the next pattern, then don't print the string
# showing end of a match set of lines
print $after_matched_lines
if $string =~ /\G(?!.*(?s:$pattern))/mgco;
}

}

# The rest of this just handles the argument processing and
# file manipulation.
if (@ARGV == 0) {
if (exists $opts{D}) {
# Special test case
find_pattern_in_file(\*DATA, "__DATA__");
} else {
# No file name provided as an argument
find_pattern_in_file(\*STDIN);
}
} else {
foreach my $file_name (@ARGV) {
# Rather than simply using magic <ARGV> processing,
# I want to print the file name.
local *FH;
open(FH, $file_name)

or die "Failed to open $file_name: $!\n$USAGE";

find_pattern_in_file(\*FH, $file_name);
}
}
exit 0;

sub parse_args {

$USAGE = "patfore [-B T] [-A X] [-C N] [-S] [-D] pattern [files ...]
\twhere T is number of lines before pattern to display and
\twhere X is number of lines after pattern to display and
\twhere N is number of lines both before and after pattern to display
\t-S indicates a strict pattern that should not be white space pampered
\tand -D says use the testing data at the bottom of the source
";

use Getopt::Std;

die $USAGE unless getopts('DSA:B:C:', \%opts);

$opts{B} = 0 unless exists $opts{B}; # Num lines before pat to print
$opts{A} = 0 unless exists $opts{A}; # Num lines after pat to print

# Overrides both A and B
$opts{B} = $opts{A} = $opts{C} if (exists $opts{C});
# There must be at leat one pattern arg and at least A or B must be set

# If they want to give me something other than positive integer,
# then let them twist for now

die $USAGE unless (@ARGV >= 1 && ($opts{A} > 0 || $opts{B} > 0));
my $pattern = shift @ARGV;
# Should I assume that when the user gives a space char that they
# really mean \s and that it includes newline? Assume so unless
# they are sophisticated enough to tell me to leave their sacred pattern
# alone. Otherwise, I'm going to expect unsophisticated users who
# provide the string "black ice" and expect it to match when one line
# ends with "black" and the next line starts with "ice". Therefore,
# I mangle the user supplied pattern by default.
$pattern =~ s/ +/\\s+/g unless exists $opts{S};

$opts{A}++; # $after_pattern must match the end of the line that
# contains the pattern at a minimum. This is an
# awkward place to put this, but I don't see any better.

return $pattern;
}
__END__
This is line 1

and line 2
and line 3
and yet a line 4
and a line 5
and 6 don't you know MATCH

and a line 7 MATCH and MATCH

and 8 was the last.

Uri Guttman

unread,

Sep 21, 1999, 3:00:00 AM9/21/99

to

here is a bug:

grep.dat:

abc
def
abghi
jkl
mno
pqr
stu

perl grep1.pl -B 2 ab grep.dat grep.dat

grep.dat:
abc
def
abghi

it should print
abc

def
abghi

for both files. it works with one file:
> perl grep1.pl -B 2 ab grep.dat

grep.dat:
abc

grep.dat:
def
abghi

also your code is neither -w nor strict. and your @ARGV parsing is very
simplistic with no defaults or checking.

Kevin Reid

unread,

Sep 22, 1999, 3:00:00 AM9/22/99

to

Yitzchak Scott-Thoennes <stho...@efn.org> wrote:

> In article <1dy873h.1oe...@imac.loc>, kpr...@ibm.net (Kevin Reid)
> wrote:
> > for (my $i = 0; $i < $opt_A + @lindex; $i++) {
>
> Needs some work in this loop. If the pattern recurs within $opt_A
> lines, you won't catch it.

Uhh... in my tests, it handled it fine, showing the two matches in a
contiguous chunk. Try turning on -m.

> > }
> > @lindex = ();
> > $noncontiguous = 0;
> > } else {$noncontiguous = 1}
> > }
>
> Better clear @lindex here, or you might use positions from one file to
> seek in the next.

Oops.

--
Kevin Reid: | Macintosh:
"I'm me." | Think different.

Yitzchak Scott-Thoennes

unread,

Sep 23, 1999, 3:00:00 AM9/23/99

to

In article <1dydrpz.ae...@imac.loc>, kpr...@ibm.net (Kevin Reid) wrote:

>Yitzchak Scott-Thoennes <stho...@efn.org> wrote:
>>
>> Needs some work in this loop. If the pattern recurs within $opt_A
>> lines, you won't catch it.
>
>Uhh... in my tests, it handled it fine, showing the two matches in a
>contiguous chunk. Try turning on -m.

Hmmm. It failed for me:

[D:\]cat testdata
hello
there
this
is
a
test
looking
for
e[rs]

[D:\]perl kevinreid.pl -m -B 1 -A 4 /e[rs]/ testdata
---
hello
+ there
this
is
a
test

Test should be marked with a + and the remaining 3 lines in the file output.

Joe Seigh

unread,

Sep 29, 1999, 3:00:00 AM9/29/99

to

Here's my hack at it, the pathba part anyway. It's a sliding window
kind of approach.

#!/usr/local/bin/perl5
use Getopt::Std;
getopts("a:b:c:") || exit 1;
(($opt_a + 0) < 0 || ($opt_b + 0) < 0 || ($opt_c + 0) < 0)
&& die "-a, -b, or -c cannot be less than zero\n";
(($opt_a || $opt_b) && $opt_c) && die "-a or -b cannot be used with
-c\n";

$regexp = shift;
$match = eval "sub { $regexp; }"; # create closure on regexp;
$prefix = $opt_b + $opt_c + 0;
$suffix = $opt_a + $opt_c + 0;
@queue = ((undef) x $prefix); # initialize delay queue
$count = $prefix + $suffix + 1; # window size
$n = 0;

while (<>) {
$n = $count if &$match(); # start/slide window
push @queue, $_;
$_ = shift @queue;
print if ($n-- > 0);
}

# eof, drain delay queue
while ($n-- > 0) {
print shift @queue;
}

exit;

First time I've had to use a closure, regexp ranges have state
information.
I've been burned by that before. I assume that if you made it a
subroutine
with a block prototype parameter it would handle the regexp state
correctly,
but I haven't tried it out.

Joe Seigh

Tom Christiansen

unread,

Sep 29, 1999, 3:00:00 AM9/29/99

to

[courtesy cc of this posting mailed to cited author]

In comp.lang.perl.moderated,
Joe Seigh <jse...@bbnplanet.com> writes:
:First time I've had to use a closure, regexp ranges have state

:information.
:I've been burned by that before. I assume that if you made it a
:subroutine
:with a block prototype parameter it would handle the regexp state
:correctly,
:but I haven't tried it out.

You've done something wrong. And so has the moderator
who let thourgh this miswrapped posting. Please check
your newsreader.

--tom
--
When in doubt, mumble.

Jeff Yoak

unread,

Oct 1, 1999, 3:00:00 AM10/1/99

to

On 16 Sep 1999 08:52:46 -0700, Tom Christiansen <tch...@mox.perl.com>
wrote:

> 3) Write a "patba" program that prints out up to
> X lines before the match and Y lines after the match.
>
> patba [-A X] [-B T] pattern [files ...]
>

#!/usr/bin/perl -w

use strict;
use Getopt::Std;

use vars qw/$opt_A $opt_B/;
getopts('B:A:');
my $pat = shift;

my $range = $opt_A + $opt_B ;
my @stuff;

my $short;
for(my $i = 0; $i<=$range; $i++){
if(my $data = <>){
push @stuff, $data;
} else {
$short=1;
last;
}
}
for(my $i = 0; ($i<$opt_A && $i<=$#stuff); $i++){
my $upper = $#stuff < $i+$opt_B ? $#stuff : $i+$opt_B;
print @stuff[0..$upper] if $stuff[$i] =~ /$pat/;
}
if($short){
for(my $i = $opt_A; $i<=$#stuff; $i++){
print @stuff[$i-$opt_A..$#stuff] if $stuff[$i] =~
/$pat/;
}
exit;
}
while(<>){
print @stuff if $stuff[$opt_A] =~ /$pat/;
push @stuff, $_;
shift @stuff;
}
for(my $i = $opt_A; ($i<=$range && $i<=$#stuff); $i++){
my $upper = $#stuff < $range ? $#stuff : $range;
print @stuff[$i-$opt_A..$upper] if $stuff[$i] =~ /$pat/;
}

__END__

My thought was to go from here to something that doesn't have to do
all that shifting and pushing and just keep track of the most recently
added list item with modula division and use a lot more slices. That
might make it a bit faster. I'm also a little curious about handling
the actual pattern. I admit I've only tested mine of simple text
matches and haven't really thought through what would happen if you
tried something more complex. I noticed some people manipulated the
pattern that came in, but I couldn't see the reason for the things
done.

Thanks for the problem, Tom.

Cheers,
Jeff

lt lindley

unread,

Oct 1, 1999, 3:00:00 AM10/1/99

to

In comp.lang.perl.misc Jeff Yoak <je...@yoak.com> wrote:
:>My thought was to go from here to something that doesn't have to do

:>all that shifting and pushing and just keep track of the most recently
:>added list item with modula division and use a lot more slices. That
:>might make it a bit faster.

Abigail did that in her solution. It was cool. But I'm not sure that
it buys you all that much. perl doesn't shuffle the actual line data
around when you push and shift the array; it just shifts pointers
around and in a pretty smart way at that. The calculations required
for the indicies may eat up any savings. It would be an interesting
benchmark.

:> I'm also a little curious about handling

:>the actual pattern. I admit I've only tested mine of simple text
:>matches and haven't really thought through what would happen if you
:>tried something more complex. I noticed some people manipulated the
:>pattern that came in, but I couldn't see the reason for the things
:>done.

I think I was the only one to mangle the pattern other than those who
looked for enclosing / chars. I did it to allow spaces between words
to match across a line boundary. But that exceeded the stated problem
requirement and the resulting program was too slow to use in the general
case where you didn't need that capability (not to mention being a memory
pig).

:>Thanks for the problem, Tom.

It was an interesting problem and I was able to apply something I learned
to a rent paying problem this week. That is always cool.

I've never bothered to register in the moderated group, so followups
set to clpm only.

--
// Lee.Lindley /// I used to think that being right was everything.
// @bigfoot.com /// Then I matured into the realization that getting
//////////////////// along was more important. Except on usenet.