New feature to clump together lines

Andy Lester

Mar 13, 2017, 1:08:11 PM
to ack development
I have had a feature in the back of my head for a while.

Say I’m searching all my test files for ‘is()’ calls: "ack -w --perltest is"

74:    # Get the collection links.  There should be only two: One is IYAO, and the other is a personal collection.
77:    is( scalar @links, 2, 'Should have exactly two links... ' );
78:    is( $links[0]->attrs->{id}, 'at-tagdir-coll-iyao', '... the Items You Already Own link...' );
79:    is( $links[1]->attrs->{id}, "at-tagdir-coll-C$collectionid", '... and the one collection we have created' );
83:    is( scalar @tag_links, $ntags_expected, "Matched up $ntags_expected tags" );
91:    is( scalar @links, 2, 'Should have exactly two links... ' );
92:    is( $links[0]->attrs->{id}, 'at-tagdir-coll-iyao', '... the Items You Already Own link...' );
93:    is( $links[1]->attrs->{id}, "at-tagdir-coll-C$collectionid", '... and the one collection we have created' );


I want to add a flag so the output looks like this:

74:    # Get the collection links.  There should be only two: One is IYAO, and the other is a personal collection.

77:    is( scalar @links, 2, 'Should have exactly two links... ' );
78:    is( $links[0]->attrs->{id}, 'at-tagdir-coll-iyao', '... the Items You Already Own link...' );
79:    is( $links[1]->attrs->{id}, "at-tagdir-coll-C$collectionid", '... and the one collection we have created' );

83:    is( scalar @tag_links, $ntags_expected, "Matched up $ntags_expected tags" );

91:    is( scalar @links, 2, 'Should have exactly two links... ' );
92:    is( $links[0]->attrs->{id}, 'at-tagdir-coll-iyao', '... the Items You Already Own link...' );
93:    is( $links[1]->attrs->{id}, "at-tagdir-coll-C$collectionid", '... and the one collection we have created' );


I want to be able to easily see when the hits are on adjacent lines.

What should we call that?  “Cluster”?  “Clump”?  We already have --group for one of ack’s main features.


--
Andy Lester => www.petdance.com

Rob Hoelz

Mar 13, 2017, 1:15:26 PM
to Andy Lester, ack development
I like "clump" myself.

Rather than add this to ack, would this perhaps be better served as a standalone script? I can think of other programs
that this would be useful with, and why bloat ack with more features if it can be its own thing?

Andy Lester

Mar 13, 2017, 2:31:43 PM
to Rob Hoelz, ack development

> On Mar 13, 2017, at 12:15 PM, Rob Hoelz <r...@hoelz.ro> wrote:
>
> I like "clump" myself.
>
> Rather than add this to ack, would this perhaps be better served as a standalone script? I can think of other programs
> that this would be useful with, and why bloat ack with more features if it can be its own thing?

The idea of a standalone utility is appealing, but I’m not sure what it would be useful for besides numbered grep-like output. What else would you clump?

As to ack features, we’re already doing this sort of thing at the file level, so I don’t think it’s any worse.

Bill Ricker

Mar 13, 2017, 5:07:39 PM
to Andy Lester, ack...@googlegroups.com, Rob Hoelz
Clump would partially fall out of a paragrep mode as I was suggesting, but if I understand your proposal, clump would operate on the -A3 context window. Paragrep windows would be variable size, ended by a token, e.g. an empty line or /^END.*/
In either case we need to define whether the default pattern is /sm or not. IIRC ack2 is always not /sm, aka (?-sm)
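
For readers unfamiliar with the /s and /m modifiers, here is a minimal sketch of what they change once a pattern is matched against more than one line at a time. It uses Python's `re` module rather than Perl, where `re.DOTALL` and `re.MULTILINE` play roughly the roles of /s and /m. The function name `flag_demo` and the sample text are hypothetical, and nothing here describes ack's actual behavior.

```python
import re


def flag_demo(text="foo\nEND of block"):
    """Show how dot and caret behave with and without the two flags."""
    return {
        # By default "." does not match a newline.
        "dot_default": bool(re.search(r"foo.END", text)),
        # re.DOTALL (roughly Perl /s) lets "." match a newline.
        "dot_dotall": bool(re.search(r"foo.END", text, re.DOTALL)),
        # By default "^" anchors only at the start of the whole string.
        "caret_default": bool(re.search(r"^END", text)),
        # re.MULTILINE (roughly Perl /m) lets "^" anchor after each newline.
        "caret_multiline": bool(re.search(r"^END", text, re.MULTILINE)),
    }
```

With single-line matching the distinction rarely matters, which is why it only needs deciding once the match window spans several lines.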

Andy Lester

Mar 13, 2017, 5:15:04 PM
to Bill Ricker, ack development, Rob Hoelz

On Mar 13, 2017, at 4:07 PM, Bill Ricker <bill....@gmail.com> wrote:

> Clump would partially fall out of a paragrep mode as I was suggesting, but if I understand your proposal, clump would operate on the -A3 context window. Paragrep windows would be variable size, ended by a token, e.g. an empty line or /^END.*/
> In either case we need to define whether the default pattern is /sm or not. IIRC ack2 is always not /sm, aka (?-sm)

--clump would have to be mutually exclusive with any of the context flags. Non-consecutive lines get a blank line between them:

100: foo
101: foo
102: foo

110: foo

168: foo

189: foo
190: foo

202: foo
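
The output rule described above (a blank line wherever the line numbers stop being consecutive) can be sketched as a small filter. This is only an illustration in Python, not ack's implementation; the function name `clump` is hypothetical, and it assumes every hit arrives as "<line number>:<text>" with no filename prefix.

```python
import re

# Assumed input format: "<line number>:<text>", as in the example above.
NUMBERED = re.compile(r"^(\d+):")


def clump(lines):
    """Yield lines, inserting a blank line between non-consecutive hits."""
    prev = None
    for line in lines:
        line = line.rstrip("\n")
        match = NUMBERED.match(line)
        if match is None:
            # Not a numbered hit (e.g. a heading): pass through, reset state.
            prev = None
            yield line
            continue
        num = int(match.group(1))
        if prev is not None and num != prev + 1:
            yield ""  # gap in line numbers: start a new clump
        prev = num
        yield line
```

Fed from standard input, e.g. `for out in clump(sys.stdin): print(out)`, the same function would serve as the standalone script suggested earlier in the thread, for any tool that emits numbered lines in this format.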

Bill Ricker

Mar 13, 2017, 10:54:29 PM
to Andy Lester, ack development, Rob Hoelz
On Mon, Mar 13, 2017 at 5:15 PM, Andy Lester <an...@petdance.com> wrote:
> --clump would have to be mutex with any of the context flags.

Probably so. I could imagine wanting context around a clump cluster,
but I have more trouble imagining implementing both at once!

Ok, you seem to be matching input lines line by line as per normal,
and just clumping the _output_ lines if sequential (or spacing when
not sequential)
... which is not what I originally thought you meant.

Due to my previously suggested bias, I was (mis)interpreting your
cluster/clump as clumping _input_ lines in a way loosely analogous to
the context range -C/-A/-B *except* (a) the whole multiline paragraph
as matchable, and (b) some other print rules.

Your example output made me think --output '$1: $2' would be very
useful with multiline clumped match


clumping multiple _input_ lines per my (mis)interpretation could be via

* paragrep-ish, taking all to blank line (or some other magic token on
a flag) - and stepping forward by whole paragraphs;
* -C/-A/-B style count of lines; and stepping by one line or
optionally by the whole group (always or if matched?)
* via matching leftmost unmatched bracketish char on current line to
its close, but stepping to next line. (User's pattern needs to not
match every surrounding bracket or it will duplicate hits, but that's
the user's problem.)
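
The first option in the list above, paragraph-at-a-time matching, can be sketched as follows. This is a rough Python illustration under stated assumptions, not a proposal for ack's internals: paragraphs end only at blank lines (no configurable end token), and the names `paragraphs` and `paragrep` are hypothetical.

```python
import re


def paragraphs(lines):
    """Yield (first_line_number, text) for each run of non-blank lines."""
    start, buf = None, []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.strip():
            if start is None:
                start = lineno
            buf.append(line)
        elif buf:
            # A blank line closes the current paragraph.
            yield start, "\n".join(buf)
            start, buf = None, []
    if buf:
        yield start, "\n".join(buf)


def paragrep(pattern, lines, flags=0):
    """Return the paragraphs whose whole text matches the pattern."""
    regex = re.compile(pattern, flags)
    return [(n, text) for n, text in paragraphs(lines) if regex.search(text)]
```

Because each paragraph is matched as one string, a pattern such as `alpha.*beta` finds hits that span lines only when `re.DOTALL` is passed in `flags`, which is the /sm question raised earlier in the thread.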


--
Bill Ricker
bill....@gmail.com
https://www.linkedin.com/in/n1vux

Edward Avis

Mar 14, 2017, 9:22:28 AM
to ack development
If this is for ack version 3, and you are happy with the small compatibility break, why not turn on 'clumping' by default?  Those who prefer a dry kind of output without extra whitespace will probably use grep anyway.