Your input please: Ignoring files and directories in ack 3


Andy Lester

Apr 4, 2017, 11:54:54 PM
to ack-...@googlegroups.com
--ignore-file and --ignore-dir have caused a lot of problems in ack 2. It's not clear how they work, and nobody's happy with them, so we're redoing them from scratch for ack 3. The core of the problem is pathing: what are the paths relative to?

...

It seems to me that there are two distinct use cases for --ignore-file and --ignore-dir.

1) You ack for something, and get a bunch of results in a file or directory that you want to not deal with, so you rerun the ack but with --ignore-file=whatever added on.

2) You have a project that has files that you know are not worth searching, and so they are stored in your project-level .ackrc. You know you want to ignore the data/sample/ directory, and you know you want to ignore the Tags file in the root of the project.

Are there any other times that it makes sense that you would use --ignore-file or --ignore-dir?

...

Given those two use-cases, I think it makes sense to have two different ways of ignoring files. What I'm thinking of is having a version of --ignore-dir/file that is specifically for .ackrc files and is relative to where that file is, and a version of --ignore-dir/file that is for the command line and is relative to the current directory from where ack was invoked.

So you might have something like ~/project/.ackrc containing:
--ignore-in-project=data/sample/
--ignore-in-project=Tags

These refer to ~/project/data/sample and ~/project/Tags, respectively.
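To make the proposed semantics concrete, here is a minimal sketch of how a value could be resolved against the directory that holds the .ackrc rather than against the current directory. It is not ack code: `resolve_ignore` and `is_ignored` are hypothetical helper names, and `/home/u/project` stands in for ~/project.

```python
from pathlib import PurePosixPath


def resolve_ignore(spec, ackrc_path):
    """Resolve an --ignore-in-project value against the .ackrc's directory.

    Hypothetical helper: the base is where the .ackrc lives, not the cwd.
    """
    return PurePosixPath(ackrc_path).parent / spec


def is_ignored(candidate, ignored):
    """True if candidate is an ignored path or lives under an ignored directory."""
    path = PurePosixPath(candidate)
    return any(path == entry or entry in path.parents for entry in ignored)


# The two entries from the example .ackrc above.
ignored = [
    resolve_ignore(spec, "/home/u/project/.ackrc")
    for spec in ("data/sample/", "Tags")
]
```

With this reading, /home/u/project/Tags is ignored but /home/u/project/src/Tags is not, no matter which directory ack is started from.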

Does this seem like a reasonable addition? How would this fit with how you use ack?

Thanks for your input and support.

Andy

--
Andy Lester => www.petdance.com

Rod Bartlett

Apr 5, 2017, 6:25:12 PM
to ack users
Perhaps I'm an anomaly but I'm satisfied with the way the --ignore-dir and --ignore-file directives currently work.

My use case is a variation of number 2.  I have a project for an embedded system which contains source files for multiple hardware platforms.  Some of the source files in this project are shared among more than one platform and others are specific to a single platform.  Among these many source files are some file types which are never of interest to me.  For example one of the 10 different C compilers we use generates .asm files during compilation.  There's little sense in searching the C source and the generated .asm files from that source. 

I use a single .ackrc file rather than a project-level .ackrc because at any given time I have multiple copies of the project checked out working on multiple bugs and different release versions.  There's little need for a project-level .ackrc since all source trees are identical.

For the files I'm never interested in, I store my --ignore-dir and --ignore-file directives in my .ackrc file.  I augment these with additional directives stored in the ACK_OPTIONS environment variable which help focus the searching on the hardware platform of immediate interest.  Switching to a different hardware platform then becomes a simple matter of updating the ACK_OPTIONS environment variable to ignore files and directories specific to all platforms but the one I'm working on at the moment.  I've got command aliases to simplify updating the ACK_OPTIONS variable for each hardware platform.
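The per-platform switching described above could be sketched roughly as follows. This is only an illustration: the platform and directory names are invented, and `ack_options_for` is a hypothetical helper, not part of ack. It builds one --ignore-dir option for every directory that belongs to a platform other than the active one.

```python
def ack_options_for(active, platform_dirs):
    """Build an ACK_OPTIONS-style string that ignores every platform but `active`.

    platform_dirs maps a platform name to its platform-specific directory
    names (all hypothetical here).
    """
    options = []
    for platform in sorted(platform_dirs):
        if platform == active:
            continue  # keep the platform currently being worked on searchable
        for dirname in platform_dirs[platform]:
            options.append(f"--ignore-dir={dirname}")
    return " ".join(options)


# Hypothetical layout: one board-support directory per platform.
PLATFORM_DIRS = {
    "arm": ["bsp_arm"],
    "avr": ["bsp_avr"],
    "x86": ["bsp_x86"],
}

print(ack_options_for("arm", PLATFORM_DIRS))
```

A shell alias could then assign the printed string to ACK_OPTIONS when switching platforms.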

I usually do my ack searches from the top level project directory or occasionally from a subdirectory if I want to limit the scope of the search to a specific subsystem.

Thanks for all the work you've put into ack.  It's rare for a tool like this to fit my needs so completely.

 Rod

Andy Lester

Apr 6, 2017, 11:01:19 AM
to ack-...@googlegroups.com
> Perhaps I'm an anomaly but I'm satisfied with the way the --ignore-dir and --ignore-file directives currently work.

The more I look at it, the more I think that --ignore-dir and --ignore-file will have to keep their current behavior (or at least most of it), and we'll just have to add new options that handle the pathing problems.


> Thanks for all the work you've put into ack. It's rare for a tool like this to fit my needs so completely.


Thanks for the kind words, and thanks for putting in the time to tell us about your ack usage.

Mike Kelly

Jan 7, 2018, 10:06:04 PM
to ack users
New to the group but been using Ack for a while based on a referral from a colleague.

Here's our situation - which like many may be weird enough that it's not worth addressing.  We do technical due diligence for software acquirers, so we are often getting large drops (tens of thousands of source files, 1MM - 10MM lines).  We have a tool that runs over this codebase and compiles metadata about the sources - what type they are, lines of code, etc.  We then spend some time teasing out files that we don't care about.  There are three main cases for why we don't care:

a. They are third-party source and we're focusing on the code the company we're evaluating wrote.  We have various heuristics for identifying third-party.
b. They are generated files, e.g. in a build directory.
c. We were told by the company we're evaluating to ignore them - typically these are things like one-off research projects that don't matter as part of the acquisition.

What we end up with is a SQL database that has the metadata about all the files in the drop, and some of the per-file rows in this database have Exclude_File='Y' or Third_Party = 'Y'.  Our tools use that database to determine which files are "interesting" to us.

I would love to teach ack about this.  One possibility is to just delete the files we don't care about from the tree, but we also do builds, so that would not be ideal.  We could have two versions of the tree - one for searching, one for building - but that seems kinda hacky.

My thought was to emit a file from the SQL database that has relative paths to the files where Exclude_File='Y' OR Third_Party='Y' and drop that in the root.  Ack would somehow know to read that file and not search those files.  It seems like I might be able to use a project level .ackrc file (didn't know about those before today) with a bunch of --ignore-file directives - but there might be thousands and thousands of them and I worry about performance here.  Is this abusing this mechanism in ack?  Other thoughts?
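The export step might look something like the sketch below. It assumes the metadata lives in SQLite, in a table named `files` with a `Path` column holding paths relative to the root of the drop; those three details are assumptions for illustration. Only the Exclude_File and Third_Party columns come from the description above. How ack would consume the resulting list is the open question of this thread, so the sketch stops at writing one relative path per line.

```python
import sqlite3


def excluded_paths(conn):
    """Return relative paths of files flagged as excluded or third-party.

    Assumes a hypothetical table `files` with a `Path` column.
    """
    cursor = conn.execute(
        "SELECT Path FROM files "
        "WHERE Exclude_File = 'Y' OR Third_Party = 'Y' "
        "ORDER BY Path"
    )
    return [row[0] for row in cursor]


def write_exclusion_list(conn, out_path):
    """Write one excluded path per line and return how many were written."""
    paths = excluded_paths(conn)
    with open(out_path, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)
```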

Thanks.

Bill Ricker

Jan 8, 2018, 12:16:03 AM
to ack-...@googlegroups.com
In meta-data, strength !

> My thought was to emit a file from the SQL database that has relative paths to the files where Exclude_File='Y' OR Third_Party='Y'

That's the sort of thinking I like.
Yes, --ignore-file and --ignore-directory would do that.

> and drop that in the root. Ack would somehow know to read that file and not search those files.

There is a commandline arg --ackrc=file to specify an alternate ackrc
file; that might be useful.

> It seems like I might be able to use a project level .ackrc file

IIRC we added that in Ack2.0

> (didn't know about those before today)
> with a bunch of --ignore-file directives - but there might be thousands and thousands of them and I worry about performance here.

Thousands of small ackrc files might perform better than one huge one.
Or worse.
I don't recall if Rob or Andy have benchmarked that.

> Is this abusing this mechanism in ack?

Well probably yes :-) but that's what ack is for, to take abuse that
otherwise would be absorbed by our eyeballs using grep. :-)

> Other thoughts?

One thing for you to look out for is if any of the projects you import
already has project or homedir .ackrc files.
If you have a backup of DEV system disks, the "user directory" $HOME
.ackrc files will become project .ackrc as you traverse the tree.
You may need to rename or remove those to avoid conflict.

Andy Lester

Jan 8, 2018, 12:19:19 AM
to ack-...@googlegroups.com

> On Jan 7, 2018, at 5:43 PM, Mike Kelly <mrmicha...@gmail.com> wrote:
>
> It seems like I might be able to use a project level .ackrc file (didn't know about those before today) with a bunch of --ignore-file directives - but there might be thousands and thousands of them and I worry about performance here.

That sounds like an ideal use of project-level .ackrc files.

I don't think performance is likely to take a hit. ack is pretty much I/O bound, so running through a list of exclusions in memory shouldn't be a problem.
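As a rough illustration of why a long exclusion list is cheap to check (this is not how ack is implemented, and `filter_excluded` is a hypothetical name): once the list is loaded into a hash set, each candidate file costs a single constant-time lookup, which is small next to reading the file from disk.

```python
def filter_excluded(candidates, excluded):
    """Drop every candidate path that appears in the exclusion list.

    The exclusions are loaded into a frozenset once, so each membership
    test is a hash lookup rather than a scan of thousands of entries.
    """
    excluded_set = frozenset(excluded)
    return [path for path in candidates if path not in excluded_set]
```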

spb...@gmail.com

Mar 12, 2018, 3:57:46 PM
to ack users
If I'm understanding the problem right, would it go a long way toward what you're suggesting to keep the default behaviors but add a special symbol? That is, "@" as the first character in a file/dir specification could represent the directory where "this" .ackrc file is (analogous to ~).

--ignore-dir=@/data/sample
perhaps allowing combinations:
--ignore-file=@ext:xml

That might be simpler than a distinct option for a very similar concept.
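A minimal sketch of that expansion, assuming "@" is only special when it stands alone or is followed by "/". The name `expand_at` is hypothetical, and the filter combination (@ext:xml) is deliberately left untouched because its meaning is not pinned down here.

```python
from pathlib import PurePosixPath


def expand_at(value, ackrc_path):
    """Expand a leading "@" to the directory containing the given .ackrc.

    Values that do not start with "@/" (and are not exactly "@") are
    returned unchanged, so existing specifications keep their meaning.
    """
    base = PurePosixPath(ackrc_path).parent
    if value == "@":
        return str(base)
    if value.startswith("@/"):
        return str(base / value[2:])
    return value
```

For an .ackrc at /home/u/project/.ackrc, "@/data/sample" would expand to /home/u/project/data/sample, much as a shell expands ~.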



On Tuesday, April 4, 2017 at 11:54:54 PM UTC-4, Andy Lester wrote: