I have an interesting requirement I'm not sure how to solve -
searching a large directory (150,000+ files) for two search terms,
returning only the filename as the result.
A positive match equals both search terms in the same file, regardless
of how many times each term appears in that file. The key is that
both are there.
I've tried cat * |grep <term1> |grep <term2> which gave me the "too
many arguments" error due to directory size. I was wondering if
anyone had a suggestion on a command I could use to do this search?
Thanks!
Dave Saunders
ummm try:
find . -exec grep -c search1 {} \; -exec grep -c search2 {} \; -print
hmmm.. that prints out a bunch of numbers (counts of occurrences)
perhaps you can parse that output with another grep
find . -exec grep -c search1 {} \; -exec grep -c search2 {} \; -print \
| grep '^\./'
you could also do a dual grep -l using the output of the first as a
search filelist for the second.
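
A rough sketch of that dual grep -l idea, in case it helps. The scratch
directory, the file names and the terms "alpha" and "beta" are made up for
the demo; substitute your own directory and search terms. It uses find with
"{} +" so the first grep never sees an over-long argument list.

```shell
# Demo directory with three small files (hypothetical names and contents).
demo=$(mktemp -d)
printf 'alpha\nbeta\n' > "$demo/both.txt"
printf 'alpha\n' > "$demo/first_only.txt"
printf 'beta\n' > "$demo/second_only.txt"

# First pass: list the files containing "alpha".
# Second pass: of those, list the files that also contain "beta".
matches=$(cd "$demo" && find . -type f -exec grep -l -- alpha {} + \
  | xargs grep -l -- beta)
echo "$matches"

# Remove the scratch directory again.
rm -rf "$demo"
```

Only both.txt should come out the far end, since it is the only file that
survives both passes. File names with spaces would break the plain xargs
step.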
hope this helps
--
be safe.
flip
Out! Out! Demons of ignorance.
>I have an interesting requirement I'm not sure how to solve -
>searching a large directory (150,000+ files) for two search terms,
>returning only the filename as the result.
>A positive match equals both search terms in the same file, regardless
>of how many times each term appears in that file. The key is that
>both are there.
If you are doing this only a few times, you can use:
find . -type f -exec grep -q search1 {} \; -exec grep -q search2 {} \; -print
For efficiency, put the least frequent search term on the first grep, so
the second grep runs only on the files that pass the first. Still, this is
going to execute grep at least 150,000 times, once for each file. You can
cut that down by handing grep the files in batches:
find . -type f -print | xargs -n 50 grep -l search1 | xargs grep -l search2
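
To get a feel for what "-n 50" does, here is a small sketch that needs no
real files. The 120 numbers stand in for file names, and echo stands in for
grep; both are just placeholders for illustration.

```shell
# Generate 120 dummy "file names" (plain numbers) and let xargs hand them
# to echo 50 at a time; each output line stands for one command invocation.
batches=$(awk 'BEGIN { for (i = 1; i <= 120; i++) print i }' \
  | xargs -n 50 echo | wc -l | tr -d ' ')
echo "120 names -> $batches invocations"
```

That is three invocations instead of 120. At the same batch size, 150,000
files would need about 3,000 runs of the first grep rather than 150,000.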
If the files have names that might contain spaces or other special
characters, you need to use the GNU or BSD versions of the utilities, with
special flags (--null is the long form of GNU grep's -Z, and BSD grep
accepts it too):
<set PATH to find GNU or BSD tools first>
find . -type f -print0 \
| xargs -0 -n 50 grep -l --null -- search1 \
| xargs -0 grep -l -- search2
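
A quick way to check the null-separated pipeline on your own system is a
throwaway test like the sketch below. It assumes GNU or BSD find, xargs and
grep are first in PATH; the scratch directory, the file names and the terms
"alpha" and "beta" are invented for the demo.

```shell
# Scratch directory with one awkward file name (it contains a space).
demo=$(mktemp -d)
printf 'alpha\nbeta\n' > "$demo/two words.txt"
printf 'alpha\n' > "$demo/plain.txt"

# Names travel between the stages separated by NUL bytes, so the space
# in "two words.txt" is never treated as a separator.
found=$(cd "$demo" && find . -type f -print0 \
  | xargs -0 grep -l --null -- alpha \
  | xargs -0 grep -l -- beta)
echo "$found"

# Remove the scratch directory again.
rm -rf "$demo"
```

If the tools handle the flags as expected, the only name printed is the one
with the space in it.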
The alternate commands are available from IBM's Linux Toolkit site:
<http://www-1.ibm.com/servers/aix/products/aixos/linux/download.html>
If you need to do many searches, look into using the "glimpse" package
<http://webglimpse.net> to build an index of the files. Glimpse is
now commercial software. I think there used to be a free version you
might be able to find in some net archive.
I'm sure there are other content indexers that could also be used.
--
Dale Talcott, IT Research Computing Services, Purdue University
a...@quest.cc.purdue.edu http://quest.cc.purdue.edu/~aeh/