On 2012-05-19, Alan Curry <pac...@kosh.dhis.org> wrote:
> In article <jp8r8e$r9p$1...@speranza.aioe.org>, <anon...@coward.org> wrote:
>>On 2012-05-19, jrya...@gmail.com <jrya...@gmail.com> wrote:
>>> Say I have a list of words.
>>> I want to find all the words that contain _all_ the characters
>>> 'i n v e d t r'.
>>> These characters can be in any order.
>>>
>>> How would I use grep to accomplish this ?
>>
>>Grep is not the best tool for the job. But it can be done. Something like:
>>
>> $ cat /usr/share/dict/words | tr ' ' '\n' | grep i | grep n | grep v |
>> grep e | grep d | grep t | grep r
>
> What's with the tr? Do you have multiple words per line in your wordlist?
> I don't think that's normal.
>
> Anyway, the multiple greps can be consolidated.
>
> grep -Ev '^[^i]*$|^[^n]*$|^[^v]*$|^[^e]*$|^[^d]*$|^[^t]*$|^[^r]*$'

You can factor out that ^ $. The branches do not have to be individually
anchored; we just need one global pair of ^ $ wrapped around the whole
alternation, to counteract the search semantics of grep (an unanchored
regex is allowed to match anywhere in the line).

  grep -Ev '^([^i]*|[^n]*|[^v]*|[^e]*|[^d]*|[^t]*|[^r]*)$'
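
As a quick sanity check, here is a rough sketch that runs both forms over a
small word list. The list is made up for illustration (it is not from any
real dictionary), and the variable names are arbitrary. It also shows why
the anchors are needed at all: without them every branch can match the
empty string, so -v throws away every line.

```shell
# Hypothetical sample list; only "inverted" and "diverting" contain
# all of the letters i n v e d t r.
words='inverted
diverting
invert
trend
vintage'

# Each branch individually anchored (the quoted version).
a=$(printf '%s\n' "$words" |
    grep -Ev '^[^i]*$|^[^n]*$|^[^v]*$|^[^e]*$|^[^d]*$|^[^t]*$|^[^r]*$')

# One global pair of anchors around the alternation.
b=$(printf '%s\n' "$words" |
    grep -Ev '^([^i]*|[^n]*|[^v]*|[^e]*|[^d]*|[^t]*|[^r]*)$')

# No anchors at all: every line matches (via the empty string), so the
# inverted match selects nothing. grep exits nonzero here, hence "|| true".
c=$(printf '%s\n' "$words" |
    grep -Ev '([^i]*|[^n]*|[^v]*|[^e]*|[^d]*|[^t]*|[^r]*)' || true)

printf '%s\n' "$b"                      # prints inverted, then diverting
[ "$a" = "$b" ] && echo "anchored forms agree"
[ -z "$c" ] && echo "unanchored form selects nothing"
```
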
>>
>>awk can do it with a single expression. E.g. something like:
>>
>> $ cat /usr/share/dict/words | tr ' ' '\n' |
>> awk '/i/ && /n/ && /v/ && /e/ && /d/ && /t/ && /r/'
>>
>
> Surprisingly, my double-reverse grep actually runs faster than the
> simpler-looking awk script.

The awk script has to try up to seven different regular expressions,
at various positions in each input line.
Grep only has to search with a single expression, and the anchoring may be
optimized; i.e. if every branch of the regex is anchored to ^, there
is the obvious optimization that you don't need to bother searching,
since the regex cannot match at any position other than 0.
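
If you want to compare the two yourself, something along these lines should
do. It is only a sketch: the input is a made-up sample written to a
temporary file, so substitute your real wordlist, and it checks only that
the two commands select the same lines, not how fast they are.

```shell
# Made-up sample input shared by both commands.
list=$(mktemp)
printf '%s\n' inverted diverting invert trend vintage advertising > "$list"

# Single expression, anchored once, selected by inversion.
g=$(grep -Ev '^([^i]*|[^n]*|[^v]*|[^e]*|[^d]*|[^t]*|[^r]*)$' "$list")

# Seven separate unanchored patterns, each searched for in every line.
k=$(awk '/i/ && /n/ && /v/ && /e/ && /d/ && /t/ && /r/' "$list")

[ "$g" = "$k" ] && echo "outputs agree"
rm -f "$list"
```

For timing, prefix each command with time and point it at a large list,
with the output sent to /dev/null. The numbers will depend on the grep and
awk implementations installed.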