negate grep


Robert Citek

Nov 24, 2009, 6:09:52 PM
to stl...@googlegroups.com
Hello all,

I'm looking for the equivalent of 'grep -v' in Ruby. For example,
let's say I have the following array:

a=%w(foo bar bat)

I can grep for those items that match /b/ with this:

irb(main):016:0> a.grep(/b/)
=> ["bar", "bat"]

I would like to be able to grep by specifying a negative, such as,
those items not having /f/.  The construct I've been able to come up
with is this:

irb(main):017:0> a - a.grep(/f/)
=> ["bar", "bat"]

Is there a "better" way?

Thanks in advance.

Regards,
- Robert

Patrick Schless

Nov 25, 2009, 1:50:52 AM
to stl...@googlegroups.com
"my_array.grep(/b/)" is equivalent to:
"my_array.select {|elem| /b/ === elem}"

so to reverse it I'd do:

my_array.reject {|elem| /b/ === elem}

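You could use =~ in place of === for readability; I believe they are equivalent for this purpose (=~ returns the match position or nil rather than true or false, but either works as a block condition). The docs say grep uses === though.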
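A small sketch of the equivalence Patrick describes, using the array from the original question. It assumes nothing beyond core Ruby: grep keeps elements for which pattern === element is truthy, select with the same test gives the same result, and reject with that test gives the 'grep -v' behaviour.

```ruby
a = %w(foo bar bat)

# grep(pattern) keeps elements for which pattern === element
grepped  = a.grep(/b/)
selected = a.select { |elem| /b/ === elem }

# reject with the same test gives the "grep -v" behaviour
rejected = a.reject { |elem| /b/ === elem }

# =~ returns a match index or nil rather than true/false,
# but either is fine as a truthiness test inside a block
rejected_match = a.reject { |elem| elem =~ /b/ }

p grepped   # ["bar", "bat"]
p rejected  # ["foo"]
```

The Array#- approach from the question also works, but it builds the matching list first and then subtracts it; reject does the filtering in one pass.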


--

You received this message because you are subscribed to the Google Groups "Saint Louis Ruby Users Group" group.
To post to this group, send email to stl...@googlegroups.com.
To unsubscribe from this group, send email to stlruby+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/stlruby?hl=en.



Christopher M

Nov 25, 2009, 9:13:11 AM
to stl...@googlegroups.com
The very first construct is correct, you just need to use a regex negation: %w(a b c d e).grep(/(?!b)/)

Christopher M

Nov 25, 2009, 9:14:14 AM
to stl...@googlegroups.com


On Wed, Nov 25, 2009 at 8:13 AM, Christopher M <ign...@gmail.com> wrote:
> The very first construct is correct, you just need to use a regex negation: %w(a b c d e).grep(/(?!b)/)

Specifically to your example:
>> %w(foo bar baz).grep(/^(?!f)/)
=> ["bar", "baz"]
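One way to see what the anchored lookahead does: /^(?!f)/ succeeds on any string whose first character is not "f", so it behaves like rejecting strings that start with "f", not strings that contain one. The extra word "off" is added here purely for illustration.

```ruby
words = %w(foo bar baz off)

# ^ anchors at the start of a line; (?!f) asserts the next character
# is not "f" without consuming it, so only the first character is tested.
by_lookahead = words.grep(/^(?!f)/)

# The same filter written without a lookahead
by_reject = words.reject { |w| w.start_with?("f") }

p by_lookahead  # ["bar", "baz", "off"]
```

"off" survives even though it contains an "f", which is also why /^(?!a)/ keeps every word in %w(foo bar baz).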

Robert Citek

Nov 25, 2009, 10:12:55 AM
to stl...@googlegroups.com
I'm not familiar with that regex notation.

^ is the beginning of line anchor.
() is grouping.
?! is the "zero-width negative look-ahead assertion"[1].

Unfortunately, I do not quite understand how the look-ahead assertion
works in this case.

Changing the regex from /f/ to /a/ gives me these results:

irb(main):021:0> a=%w(foo bar baz)
=> ["foo", "bar", "baz"]
irb(main):022:0> a - a.grep(/a/)
=> ["foo"]
irb(main):023:0> a.grep(/^(?!a)/)
=> ["foo", "bar", "baz"]

Here's the result using Patrick's #reject method:

irb(main):025:0> a.reject {|e| /a/ === e }
=> ["foo"]

[1] http://www.zenspider.com/Languages/Ruby/QuickRef.html

Regards,
- Robert

Robert Citek

Nov 25, 2009, 10:15:04 AM
to stl...@googlegroups.com
On Wed, Nov 25, 2009 at 9:13 AM, Christopher M <ign...@gmail.com> wrote:
> The very first construct is correct, you just need to use a regex negation:
> %w(a b c d e).grep(/(?!b)/)

That does not seem to work for me:

irb(main):028:0> %w(a b c d e).grep(/(?!b)/)
=> ["a", "b", "c", "d", "e"]

Regards,
- Robert

Ed Howland

Nov 25, 2009, 10:59:16 AM
to stl...@googlegroups.com
irb(main):005:0> %w(a b c d).grep /(?!b).+/
=> ["a", "c", "d"]

Robert, in your example, you haven't given it anything _to match_. The
negative lookahead assertion is just that: an assertion. It doesn't
match anything, just returns true or false if the inner regex matches
or doesn't match. It doesn't consume any characters in the target. [1]
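To illustrate the zero-width point: the unanchored /(?!b).+/ works for the one-letter strings above, but nothing stops the engine from starting its match after the "b", so longer words that begin with "b" still match. The words "bar" and "foo" are illustrative additions.

```ruby
letters = %w(a b c d)
words   = %w(bar foo)

# For a lone "b", the assertion fails at offset 0 and .+ has
# nothing left to match at offset 1, so "b" is excluded.
p letters.grep(/(?!b).+/)  # ["a", "c", "d"]

# For "bar", the match simply starts at offset 1 and covers "ar",
# so the word is kept even though it begins with "b".
p words.grep(/(?!b).+/)    # ["bar", "foo"]
p "bar"[/(?!b).+/]         # "ar"

# Anchoring with \A pins the assertion to the start of the string.
p words.grep(/\A(?!b)/)    # ["foo"]
```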

HTH,
Regards,
Ed

[1] http://www.regular-expressions.info/lookaround.html



--
Ed Howland
http://greenprogrammer.blogspot.com
http://twitter.com/ed_howland

Robert Citek

Nov 25, 2009, 11:21:46 AM
to stl...@googlegroups.com
Thanks for the link. I see. I need something to match followed by the
not-match for a negative lookahead. That explains why Christopher
used the ^, so that something matched. To make the regex compatible
with the original array, I did this:

irb> a=%w(foo bar baz)
=> ["foo", "bar", "baz"]

irb> a.grep(/^(?!(.*a.*))/)
=> ["foo"]

irb> a.grep(/^(?!(.*f.*))/)
=> ["bar", "baz"]

That seems to work.

Regards,
- Robert
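Robert's pattern can be trimmed a little: the inner group and the trailing .* are not needed, since the lookahead only has to find an "a" somewhere. One caveat (an addition here, not raised in the thread): in Ruby ^ matches at the start of every line and . does not match a newline by default, so for multi-line strings \A plus the m flag is the faithful negation of /a/. The string "b\na" is a made-up test value.

```ruby
a = %w(foo bar baz)

# "does not contain a", written three ways
p a.grep(/^(?!(.*a.*))/)       # ["foo"]  (Robert's version)
p a.grep(/\A(?!.*a)/m)         # ["foo"]
p a.reject { |e| /a/ === e }   # ["foo"]

# With an embedded newline, the line-based version can be fooled:
# ^ matches at offset 0, and .* cannot cross "\n" to reach the "a".
tricky = ["foo", "b\na"]
p tricky.grep(/^(?!.*a)/)      # ["foo", "b\na"]
p tricky.grep(/\A(?!.*a)/m)    # ["foo"]
```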

Ed Howland

Nov 27, 2009, 3:18:42 PM
to stl...@googlegroups.com
Np. I was working this out further and came up with a cool use case.
Background: metaskills (Ken Collins) tweeted about wanting a better
method to iterate over a directory listing w/o the '.' and '..' files.
Combining Dir#entries and Christopher's and Robert's examples, we can
get close to an 'ls'-like listing:

irb(main):004:0> Dir.new('.').entries.grep(/^(?!\.)/)
=> ["caller.rb", "dir.rb", "exception.rb", "fklass.rb",
"Hash_rubify_keys.rb", "on.rb", "tree1"]

[Note: I love one-liners]

Add a .each block:
irb(main):007:0> Dir.new('.').entries.grep(/^(?!\.)/).each {|f| puts f}; nil
caller.rb
dir.rb
exception.rb
fklass.rb
Hash_rubify_keys.rb
on.rb
tree1
=> nil

Add this to class Dir:
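class Dir
  # Names in path that don't start with '.' (so no '.', '..' or hidden
  # files). Yields each name if a block is given; returns the array.
  def self.enum(path='.', &block)
    self.new(path).entries.grep(/^(?!\.)/).each {|f| yield f if block_given?}
  end

  # Prints the names on one line, ls-style, unless a block is given.
  def self.ls(path='.', &block)
    files = enum(path, &block)
    puts files.join(" ") unless block_given?
  end
end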

irb(main):023:0> Dir.ls
caller.rb dir.rb exception.rb fklass.rb Hash_rubify_keys.rb on.rb tree1
=> nil
irb(main):024:0> Dir.ls {|f| puts "->#{f}"}
->caller.rb
->dir.rb
->exception.rb
->fklass.rb
->Hash_rubify_keys.rb
->on.rb
->tree1
=> nil
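For comparison, a self-contained sketch of the same idea using Dir.entries and reject, run against a throwaway directory so the output is predictable. The method name visible_entries and the sample file names are hypothetical. Dir.entries makes no ordering promise, hence the sort.

```ruby
require 'tmpdir'

# Hypothetical helper: names in +path+ except those starting with '.',
# which drops '.', '..' and hidden entries such as .git.
# Dir.entries opens and closes the directory itself.
def visible_entries(path = '.')
  Dir.entries(path).reject { |name| name.start_with?('.') }.sort
end

listing = nil
Dir.mktmpdir do |dir|
  # Throwaway fixture: two files, one hidden file, one subdirectory
  %w(caller.rb dir.rb .hidden).each { |f| File.write(File.join(dir, f), '') }
  Dir.mkdir(File.join(dir, 'tree1'))
  listing = visible_entries(dir)
end

p listing  # ["caller.rb", "dir.rb", "tree1"]
```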

And you can go on from there. I've added a "find dir"-like ls_r
(recursive listing) method. See [1]. I know there is the Find module;
this is just an exercise. Note, it hides hidden dirs that start with
'.', such as .git.

Regards,
Ed
[1] http://pastie.org/717463
Comments welcome!!


--
Ed Howland
http://greenprogrammer.wordpress.com
http://twitter.com/ed_howland

Robert Citek

Nov 28, 2009, 1:51:55 PM
to stl...@googlegroups.com
On Fri, Nov 27, 2009 at 3:18 PM, Ed Howland <ed.ho...@gmail.com> wrote:
> Np. I was working this out further and came up with a cool use case.
> Background: metaskills (Ken Collins) tweeted about wanting a better
> method to iterate over a directory listing w/o the '.' and '..' files.
> Combining Dir#entries and Christopher's and Robert's examples we can
> get close to a 'ls' like listing:
>
> irb(main):004:0> Dir.new('.').entries.grep(/^(?!\.)/)
> => ["caller.rb", "dir.rb", "exception.rb", "fklass.rb",
> "Hash_rubify_keys.rb", "on.rb", "tree1"]

Personally, I find the example that Patrick gave is "cleaner" than
using look-arounds, as I don't really want to create a new regexp, but
rather negate the existing one.

irb> Dir.new('.').entries.grep(/^(?!\.)/)

is then equivalent to this

irb> Dir.new('.').entries.reject {|e| /^\./ === e}

What I think would be nice would be to add a switch or option to
#grep to negate its operation. For example, something like these:

irb> Dir.new('.').entries.grep(/^\./v)
irb> Dir.new('.').entries.grep(!/^\./)
irb> Dir.new('.').entries.grep(/^\./,"v")
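Something close to this wish did land later: Ruby 2.3 (well after this 2009 thread) added Enumerable#grep_v, which keeps the elements for which pattern === element is false. A minimal sketch, assuming Ruby 2.3 or newer; the sample entries are made up.

```ruby
entries = %w(. .. .git caller.rb dir.rb)

# grep_v is the inverse of grep: keep elements the pattern does NOT match
p entries.grep_v(/^\./)                # ["caller.rb", "dir.rb"]

# Same result as the reject form from earlier in the thread
p entries.reject { |e| /^\./ === e }   # ["caller.rb", "dir.rb"]
```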

Along those lines, what does "===" mean? The docs I have found don't
explain it well[1].

[1] http://phrogz.net/ProgrammingRuby/language.html#table_18.4

Regards,
- Robert

Ed Howland

Nov 28, 2009, 9:02:15 PM
to stl...@googlegroups.com
I have no idea which performs better, a grep with a look-around or
calling a block (Proc object) on every element in Dir.new().entries.
I'd be interested in performance metrics there.

BTW, Dir[pattern] is almost the same thing. It's a bit cumbersome,
though, and works a little more like find than ls, e.g.:

irb(main):002:0> Dir.enum
=> ["dir_ls.rb", "doc"]
irb(main):003:0> Dir['*']
=> ["dir_ls.rb", "doc"]
irb(main):004:0> Dir['doc/*']
=> ["doc/_index.html", "doc/class_list.html", "doc/css",
"doc/Dir.html", "doc/file_list.html", "doc/index.html", "doc/js",
"doc/method_list.html", "doc/top-level-namespace.html"]
irb(main):005:0> Dir.enum('doc')
=> ["_index.html", "class_list.html", "css", "Dir.html",
"file_list.html", "index.html", "js", "method_list.html",
"top-level-namespace.html"]

Then to do a recursion:

irb(main):022:0> Dir['doc/**/*']
=> ["doc/_index.html", "doc/class_list.html", "doc/css",
"doc/css/common.css", "doc/css/full_list.css", "doc/css/style.css",
"doc/Dir.html", "doc/file_list.html", "doc/index.html", "doc/js",
"doc/js/app.js", "doc/js/full_list.js", "doc/js/jquery.js",
"doc/method_list.html", "doc/top-level-namespace.html"]
irb(main):023:0> Dir.enum_r('doc')
=> ["doc/_index.html", "doc/class_list.html", "doc/css",
["doc/css/common.css", "doc/css/full_list.css", "doc/css/style.css"],
"doc/Dir.html", "doc/file_list.html", "doc/index.html", "doc/js",
["doc/js/app.js", "doc/js/full_list.js", "doc/js/jquery.js"],
"doc/method_list.html", "doc/top-level-namespace.html"]

Dir.enum_r gives you a tree structure, whereas Dir['**/*'] gives you a
flat array. Depends upon your needs, I guess. Dir.ls and Dir.ls_r
could/should be rewritten using Dir[pattern], with some work to
transform the arg into the correct pattern. Dir.new() does not take a
glob; Dir.glob(pattern) does, and it also takes a block.

First incomplete attempt:
class Dir
  def self.ls(pattern='*', &block)
    # Dir.glob returns nil when given a block, so only print otherwise
    files = glob(pattern, &block)
    puts files.join(" ") unless block_given?
  end
end
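One way to finish the glob-based version, sketched as a plain method rather than a patch to Dir (the name glob_ls is hypothetical): build the pattern from a directory with File.join, and strip the directory prefix with File.basename so the output matches the Dir.enum style. Like the shell, Dir.glob('*') skips dot files by default, so no regex is needed at all.

```ruby
require 'tmpdir'

# Hypothetical helper: list the non-hidden names in +dir+, ls-style.
# Dir.glob('*') skips dot files by default, like the shell does.
# Yields each name if a block is given, otherwise prints them on one
# line; returns the sorted array of names either way.
def glob_ls(dir = '.')
  names = Dir.glob(File.join(dir, '*')).map { |path| File.basename(path) }.sort
  if block_given?
    names.each { |name| yield name }
  else
    puts names.join(" ")
  end
  names
end

listed = nil
Dir.mktmpdir do |dir|
  %w(dir_ls.rb .hidden).each { |f| File.write(File.join(dir, f), '') }
  Dir.mkdir(File.join(dir, 'doc'))
  listed = glob_ls(dir)               # prints: dir_ls.rb doc
  glob_ls(dir) { |f| puts "->#{f}" }  # prints ->dir_ls.rb, then ->doc
end
```

Returning the array explicitly avoids relying on what Dir.glob returns in block form.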

BTW, I tried the similar rush example and could not get it to work. Am I
missing anything? (doc['**'] gives an undefined method or variable
error.)

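As far as ===, this has always been confusing to me. You have ==, ===,
.eql? and .equal?. As best as I know, === is the case/when comparison.
As a boolean, it is true whenever the corresponding 'when' clause of a
case statement would match. It is called on the 'when' value with the
case value as the argument, so the order matters. For instance,
1939 == (1900..1999) is false, but (1900..1999) === 1939 is true,
because case 1939; when 1900..1999 evaluates (1900..1999) === 1939,
and 1939 is in that range. (The other way round, 1939 === (1900..1999)
is false. The parentheses are needed because .. binds more loosely
than == and ===.)

The mechanics of the operator are that it is a method defined on
Object, so every object has it (by default it behaves like ==), and
classes such as Range, Regexp and Class override it. You can define it
in your own classes if you want them to be used in case expressions
(or in the standalone 'a === b' instance).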
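A few quick checks of the points above, showing that === is called on the 'when' value with the case value as its argument. The Decade class is a hypothetical example of defining === yourself.

```ruby
# Range#=== tests membership; Integer#=== is plain equality
p((1900..1999) === 1939)   # true
p(1939 === (1900..1999))   # false

# Other core overrides: Regexp#=== matches, Class#=== checks is_a?
p(/^\./ === ".git")        # true
p(String === "abc")        # true

# Hypothetical class usable in a when clause
class Decade
  def initialize(start)
    @range = start..(start + 9)
  end

  def ===(year)
    @range.include?(year)
  end
end

label = case 1939
        when Decade.new(1920) then "twenties"
        when Decade.new(1930) then "thirties"
        else "other"
        end
p label                    # "thirties"
```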

Regards,
Ed



Christopher M

Nov 29, 2009, 10:15:20 AM
to stl...@googlegroups.com
On Sat, Nov 28, 2009 at 8:02 PM, Ed Howland <ed.ho...@gmail.com> wrote:
> I have no idea which performs better, a grep with a look-around or
> calling a block (Proc object) on every element in Dir.new().entries.
> I'd be interested in performance metrics there.

On 10,000 files, with each benchmark run 1000 times, rejecting was faster than grepping:

grep_times = 1000.times.map do |i|
               t = Time.new.utc
               Dir.new("/tmp/test").entries.grep(/^(?!\.)/)
               Time.new.utc - t
             end

reject_times = 1000.times.map do |i|
                 t = Time.new.utc
                 Dir.new("/tmp/test").entries.reject { |e| /^\./ === e }
                 Time.new.utc - t
               end

grep_times.inject(0) { |s, t| s + t } / grep_times.size
=> 0.013402606
reject_times.inject(0) { |s, t| s + t } / reject_times.size
=> 0.00986612499999998
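If you want to repeat the comparison, Ruby's standard benchmark library handles the timing loop. A sketch, assuming a scratch directory filled with generated files; the file counts and repetition count are arbitrary, it times only the filtering step (the directory is read once), and no particular timings are implied.

```ruby
require 'benchmark'
require 'tmpdir'

grep_result = reject_result = nil

Dir.mktmpdir do |dir|
  # Scratch fixture: many visible files plus a few hidden ones
  2000.times { |i| File.write(File.join(dir, "file#{i}.txt"), '') }
  10.times   { |i| File.write(File.join(dir, ".hidden#{i}"), '') }

  # Read the directory once so only the filtering is timed
  names = Dir.entries(dir)

  Benchmark.bm(7) do |x|
    x.report("grep:")   { 200.times { names.grep(/^(?!\.)/) } }
    x.report("reject:") { 200.times { names.reject { |e| /^\./ === e } } }
  end

  # Both approaches should keep exactly the same names
  grep_result   = names.grep(/^(?!\.)/)
  reject_result = names.reject { |e| /^\./ === e }
end
```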
              
