Only there's a wee problem:
user=> (identical? [] (pop [3]))
false
Not all empty vectors are considered identical. = works:
user=> (= [] (pop [3]))
true
= is also ten times SLOWER than seq if the empty vectors are not
identical, though it's as fast as identical? when they are:
user=> (let [iterations 100000000] (time (dotimes [_ iterations]
(= [] (pop [3])))) (time (dotimes [_ iterations] (seq []))))
"Elapsed time: 29994.93852 msecs"
"Elapsed time: 2745.21924 msecs"
This is troubling. The seq function should be faster than this, and =
on nonidentical empty colls should be WAY faster than this.
On the other hand, the identical? test taking <4ms for 100,000,000
iterations suggests it was optimized away to nothing by the JIT
compiler. Unless of course both of us have 100GHz CPUs that can
execute a fresh instruction every 10 picoseconds. :)
> In the API it is suggested to use `seq` to check if coll is empty.
Your timing results raise some interesting questions, however, the API doesn't suggest using 'seq' to check if a collection is empty. That's what 'empty?' is for. The documentation note suggests (for style purposes apparently) that you use 'seq' to test that the collection is not empty. So to be precise you are testing two different things below. For instance, (identical? coll []) is true when coll is an empty vector. (seq coll) is true when coll is not empty. The correct equivalent would be to test (empty? coll).
Of course, this doesn't change the results. I get similar timings with empty?:
user=> (let [iterations 100000000] (time (dotimes [_ iterations]
(identical? [] []))) (time (dotimes [_ iterations] (empty? []))))
"Elapsed time: 2.294 msecs"
"Elapsed time: 2191.256 msecs"
nil
user=> (let [iterations 100000000] (time (dotimes [_ iterations]
(identical? "" ""))) (time (dotimes [_ iterations] (empty? ""))))
"Elapsed time: 2.657 msecs"
"Elapsed time: 4654.622 msecs"
nil
user=> (let [iterations 100000000] (time (dotimes [_ iterations]
(identical? () ()))) (time (dotimes [_ iterations] (empty? ()))))
"Elapsed time: 2.608 msecs"
"Elapsed time: 2144.142 msecs"
nil
This isn't so surprising though, considering that 'identical?' is the simplest possible test you could try--do two references point to the same object in memory? It can't get any more efficient than that.
Have all good days,
David Sletten
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
It occurred to me that the JIT might not be hoisting the constant (pop
[3]) out of the loop so I retested with this:
user=> (let [iterations 100000000 ev (pop [3])] (time (dotimes [_ iterations]
(= [] ev))) (time (dotimes [_ iterations] (seq []))))
"Elapsed time: 16477.68672 msecs"
"Elapsed time: 2105.01128 msecs"
Still nearly 10x slower. The JIT did something (optimized the function
call overhead away, most likely) but it didn't hoist the entire thing,
and most of the time is indeed spent in the = test.
= is damned slow.
In the particular bit of code I'm working on, I have a collection of
vectors. The last one is always [] and no others are empty. Using
`identical?` instead of `seq` to detect that last vector shaved quite
a bit of time in my loop.
Interestingly, (= (count v) 0) is relatively fast (900 msec); (seq v)
is slower (2s); (empty? v) is even slower (3s); and (= v []) is
slowest (16s).
As for a real sentinel, nothing beats the old (Object.) -- it's
guaranteed not equal to any existing object at time of creation, so no
preexisting piece of data in your input will look like a
freshly-created one, unlike the case with such traditional sentinels
as nil and zero. (The zero sentinel in C strings has caused no end of
trouble whenever the input string could have an ASCII NUL in it, and
we're all sick to death of Java collections that choke on null
elements, keys, or values. ConcurrentHashMap is a notorious culprit
there.)
(let [sentinel (Object.) v (conj some-vector sentinel)]
(loop [...]
(if (= sentinel something) ...
Another nice thing about (Object.) is that the Object .equals method
boils down to the Java == operator, and JIT-optimizes away to same, so
the above is just as fast as, and more readable than, (if (identical?
sentinel something) ...). Just be sure to put the sentinel first so
the object whose method is being called is an unspecialized Object. If
it's the other way around something else's .equals gets called, and if
it's well designed it will start with if (other == this) return true;,
and branch prediction on the JITted code will eventually eliminate the
cost of the rest of the method not just being return false;, but you
never know.
> Interestingly, (= (count v) 0) is relatively fast (900 msec); (seq v)
> is slower (2s); (empty? v) is even slower (3s); and (= v []) is
> slowest (16s).
Strange. Here are the timings I get for 1e8 iterations:
(zero? (count v)): ~3600 msecs
(seq v): ~2300 msecs
(empty? v): ~3100 msecs
(= v []): ~460 msecs
(= 0 (count v)) is even slower than (zero? (count v)). I'm running Clojure 1.2 on a MacBook (Core 2 Duo, 2GHz).
This brought to mind the general case of detecting emptiness. The
current practice of using `seq` to check for non-emptiness wastes
resources.
This limits the use of nice abstractions like `reduce` in
performance critical environments where a sentinel could be used. Not
everyone will be able to use a sentinel value to solve their
particular problem. What can be done to make `empty` faster?
Yes, the API *does* suggest using seq to check for emptiness. (empty?
x) is implemented as (not (seq x)). You certainly won't ever get
improved performance by using empty? - at best you break even, most of
the time you lose. For example:
Of course performance isn't usually the main driver, so if you feel
empty? really is more expressive in your case, go for it. But the OP
seems to care about performance, and suggesting empty? is off the
mark.
I disagree. I tend to prefer to front-load the empty case, because
it's usually the base case of a (recur)sion. Getting it out of the way
quickly and early establishes both how the result is generated and the
semantics of many of the loop variables.
More generally, I like to test for, handle, and discard the
corner/special cases and quickie cases and then have the meat of the
function last. It gets the distractions out of the way to focus on the
main, common case (in loops, the iteration rather than the
termination).
> Further, you usually have to seq the object anyway if
> it's not empty, so the seq call becomes free except in the case where
> the collection is in fact empty:
> (if-let [x (seq x)]
> ; do stuff with x
> ; give up
>
> compared to:
> (if (not (seq x)) ; this is what empty? does
> ; give up
> (let [useful-var (seq x)]
> ; do stuff with useful, seq'd version of x
Funny. My own use cases tend not to use seq explicitly at all. I'm
likely to have e.g.
(loop [s some-coll o nil]
(if (empty? s)
o
(let [f (first s)]
(blah blah blah s blah blah blah f blah blah blah)
(recur (rest s) (conj o foobar)))))
or some similar control flow structure, where s gets first and rest
used on it, but not (directly) seq. If some-coll is not already a seq,
using these implicitly generates a seq view of it with seq; on every
iteration but the first, s is bound to a seq implementation already.
Am 15.11.2010 um 23:07 schrieb Ken Wesson:
> (loop [s some-coll o nil]
> (if (empty? s)
> o
> (let [f (first s)]
> (blah blah blah s blah blah blah f blah blah blah)
> (recur (rest s) (conj o foobar)))))
>
> or some similar control flow structure, where s gets first and rest
> used on it, but not (directly) seq. If some-coll is not already a seq,
> using these implicitly generates a seq view of it with seq; on every
> iteration but the first, s is bound to a seq implementation already.
Changing the above code to the following (which is similarly readable) should give an immediate speed bump. Rich once stated something around 20%, although I have not verified the numbers and this was quite a while ago...
(loop [s (seq some-coll)
o nil]
(if s
(let [f (first s)]
(bla bla bla f bla bla bla)
(recur (next s) (conj o foobar)))
o))
Sincerely
Meikel
Eh. I'd heard first and rest had replaced next. No?
rest and next do different things:
rest - Returns a possibly empty seq of the items after the first.
Calls seq on its argument.
next - Returns a seq of the items after the first. Calls seq on its
argument. If there are no more items, returns nil.
--
Sean A Corfield -- (904) 302-SEAN
Railo Technologies, Inc. -- http://getrailo.com/
An Architect's View -- http://corfield.org/
"If you're not annoying somebody, you're not really alive."
-- Margaret Atwood
Salut Laurent,
6,502692ms / 7,393586ms ~ 0,88 => 12% improvement, no?
On 16 Nov., 09:51, Laurent PETIT <laurent.pe...@gmail.com> wrote:
> Agreed with the explanation, but ... 12% of what, exactly ?
But maybe this whole microbenchmarking story is paper waste. As I
said: quick'n'dirty.
Will now drop out of perfomance discussions...
Meikel