'seq' performance for 'empty?' compared to 'count'. And where's !=?

376 views
Skip to first unread message

Dave Tenny

unread,
Oct 1, 2015, 3:31:46 PM10/1/15
to Clojure
So I understand that 'seq' is the idiomatic way to see if a collection/sequence is empty.

Logically I'm looking for an O(1) predicate with which I can determine if a seq/collection is empty, and a well behaved
one that is idempotent and side effect free (for general performance reasons).

'seq'  leaves something to be desired here since it does heap allocations when called with various types of arguments.

I'll also freely confess that I dislike seq as an emptiness predicate, I like empty? and not-empty? types of things, even better,
I loved Common Lisp's treatment where NIL and empty lists have the same value in conditional situations
(i.e. if x is an empty list it is == nil and is false, unlike Clojure).  I guess what I'm saying here is that 'seq' as idiomatic clojure for
empty tests is distasteful.

In my own personal libraries I use a predicate to give the common lisp like semantics I desire, but it still calls clojure.core/empty?,
which still calls 'seq'.  So all I've done is make the test more expensive even if I find it more readable.

While looking at the Clojure 1.7 implementation of 'seq' today in RT.java, I happened across the implementation of 'count',
and it occurred to me that it was a much leaner and meaner test for empty sequences than 'seq'.  So here are some timings.

user> (doseq [thing [[] {} #{} (seq []) (into [] (range 0 1000)) (range 0 1000)]]
        (println "Time for 1M seq on" (type thing))
        (time (dotimes [i 1000000] (seq thing))))
Time for 1M seq on clojure.lang.PersistentVector
"Elapsed time: 18.993869 msecs"
Time for 1M seq on clojure.lang.PersistentArrayMap
"Elapsed time: 14.461354 msecs"
Time for 1M seq on clojure.lang.PersistentHashSet
"Elapsed time: 23.044994 msecs"
Time for 1M seq on nil
"Elapsed time: 8.291876 msecs"
Time for 1M seq on clojure.lang.PersistentVector
"Elapsed time: 21.214759 msecs"
Time for 1M seq on clojure.lang.LongRange
"Elapsed time: 6.27834 msecs"

Note the time on PersistentVector, we may have an O(n) implemention of `seq`
which is undesirable for an emptiness predicate.  Larger inputs require more time for 'seq' to return.

Now compare the same timings with 'count', adding in a clojure-level emptyness test (which would
no doubt be faster in RT.java):

user> (doseq [thing [[] {} #{} (seq []) (into [] (range 0 1000)) (range 0 1000)]]
        (println "Time for 1M count on" (type thing))
        (time (dotimes [i 1000000] (== (count thing) 0))))

Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 8.061005 msecs"
Time for 1M count on clojure.lang.PersistentArrayMap
"Elapsed time: 7.297715 msecs"
Time for 1M count on clojure.lang.PersistentHashSet
"Elapsed time: 10.165718 msecs"
Time for 1M count on nil
"Elapsed time: 4.190523 msecs"
Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 8.772564 msecs"
Time for 1M count on clojure.lang.LongRange
"Elapsed time: 27.166397 msecs"


Except for the LongRange input, `count` is much faster.

So I'm just bringing it up for discussion, wondering if we can get a faster clojure.core
implementation of `empty?`.

That said, there is one apples-to-apples problem in the above.

(seq x) returns nil if empty.  (== (count x) 0) returns true if nil is empty.

Clojure neophyte that I am, I can't understand where there is not a "!=" function for numbers.
Wrapping the above test in a 'not' gives:

user> (doseq [thing [[] {} #{} (seq []) (into [] (range 0 1000)) (range 0 1000)]]
        (println "Time for 1M count on" (type thing))
        (time (dotimes [i 1000000] (not (== (count thing) 0)))))

Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 15.961422 msecs"
Time for 1M count on clojure.lang.PersistentArrayMap
"Elapsed time: 11.12519 msecs"
Time for 1M count on clojure.lang.PersistentHashSet
"Elapsed time: 12.899191 msecs"
Time for 1M count on nil
"Elapsed time: 7.119841 msecs"
Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 11.454936 msecs"
Time for 1M count on clojure.lang.LongRange
"Elapsed time: 28.544915 msecs"

The O(1) times double when wrapping the test in 'not'.

Perhaps some clojurian will tell me what I should use.

Btw, don't even thing about 'not=':

user> (doseq [thing [[] {} #{} (seq []) (into [] (range 0 1000)) (range 0 1000)]]
        (println "Time for 1M count on" (type thing))
        (time (dotimes [i 1000000] (not= (count thing) 0))))
Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 44.768754 msecs"
Time for 1M count on clojure.lang.PersistentArrayMap
"Elapsed time: 40.158507 msecs"
Time for 1M count on clojure.lang.PersistentHashSet
"Elapsed time: 41.868068 msecs"
Time for 1M count on nil
"Elapsed time: 36.098277 msecs"
Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 42.149571 msecs"
Time for 1M count on clojure.lang.LongRange
"Elapsed time: 59.107886 msecs"

Well, that's not fair of course since '(not (==' and '(not (='  are two different things.
But then you'd expect '=' to be as slow, and it isn't, so somethings gets optimized with '=' vs. '=='.

user> (doseq [thing [[] {} #{} (seq []) (into [] (range 0 1000)) (range 0 1000)]]
        (println "Time for 1M count on" (type thing))
        (time (dotimes [i 1000000] (= (count thing) 0))))

Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 10.324226 msecs"
Time for 1M count on clojure.lang.PersistentArrayMap
"Elapsed time: 7.858452 msecs"
Time for 1M count on clojure.lang.PersistentHashSet
"Elapsed time: 10.132961 msecs"
Time for 1M count on nil
"Elapsed time: 4.097666 msecs"
Time for 1M count on clojure.lang.PersistentVector
"Elapsed time: 8.661771 msecs"
Time for 1M count on clojure.lang.LongRange
"Elapsed time: 26.705511 msecs"


James Reeves

unread,
Oct 1, 2015, 4:31:07 PM10/1/15
to clo...@googlegroups.com
Benchmarking the JVM can be difficult, as it uses a JIT compiler. Your examples tally more or less with my own benchmarks, but for the most accurate results I'd recommend using Criterium and ensuring that the :jvm-opts map in your project file is empty:

    :jvm-opts ^:replace {}

The reason that Clojure distinguishes between nil and the empty list is because seqs may be lazy and nil is not. Clojure doesn't know whether a lazy seq is empty or not until it's realised.

Regarding your benchmarks, it's true that checking whether non-seq collections are empty takes longer than it could do, but in practice I don't think this matters overly much. If you need to check whether a collection is empty many times, chances are you're iterating over the collection, in which case it should already have been transformed into a seq.

I guess the question is: are there any scenarios that don't involve iteration where you'd want to check whether a collection is empty many times?

- James

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nathan Davis

unread,
Oct 1, 2015, 6:40:29 PM10/1/15
to Clojure
On Thursday, October 1, 2015 at 2:31:46 PM UTC-5, Dave Tenny wrote:
So I understand that 'seq' is the idiomatic way to see if a collection/sequence is empty.


I'm not sure where you got this from.  I personally use empty? to check whether a collection is empty.  It is true that (not (empty c)) is not encouraged.  I believe the main rationale for this this is that (empty c) is (not (seq c)), so (not (empty c)) is (not (not (seq c)).
 
Logically I'm looking for an O(1) predicate with which I can determine if a seq/collection is empty, and a well behaved
one that is idempotent and side effect free (for general performance reasons).

I believe all the implementations of seq in Clojure core are O(1), although some (most?) allocate objects.  I'm not sure if it's explicitly spelled out anywhere, but I would consider it a bug it was anything other than O(1) (or perhaps O(log n) at most).

In what ways is the current implementation of empty not well behaved and idempotent?

With regards to side effects, if you can find a completely generic, side-effect-free way of determining whether a lazy sequence is empty without potentially realizing its head, please let the Clojure community know!

I'm not saying having an explicit 'empty' method is a bad idea, but I'm not sure the current situation is as bad as you think.

Nathan Davis

Gary Trakhman

unread,
Oct 1, 2015, 6:52:09 PM10/1/15
to Clojure
Count is going to be slow for seqs or cons's, but I think a generalized function could be implemented via protocol, falling back to seq.  Things that extend https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Counted.java are going to have a fast 'count' method.

--

Dave Tenny

unread,
Oct 2, 2015, 7:34:47 AM10/2/15
to clo...@googlegroups.com
Re: where I got this from:

user> (doc empty?)
-------------------------
clojure.core/empty?
([coll])
  Returns true if coll has no items - same as (not (seq coll)).
  Please use the idiom (seq x) rather than (not (empty? x))

Note that 'empty?' just calls 'seq'. 



--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/fUygeQMPqyI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+u...@googlegroups.com.

Mark Engelberg

unread,
Oct 2, 2015, 3:48:43 PM10/2/15
to clojure
When I know I'm dealing with a counted collection, like vector, then for performance I test for empty with (zero? (count v)).

It would be nice if empty? were more polymorphic, but I'm not sure how that would affect performance - might make things worse for the common case.  Certainly in the general case, the current implementation of empty? is the only real way to do it.  If you have an infinite sequence, a count-based version would run forever.
Reply all
Reply to author
Forward
0 new messages