RFC: new clojure.set functions

8 views
Skip to first unread message

Jason Wolfe

unread,
Jan 29, 2009, 6:19:53 PM1/29/09
to Clojure
I've posted code for faster, multi-argument clojure.set functions
here:

http://paste.lisp.org/display/74534

Basic algorithms:
intersection: start with smallest set, and remove elements from it
that aren't in each other set
(always iterate through result-in-progress, which
will always be smaller)
union: start with biggest set, and add elements to it that are in
other sets
(always iterate through other set, which will be
smaller than result-in-progress)
difference: start with first set, and remove elements from all other
sets
(for each other set, choose on the fly which to
iterate through based on size)

Suggestions/comments/optimizations welcome.

Below are some timings comparing them to the current clojure.set
versions. I've run a number of experiments, and these are the fastest
versions I could come up with. They will be a bit slower than
"reduce"ing with the current ones (50% or so) when passed several tiny
(i.e., 1-element) sets due to the overhead of extracting the big/small
element, but as you can see, can be much much faster when there are
both big and small sets involved.

-Jason

(Rich, if you don't want multi-arg versions for now, let me know; I
figured since there was already an issue for that, I might as well
combine them.)

user> (let [small #{1}, big1 (set (range 10000)), big2 (set (range
5000 15000))]
(doseq [f [clojure.set/union union clojure.set/difference difference
clojure.set/intersection intersection]]
(prn f)
(time (dotimes [_ 10000000] (f small small)))
(time (dotimes [_ 1000] (f small big1)))
(time (dotimes [_ 1000] (f big1 small)))
(time (dotimes [_ 1000] (f big1 big2)))
(prn)))

#<set$union__5561 clojure.set$union__5561@c347ea>
"Elapsed time: 5170.231 msecs"
"Elapsed time: 23515.553 msecs"
"Elapsed time: 1.232 msecs"
"Elapsed time: 15733.047 msecs"

#<user$union__1166 user$union__1166@c4a6d8>
"Elapsed time: 7046.297 msecs"
"Elapsed time: 1.266 msecs"
"Elapsed time: 1.004 msecs"
"Elapsed time: 13562.658 msecs"

#<set$difference__5564 clojure.set$difference__5564@fb004>
"Elapsed time: 7032.965 msecs"
"Elapsed time: 3326.449 msecs"
"Elapsed time: 6.747 msecs"
"Elapsed time: 12569.736 msecs"

#<user$difference__1189 user$difference__1189@8c75f4>
"Elapsed time: 12001.778 msecs"
"Elapsed time: 41.649 msecs"
"Elapsed time: 20.441 msecs"
"Elapsed time: 16588.469 msecs"

#<set$intersection__5567 clojure.set$intersection__5567@69f31d>
"Elapsed time: 9972.448 msecs"
"Elapsed time: 3128.244 msecs"
"Elapsed time: 16500.68 msecs"
"Elapsed time: 20531.455 msecs"

#<user$intersection__1178 user$intersection__1178@1e6003>
"Elapsed time: 7990.208 msecs"
"Elapsed time: 1.4 msecs"
"Elapsed time: 2.463 msecs"
"Elapsed time: 12463.575 msecs"

(These may be a big screwy, but should be in the right ballpark.)

verec

unread,
Jan 31, 2009, 8:07:40 AM1/31/09
to Clojure
I'm must be missing something obvious for the [s1 s2] case of union:

Is it only that you've measured (conj s1 s2) to be faster when s2 is
the smallest ?

(defn un
....
([s1 s2]
(reduce conj s1 s2))
... )

user=> (def a #{1 2 3})
user=> (def b #{4 5 6 7})
user=> (def A #{:paul :georges :john})
user=> (def B #{:mark :mary})

user=> (un a b)
#{1 2 3 4 5 6 7}
user=> (un b a)
#{1 2 3 4 5 6 7}
user=> (un a B)
#{1 2 3 :mary :mark}
user=> (un B a)
#{1 2 3 :mary :mark}
user=> (un A B)
#{:paul :mary :john :mark :georges}
user=> (un B A)
#{:paul :mary :john :mark :georges}
user=>

???

--
JFB

Jason Wolfe

unread,
Jan 31, 2009, 12:46:29 PM1/31/09
to clo...@googlegroups.com

> I'm must be missing something obvious for the [s1 s2] case of union:
>
> Is it only that you've measured (conj s1 s2) to be faster when s2 is
> the smallest ?

Yes, that's it. "conj" iterates through the second argument, and
since count is O(1) for sets, it makes sense to always iterate through
the smaller argument. This makes a huge speed difference when you
have one small set and one big set, as you can see in the timings I
included.

-Jason


Reply all
Reply to author
Forward
0 new messages