I've posted code for faster, multi-argument clojure.set functions
here:
http://paste.lisp.org/display/74534
Basic algorithms:
intersection: start with smallest set, and remove elements from it
that aren't in each other set
(always iterate through result-in-progress, which
will always be smaller)
union: start with biggest set, and add elements to it that are in
other sets
(always iterate through other set, which will be
smaller than result-in-progress)
difference: start with first set, and remove elements from all other
sets
(for each other set, choose on the fly which to
iterate through based on size)
Suggestions/comments/optimizations welcome.
Below are some timings comparing them to the current clojure.set
versions. I've run a number of experiments, and these are the fastest
versions I could come up with. They will be a bit slower than
"reduce"ing with the current ones (50% or so) when passed several tiny
(i.e., 1-element) sets due to the overhead of extracting the big/small
element, but as you can see, can be much much faster when there are
both big and small sets involved.
-Jason
(Rich, if you don't want multi-arg versions for now, let me know; I
figured since there was already an issue for that, I might as well
combine them.)
user> (let [small #{1}, big1 (set (range 10000)), big2 (set (range
5000 15000))]
(doseq [f [clojure.set/union union clojure.set/difference difference
clojure.set/intersection intersection]]
(prn f)
(time (dotimes [_ 10000000] (f small small)))
(time (dotimes [_ 1000] (f small big1)))
(time (dotimes [_ 1000] (f big1 small)))
(time (dotimes [_ 1000] (f big1 big2)))
(prn)))
#<set$union__5561 clojure.set$union__5561@c347ea>
"Elapsed time: 5170.231 msecs"
"Elapsed time: 23515.553 msecs"
"Elapsed time: 1.232 msecs"
"Elapsed time: 15733.047 msecs"
#<user$union__1166 user$union__1166@c4a6d8>
"Elapsed time: 7046.297 msecs"
"Elapsed time: 1.266 msecs"
"Elapsed time: 1.004 msecs"
"Elapsed time: 13562.658 msecs"
#<set$difference__5564 clojure.set$difference__5564@fb004>
"Elapsed time: 7032.965 msecs"
"Elapsed time: 3326.449 msecs"
"Elapsed time: 6.747 msecs"
"Elapsed time: 12569.736 msecs"
#<user$difference__1189 user$difference__1189@8c75f4>
"Elapsed time: 12001.778 msecs"
"Elapsed time: 41.649 msecs"
"Elapsed time: 20.441 msecs"
"Elapsed time: 16588.469 msecs"
#<set$intersection__5567 clojure.set$intersection__5567@69f31d>
"Elapsed time: 9972.448 msecs"
"Elapsed time: 3128.244 msecs"
"Elapsed time: 16500.68 msecs"
"Elapsed time: 20531.455 msecs"
#<user$intersection__1178 user$intersection__1178@1e6003>
"Elapsed time: 7990.208 msecs"
"Elapsed time: 1.4 msecs"
"Elapsed time: 2.463 msecs"
"Elapsed time: 12463.575 msecs"
(These may be a big screwy, but should be in the right ballpark.)