Transducers improve performance more than expected

1,152 views
Skip to first unread message

JvJ

unread,
May 8, 2016, 6:03:23 PM5/8/16
to Clojure
I've been doing some code profiling lately, and I made one small change that drastically improved performance.


I had this function:

(defn run-systems
  "Run the systems in the order specified over
  the cross-map specified."
  ([cm] (run-systems system-order cm))
  ([order cm]
   (reduce (fn [acc f]
             (->> (get-profile f)
                  (cross-cols acc)
                                              (mapcat (comp entity-pairs f))
                  (into cm )))
    cm order)))


Executing this function 1000 times in a row gives a runtime of about 218 ms.

By making a small change and using mapcat as a transducer:

(defn run-systems
  "Run the systems in the order specified over
  the cross-map specified."
  ([cm] (run-systems system-order cm))
  ([order cm]
   (reduce (fn [acc f]
             (->> (get-profile f)
                  (cross-cols acc)
                  (into cm (mapcat (comp entity-pairs f)))))
    cm order)))

The runtime goes all the way down to 169 ms.

I knew that removing intermediate collections helped performance, but I wasn't expecting such a drastic improvement.

Does anyone know similar simple tricks (either transducer-related or not transducer-related) that could further improve performance of these types of operations?

(Runtime results are averaged over many runs using the criterium profiling library, so it's not just a fluke of thread scheduling).

Rangel Spasov

unread,
May 8, 2016, 8:34:30 PM5/8/16
to Clojure
In my experience, the more intermediate collections you eliminate, the more you gain:

Transducers:

(criterium.core/with-progress-reporting
(criterium.core/quick-bench
(into []
(comp
(map inc))
(range 100000))))

             Execution time mean : 3.803073 ms
    Execution time std-deviation : 69.000641 µs
   Execution time lower quantile : 3.715792 ms ( 2.5%)
   Execution time upper quantile : 3.875992 ms (97.5%)
                   Overhead used : 8.938753 ns

Good-old sequences:

(criterium.core/with-progress-reporting
(criterium.core/quick-bench
(->> (range 100000)
(map inc)
(into []))))

             Execution time mean : 5.104560 ms
    Execution time std-deviation : 150.661809 µs
   Execution time lower quantile : 4.904680 ms ( 2.5%)
   Execution time upper quantile : 5.284278 ms (97.5%)
                   Overhead used : 8.938753 ns

Some improvement ~30% faster. However, with a lot of intermediary collections:

Transducers:

(criterium.core/with-progress-reporting
(criterium.core/quick-bench
(into []
(comp
(map inc)
(filter odd?)
(map dec)
(filter even?)
(map (fn [n] (+ 3 n)))
(filter odd?)
(map inc)
(filter odd?)
(map dec)
(filter even?)
(map (fn [n] (+ 3 n)))
(filter odd?))
(range 100000))))

             Execution time mean : 6.036796 ms
    Execution time std-deviation : 95.168278 µs
   Execution time lower quantile : 5.882058 ms ( 2.5%)
   Execution time upper quantile : 6.125739 ms (97.5%)
                   Overhead used : 8.938753 ns

Good-old sequences:

(criterium.core/with-progress-reporting
(criterium.core/quick-bench
(->> (range 100000)
(map inc)
(filter odd?)
(map dec)
(filter even?)
(map (fn [n] (+ 3 n)))
(filter odd?)
(map inc)
(filter odd?)
(map dec)
(filter even?)
(map (fn [n] (+ 3 n)))
(filter odd?)
(into []))))

             Execution time mean : 12.826507 ms
    Execution time std-deviation : 345.613000 µs
   Execution time lower quantile : 12.379043 ms ( 2.5%)
   Execution time upper quantile : 13.193640 ms (97.5%)
                   Overhead used : 8.938753 ns

In this case, transducers are more than twice as fast. 

Herwig Hochleitner

unread,
May 9, 2016, 1:24:15 AM5/9/16
to clo...@googlegroups.com
My theory has been, that transducer stacks inline much better, hence allow for more optimizations by the jit.
In particular I suspect that escape analysis works better on them, so the compiler can even move some of the remaining allocations to the stack.

To verify this, try running with -verbose:gc and with either -XX:-DoEscapeAnalysis then -XX:+DoEscapeAnalysis

JvJ

unread,
May 9, 2016, 5:54:48 PM5/9/16
to Clojure
In a similar vein, do you think that eductions are generally a better idea than lazy sequences/for comprehensions?

Alex Miller

unread,
May 9, 2016, 7:07:55 PM5/9/16
to Clojure
eductions are non-caching (will re-perform their work each time they are used), so most of the time I would say lazy sequences are preferable.

JvJ

unread,
May 10, 2016, 12:22:24 PM5/10/16
to Clojure
In that case, why aren't eductions just lazy sequences?

Alex Miller

unread,
May 10, 2016, 12:45:50 PM5/10/16
to Clojure
Because some of the time you don't want caching. For example, if you want to (later) reduce over a large (larger than memory even) external resource. eductions allow you to define the source in one spot but defer the (eager) reduction until later.

JvJ

unread,
May 10, 2016, 7:14:10 PM5/10/16
to Clojure
That brings me to another thing I've wondered about.  It is a typical clojure idiom to do something like (reduce + (range 1000)).

But, unlike imperative loops, this will cache all those 1000 elements.  This can kind of bloat memory, especially with large sequences?

How can you get around it (other than tail-recursion or the while construction)?

Alan Thompson

unread,
May 10, 2016, 7:46:39 PM5/10/16
to clo...@googlegroups.com
I don't understand what you mean. '(range 1000)' produces a lazy sequence, and '(reduce + ...)' doesn't hold onto the head of the lazy sequence. Therefore, each element can be GC'd as soon as added into the running total, the the lazy sequence only produces new elements as they are requested by the reduction (chunking aside, of course).
Alan

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Timothy Baldridge

unread,
May 10, 2016, 8:21:12 PM5/10/16
to clo...@googlegroups.com
In addition, as of 1.7, (range 1000) no longer creates a lazy sequence. It creates something that acts a bit like a sequence, but is reducable. So doing something like (reduce + 0 (range 1000)) is super fast and creates almost no garbage at all. 
--
“One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
(Robert Firth)

Alex Miller

unread,
May 10, 2016, 11:30:41 PM5/10/16
to Clojure
range is reducible and boils down to just a local loop in most cases, so shouldn't create any heap garbage (well, other than whatever your reducing function does). 

See:

Additionally, it can act as a chunked lazy sequence, and in that usage it stores no values either, each chunk is just a start + a step and values are produced by offset. It does cache the lazy chunked sequence though, so you'll get a linked list of chunks essentially (representing 32 values each). 

More than you want to know can be found here: http://dev.clojure.org/jira/browse/CLJ-1515

Similar tricks are played with the other sequence "generators" like iterate, cycle, and repeat, although those don't chunk (http://dev.clojure.org/jira/browse/CLJ-1603). 

But to your original question about whether using sequences bloats memory - yes in the general case. For large sequences, you should consider alternatives - collections (like vectors) or in extreme cases Clojure primitive vectors (see vector-of), Java collections, or arrays (usually the most compact). Like everything else, sequences are a tradeoff, and generally a good one due to caching and immutability, tempered by chunking and transients. I'm glad we have more options now though with transducers. 


On Tuesday, May 10, 2016 at 7:21:12 PM UTC-5, tbc++ wrote:
In addition, as of 1.7, (range 1000) no longer creates a lazy sequence. It creates something that acts a bit like a sequence, but is reducable. So doing something like (reduce + 0 (range 1000)) is super fast and creates almost no garbage at all. 
On Tue, May 10, 2016 at 5:46 PM, Alan Thompson <cloo...@gmail.com> wrote:
I don't understand what you mean. '(range 1000)' produces a lazy sequence, and '(reduce + ...)' doesn't hold onto the head of the lazy sequence. Therefore, each element can be GC'd as soon as added into the running total, the the lazy sequence only produces new elements as they are requested by the reduction (chunking aside, of course).
Alan
On Tue, May 10, 2016 at 4:14 PM, JvJ <kfjwh...@gmail.com> wrote:
That brings me to another thing I've wondered about.  It is a typical clojure idiom to do something like (reduce + (range 1000)).

But, unlike imperative loops, this will cache all those 1000 elements.  This can kind of bloat memory, especially with large sequences?

How can you get around it (other than tail-recursion or the while construction)?

On Tuesday, 10 May 2016 09:45:50 UTC-7, Alex Miller wrote:
Because some of the time you don't want caching. For example, if you want to (later) reduce over a large (larger than memory even) external resource. eductions allow you to define the source in one spot but defer the (eager) reduction until later.

On Tuesday, May 10, 2016 at 11:22:24 AM UTC-5, JvJ wrote:
In that case, why aren't eductions just lazy sequences?

On Monday, 9 May 2016 16:07:55 UTC-7, Alex Miller wrote:
eductions are non-caching (will re-perform their work each time they are used), so most of the time I would say lazy sequences are preferable.

On Monday, May 9, 2016 at 4:54:48 PM UTC-5, JvJ wrote:
In a similar vein, do you think that eductions are generally a better idea than lazy sequences/for comprehensions?

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

JvJ

unread,
May 11, 2016, 11:41:20 AM5/11/16
to Clojure
That is very good to know.

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages