Functions using locks are slowed even when locks are never taken

Michael Blume

unread,

Nov 1, 2013, 2:34:15 PM11/1/13

to clo...@googlegroups.com

https://github.com/MichaelBlume/perf-test

(ns perf-test
  (:use (criterium core))
  (:gen-class))

(def o (Object.))

(defn foo [x]
  (if (> x 0)
    (inc x)
    (do
      (monitor-enter o)
      (let [res (dec x)]
        (monitor-exit 0)
        res))))

(defn bar [x]
  (if (> x 0)
    (inc x)
    (dec x)))

(defn locking-part [x l]
  (monitor-enter l)
  (let [res (dec x)]
    (monitor-exit l)
    res))

(defn baz [x]
  (if (> x 0)
    (inc x)
    (locking-part x o)))

(defn -main []
  (println "benching foo")
  (bench (foo 5) :verbose) 
  (println "benching bar")
  (bench (bar 5) :verbose)
  (println "benching baz")
  (bench (baz 5) :verbose)
  (println "done benching"))



I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz.

The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why.

$ lein uberjar
Compiling perf-test
Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
$ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
benching foo
WARNING: Final GC required 1.5974571326266802 % of runtime
x86_64 Mac OS X 10.8.3 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 24.0-b28
Runtime arguments:
Evaluation count : 391582560 in 60 samples of 6526376 calls.
      Execution time sample mean : 167.426696 ns
             Execution time mean : 167.459429 ns
Execution time sample std-deviation : 4.079466 ns
    Execution time std-deviation : 4.097819 ns
   Execution time lower quantile : 160.742869 ns ( 2.5%)
   Execution time upper quantile : 175.453376 ns (97.5%)
                   Overhead used : 1.634996 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 12.5602 % Variance is moderately inflated by outliers
benching bar
x86_64 Mac OS X 10.8.3 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 24.0-b28
Runtime arguments:
Evaluation count : 2174037300 in 60 samples of 36233955 calls.
      Execution time sample mean : 26.068923 ns
             Execution time mean : 26.066422 ns
Execution time sample std-deviation : 0.887937 ns
    Execution time std-deviation : 0.916861 ns
   Execution time lower quantile : 23.996763 ns ( 2.5%)
   Execution time upper quantile : 27.911936 ns (97.5%)
                   Overhead used : 1.634996 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
	high-mild	 1 (1.6667 %)
 Variance from outliers : 22.1874 % Variance is moderately inflated by outliers
benching baz
x86_64 Mac OS X 10.8.3 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 24.0-b28
Runtime arguments:
Evaluation count : 2270676660 in 60 samples of 37844611 calls.
      Execution time sample mean : 25.834142 ns
             Execution time mean : 25.837429 ns
Execution time sample std-deviation : 0.718382 ns
    Execution time std-deviation : 0.729431 ns
   Execution time lower quantile : 24.837925 ns ( 2.5%)
   Execution time upper quantile : 27.595781 ns (97.5%)
                   Overhead used : 1.634996 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 2 (3.3333 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 15.7591 % Variance is moderately inflated by outliers
done benching

Michael Blume

unread,

Nov 1, 2013, 2:53:12 PM11/1/13

to clo...@googlegroups.com

Since 1.6 alpha is out, I reran the tests with that -- basically the same results.

Trenton Strong

unread,

Nov 2, 2013, 3:01:01 AM11/2/13

to clo...@googlegroups.com

Verified that I receive the same result patterns as you on my machine.

One other possibility outside of what you have already mentioned would be something JIT-related. Perhaps there is an optimization that can be performed if the locking sections of the code are in another function but otherwise no, but I'm not sure how that dovetails with Clojure' fn compilation.

Michael Blume

unread,

Nov 2, 2013, 8:32:42 PM11/2/13

to clo...@googlegroups.com

Hmm, it seems like if it were JIT related you'd see the same issue with java code, but I get 5ns across the board.

https://github.com/MichaelBlume/perf-test/blob/master/src-java/PerfTest.java

shlomi...@gmail.com

unread,

Nov 3, 2013, 5:47:44 AM11/3/13

to clo...@googlegroups.com

same results on my machine as well.

I tried decompiling the jar (with cider-decompile), and the parts that actually run (the "then" clause) seems almost identical. If you'r interested, i can post them later tonight.

Michał Marczyk

unread,

Nov 3, 2013, 12:30:45 PM11/3/13

to clojure

You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
than o (the sentinel object).

Besides that, in foo both monitor-enter and monitor-exit get their
arguments from a Var. Rewriting to use locking, which first puts the
object whose monitor will be used in a local (that is, (let [lockee o]
...), where ... performs the locking using the newly introduced
local), gives timings identical to those of bar and baz:

(defn foo' [x]
(if (> x 0)
(inc x)

(let [res (locking o (dec x))] res)))

So this is one reason not to use monitor-enter and monitor-exit
directly. Another reason is that locking guarantees that the monitor
will be released (by using try / finally, and of course by preventing
situations where the matching monitor-enter & monitor-exit operate on
different objects).

In fact, both monitor-enter and monitor-exit carry docstrings which
explicitly say that they should not be used in user code and point to
locking as the user-facing equivalent to Java's synchronized.

Cheers,
Michał

> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Michał Marczyk

unread,

Nov 3, 2013, 12:45:30 PM11/3/13

to clojure

I should perhaps make clear that with direct use of monitor-enter and
monitor-exit with a Var it's possible for monitor-enter and
monitor-exit to operate on different objects even in the absence of
typos, namely if somebody rebinds the Var. To illustrate this with
print at the REPL (a regular Clojure REPL, as opposed to nREPL or
similar, in case there are any issues with reproducing this in
different environments):

(def x 1)
(future (print x) (Thread/sleep 5000) (print x))
; prints 1 immediately, but then goes to sleep;
; in the meantime, I say
(def x 2)
@*2
; and after a moment 2 is printed

NB. this could also happen with alter-var-root or set! to a Var with a
thread-local binding.

Cheers,
Michał

Andy Fingerhut

unread,

Nov 3, 2013, 12:46:18 PM11/3/13

to clo...@googlegroups.com

Good detective work, Michal.

So the extra time for the slower version was because a Var was always being accessed in the generated byte code, even if the source code was only explicitly accessing the Var in a branch that was never executed?

Thanks,
Andy

Michael Blume

unread,

Nov 3, 2013, 2:09:43 PM11/3/13

to clo...@googlegroups.com

Huh, interesting.

I have:

(defn foo' [x]

(if (> x 0)

(inc x)

(let [res (locking o (dec x))] res)))

(defn foo'' [x]

(if (> x 0)

(inc x)

(locking o

(dec x))))

foo' is fast, but foo'' is slow. So something about wrapping the locking clause in a let makes it fast. Still no idea why.

Michael Blume

unread,

Nov 3, 2013, 2:46:57 PM11/3/13

to clo...@googlegroups.com

I mean, I'm probably being naive, but this suggests that one could write

(defmacro locking' [& forms]

`(let [res# (locking ~@forms)] res#))

and use locking' in place of locking for improved performance. Is this wrong? If it's right, does that suggest the macro in clojure.core should be changed?

You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/x86VygZYf4Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+u...@googlegroups.com.

Michał Marczyk

unread,

Nov 3, 2013, 3:10:36 PM11/3/13

to clojure

Well, that is interesting.

The difference between the compiled versions of

(defn foo [x]
(if (> x 0)
(inc x)
(locking o
(dec x))))

and

(defn bar [x]
(if (> x 0)
(inc x)

(let [res (locking o
(dec x))]
res)))

is quite significant. foo gets compiled to a single class, with
invocations handled by a single invoke method; bar gets compiled to a
class for bar + an extra class for an inner function which handles the
(locking o (dec x)) part -- probably very similar to the output for
the version with the hand-coded locking-part (although I haven't
really looked at that yet). The inner function is a closure, so
calling it involves an allocation of a closure object; its ctor
receives the closed-over locals as arguments and stores them in two
fields (lockee and x). Then they get loaded from the fields in the
body of the closure's invoke method etc.

I guess I'll have to play around with Java equivalents too...

Cheers,
Michał

Michael Blume

unread,

Nov 17, 2013, 1:14:15 PM11/17/13

to clo...@googlegroups.com

Sorry to be a bother, but any movement on this? This looks like a real performance bug and I don't yet have the internals knowledge to chase it down myself.

Andy Fingerhut

unread,

Nov 17, 2013, 5:11:42 PM11/17/13

to clo...@googlegroups.com

I have just created a ticket linking to this discussion, with a copy of one of Michal's earlier messages in the thread as a description of what might be the problem.

I would not say that this is the same as movement on this issue. Movement in the form that would lead to a change in the Clojure compiler will require one or more volunteers to carefully diagnose the root cause of the problem, develop one or more patches, respond to comments from screeners and/or Rich Hickey, etc. If people want to raise more attention to this issue, one way to do it is to vote on the ticket in JIRA. If you do not already have an account, you can create one here:

http://dev.clojure.org/jira/secure/Signup!default.jspa

Andy

Andy Fingerhut

unread,

Nov 17, 2013, 5:12:15 PM11/17/13

to clo...@googlegroups.com

Sorry, I neglected to include a link to the ticket:

http://dev.clojure.org/jira/browse/CLJ-1296

Reply all

Reply to author

Forward