Why is `(get-in m ks)` not implemented as `(get-in m ks nil)`?

Dirk Wetzel

unread,

Dec 11, 2019, 6:04:38 AM12/11/19

to Clojure

Hey everyone! :)

I was recently looking at the source for get-in to check if it would exit early as soon as a key was not present in the nested maps.

The source for quick reference:

(defn get-in
  "Returns the value in a nested associative structure,
  where ks is a sequence of keys. Returns nil if the key
  is not present, or the not-found value if supplied."
  {:added "1.2"
   :static true}
  ([m ks]
     (reduce1 get m ks))
  ([m ks not-found]
     (loop [sentinel (Object.)
            m m
            ks (seq ks)]
       (if ks
         (let [m (get m (first ks) sentinel)]
           (if (identical? sentinel m)
             not-found
             (recur sentinel m (next ks))))
         m))))

I noticed that the 2-arity variant does not exit early, but the 3-arity variant does. Also looking at the implementation of reduce1 vs the implementation of the 3-arity variant of get-in got me thinking if it wouldn't be slightly more efficient to just implement the 2-arity variant by calling the 3-arity variant, eg:

([m ks]
   (get-in m ks nil))

I've run a few micro benchmarks comparing speeds of the 2-arity and 3-arity variant using criterium and the 3-arity variant seems to be more efficient even if it can't exit early.

Below are a few of the criterium benchmarks I did on my laptop:

(crit/bench (get-in {} [:a :b]))

Evaluation count : 1874799960 in 60 samples of 31246666 calls.
             Execution time mean : 32.013948 ns
    Execution time std-deviation : 1.172814 ns
   Execution time lower quantile : 29.839525 ns ( 2.5%)
   Execution time upper quantile : 34.283877 ns (97.5%)
                   Overhead used : 2.189987 ns

Found 17 outliers in 60 samples (28.3333 %)
	low-severe	 13 (21.6667 %)
	low-mild	 4 (6.6667 %)
 Variance from outliers : 23.7873 % Variance is moderately inflated by outliers

(crit/bench (get-in {} [:a :b] nil))

Evaluation count : 4071018060 in 60 samples of 67850301 calls.
             Execution time mean : 15.124086 ns
    Execution time std-deviation : 1.492542 ns
   Execution time lower quantile : 12.488672 ns ( 2.5%)
   Execution time upper quantile : 17.908223 ns (97.5%)
                   Overhead used : 2.189987 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 68.6898 % Variance is severely inflated by outliers

(crit/bench (get-in {:a {:b {:c 1}}} [:a :b :c]))

Evaluation count : 1034704440 in 60 samples of 17245074 calls.
             Execution time mean : 58.600293 ns
    Execution time std-deviation : 2.939752 ns
   Execution time lower quantile : 55.782160 ns ( 2.5%)
   Execution time upper quantile : 64.772887 ns (97.5%)
                   Overhead used : 2.189987 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 36.8128 % Variance is moderately inflated by outliers

(crit/bench (get-in {:a {:b {:c 1}}} [:a :b :c] nil))

Evaluation count : 1359247260 in 60 samples of 22654121 calls.
             Execution time mean : 42.744780 ns
    Execution time std-deviation : 3.356838 ns
   Execution time lower quantile : 39.571655 ns ( 2.5%)
   Execution time upper quantile : 48.841751 ns (97.5%)
                   Overhead used : 2.189987 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 58.4799 % Variance is severely inflated by outliers

While I can imagine that changing the implementation of the 2-arity variant would barely bring noticeable improvements to projects, I also can't see much of a reason not to change it. It doesn't make the code less readable or maintainable. It just makes performance for both arities consistently good.

What would be pros and cons of such a change that I don't see at the moment?

Alex Miller

unread,

Dec 11, 2019, 3:54:35 PM12/11/19

to Clojure

At a skim, seems like a reasonable thing to do.

Matching Socks

unread,

Dec 12, 2019, 6:01:52 AM12/12/19

to Clojure

By the way, the ClojureScript version of get-in's 3-arity avoids allocating a sentinel. Instead, it uses a predefined sentinel:

(def ^:private lookup-sentinel (js-obj))

Dirk Wetzel

unread,

Dec 12, 2019, 6:14:12 AM12/12/19

to Clojure

I had actually included a suggestion for a predefined sentinel like this in an earlier draft of my post. I scrapped that bit though, because I could not measure any consistent performance benefit on my machine, aside from not allocating a new Object each time.

Reply all

Reply to author

Forward